linux

Author	SHA1	Message	Date
Tony Luck	0f68c088c0	x86/cpufeature: Create a new synthetic cpu capability for machine check recovery The Intel Software Developer Manual describes bit 24 in the MCG_CAP MSR: MCG_SER_P (software error recovery support present) flag, bit 24 — Indicates (when set) that the processor supports software error recovery But only some models with this capability bit set will actually generate recoverable machine checks. Check the model name and set a synthetic capability bit. Provide a command line option to set this bit anyway in case the kernel doesn't recognise the model name. Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2e5bfb23c89800a036fb8a45fa97a74bb16bc362.1455732970.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-18 09:28:47 +01:00
Ingo Molnar	3a2f2ac9b9	Merge branch 'x86/urgent' into x86/asm, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-18 09:28:03 +01:00
Tony Luck	548acf1923	x86/mm: Expand the exception table logic to allow new handling options Huge amounts of help from Andy Lutomirski and Borislav Petkov to produce this. Andy provided the inspiration to add classes to the exception table with a clever bit-squeezing trick, Boris pointed out how much cleaner it would all be if we just had a new field. Linus Torvalds blessed the expansion with: ' I'd rather not be clever in order to save just a tiny amount of space in the exception table, which isn't really criticial for anybody. ' The third field is another relative function pointer, this one to a handler that executes the actions. We start out with three handlers: 1: Legacy - just jumps the to fixup IP 2: Fault - provide the trap number in %ax to the fixup code 3: Cleaned up legacy for the uaccess error hack Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f6af78fcbd348cf4939875cfda9c19689b5e50b8.1455732970.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-18 09:21:46 +01:00
Ingo Molnar	061f817eb6	Linux 4.5-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWwOwYAAoJEHm+PkMAQRiGjQMH/jy+GLa3KSe/dgdh3IMnhV1Q kAFJ6n7o9n7iYtlbrDeDj4eulktuMpsCjFP3nubG35GvnFPyKMg3BlMhx1oZUQcB NiN8ulMXZQepmSjART1SxCgMkE7uRg9j61tEDjjSzCX4PbpbjSm50mjefjqI2pAh bkjHiepQ4maKlHqCf5S4rtofML0FTt1HXtUIPwlEMlB0pVrRO7yNLp6QuK43caft hAn/jZq/c5m7ykggLJOb3EBTlZD7TdTm9KduTqZUBLebwYbwbZLuXLZq3S5IvcHm 01+MNwD7VsW4p0R+p3WBP0+MdgPo3zlifCQRnUP22DSboXiDVAAsukrj/DiN+DA= =qnLX -----END PGP SIGNATURE----- Merge tag 'v4.5-rc4' into ras/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-18 09:03:57 +01:00
Ingo Molnar	9109dc97b0	Merge branch 'perf/urgent' into perf/core, to queue up dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-17 10:37:36 +01:00
Borislav Petkov	053080a9d1	x86/msr: Document msr-index.h rule for addition In order to keep this file's size sensible and not cause too much unnecessary churn, make the rule explicit - similar to pci_ids.h - that only MSRs which are used in multiple compilation units, should get added to it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alex.williamson@redhat.com Cc: gleb@kernel.org Cc: joro@8bytes.org Cc: kvm@vger.kernel.org Cc: sherry.hurwitz@amd.com Cc: wei@redhat.com Link: http://lkml.kernel.org/r/1455612202-14414-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-17 08:47:55 +01:00
Andy Lutomirski	6c25da5ad5	x86/signal/64: Re-add support for SS in the 64-bit signal context This is a second attempt to make the improvements from `c6f2062935` ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add support for SS in the 64-bit signal context"). This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for all 64-bit signals (including x32). It indicates that the saved SS field is valid and that the kernel supports the new behavior. The goal is to fix a problems with signal handling in 64-bit tasks: SS wasn't saved in the 64-bit signal context, making it awkward to determine what SS was at the time of signal delivery and making it impossible to return to a non-flat SS (as calling sigreturn clobbers SS). This also made it extremely difficult for 64-bit tasks to return to fully-defined 16-bit contexts, because only the kernel can easily do espfix64, but sigreturn was unable to set a non-flag SS:ESP. (DOSEMU has a monstrous hack to partially work around this limitation.) If we could go back in time, the correct fix would be to make 64-bit signals work just like 32-bit signals with respect to SS: save it in signal context, reset it when delivering a signal, and restore it in sigreturn. Unfortunately, doing that (as I tried originally) breaks DOSEMU: DOSEMU wouldn't reset the signal context's SS when clearing the LDT and changing the saved CS to 64-bit mode, since it predates the SS context field existing in the first place. This patch is a bit more complicated, and it tries to balance a bunch of goals. It makes most cases of changing ucontext->ss during signal handling work as expected. I do this by special-casing the interesting case. On sigreturn, ucontext->ss will be honored by default, unless the ucontext was created from scratch by an old program and had a 64-bit CS (unfortunately, CRIU can do this) or was the result of changing a 32-bit signal context to 64-bit without resetting SS (as DOSEMU does). For the benefit of new 64-bit software that uses segmentation (new versions of DOSEMU might), the new behavior can be detected with a new ucontext flag UC_SIGCONTEXT_SS. To avoid compilation issues, __pad0 is left as an alias for ss in ucontext. The nitty-gritty details are documented in the header file. This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests, as the kernel change allows both of them to pass. Tested-by: Stas Sergeev <stsp@list.ru> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org [ Small readability edit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-17 08:32:11 +01:00
Andy Lutomirski	8ff5bd2e1e	x86/signal/64: Fix SS if needed when delivering a 64-bit signal Signals are always delivered to 64-bit tasks with CS set to a long mode segment. In long mode, SS doesn't matter as long as it's a present writable segment. If SS starts out invalid (this can happen if the signal was caused by an IRET fault or was delivered on the way out of set_thread_area or modify_ldt), then IRET to the signal handler can fail, eventually killing the task. The straightforward fix would be to simply reset SS when delivering a signal. That breaks DOSEMU, though: 64-bit builds of DOSEMU rely on SS being set to the faulting SS when signals are delivered. As a compromise, this patch leaves SS alone so long as it's valid. The net effect should be that the behavior of successfully delivered signals is unchanged. Some signals that would previously have failed to be delivered will now be delivered successfully. This has no effect for x32 or 32-bit tasks: their signal handlers were already called with SS == __USER_DS. (On Xen, there's a slight hole: if a task sets SS to a writable kernel data segment, then we will fail to identify it as invalid and we'll still kill the task. If anyone cares, this could be fixed with a new paravirt hook.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stas Sergeev <stsp@list.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/163c6e1eacde41388f3ff4d2fe6769be651d7b6e.1455664054.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-17 08:32:11 +01:00
Andy Lutomirski	e54fdcca70	x86/signal/64: Add a comment about sigcontext->fs and gs These fields have a strange history. This tries to document it. This borrows from `9a036b93a3` ("x86/signal/64: Remove 'fs' and 'gs' from sigcontext"), which was reverted by `ed596cde94` ("Revert x86 sigcontext cleanups"). Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stas Sergeev <stsp@list.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/baa78f3c84106fa5acbc319377b1850602f5deec.1455664054.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-17 08:32:11 +01:00
Len Brown	670e27d809	x86 msr-index: Simplify syntax for HWP fields syntax only, no functional change Signed-off-by: Len Brown <len.brown@intel.com>	2016-02-17 01:42:26 -05:00
Jake Oshins	92016ba5c1	PCI: Add fwnode_handle to x86 pci_sysdata Add an fwnode_handle to the x86 struct pci_sysdata, which will be used to locate an IRQ domain associated with a root PCI bus. [bhelgaas: changelog] Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2016-02-16 16:48:11 -06:00
Andrey Smetanin	18f098618a	drivers/hv: Move VMBus hypercall codes into Hyper-V UAPI header VMBus hypercall codes inside Hyper-V UAPI header will be used by QEMU to implement VMBus host devices support. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org [Do not rename the constant at the same time as moving it, as that would cause semantic conflicts with the Hyper-V tree. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 18:48:40 +01:00
Andrey Smetanin	8ed6d76781	kvm/x86: Rename Hyper-V long spin wait hypercall Rename HV_X64_HV_NOTIFY_LONG_SPIN_WAIT by HVCALL_NOTIFY_LONG_SPIN_WAIT, so the name is more consistent with the other hypercalls. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org [Change name, Andrey used HV_X64_HCALL_NOTIFY_LONG_SPIN_WAIT. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 18:48:38 +01:00
Dave Hansen	c8df400984	x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures The protection keys register (PKRU) is saved and restored using xsave. Define the data structure that we will use to access it inside the xsave buffer. Note that we also have to widen the printk of the xsave feature masks since this is feature 0x200 and we only did two characters before. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210204.56DF8F7B@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-16 10:11:14 +01:00
Dave Hansen	f28b49d2bc	x86/cpu, x86/mm/pkeys: Define new CR4 bit There is a new bit in CR4 for enabling protection keys. We will actually enable it later in the series. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210202.3CFC3DB2@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-16 10:11:14 +01:00
Dave Hansen	dfb4a70f20	x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions There are two CPUID bits for protection keys. One is for whether the CPU contains the feature, and the other will appear set once the OS enables protection keys. Specifically: Bit 04: OSPKE. If 1, OS has set CR4.PKE to enable Protection keys (and the RDPKRU/WRPKRU instructions) This is because userspace can not see CR4 contents, but it can see CPUID contents. X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation: CPUID.(EAX=07H,ECX=0H):ECX.PKU [bit 3] X86_FEATURE_OSPKE is "OSPKU": CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4] These are the first CPU features which need to look at the ECX word in CPUID leaf 0x7, so this patch also includes fetching that word in to the cpuinfo->x86_capability[] array. Add it to the disabled-features mask when its config option is off. Even though we are not using it here, we also extend the REQUIRED_MASK_BIT_SET() macro to keep it mirroring the DISABLED_MASK_BIT_SET() version. This means that in almost all code, you should use: cpu_has(c, X86_FEATURE_PKU) and not the CONFIG option. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210201.7714C250@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-16 10:11:13 +01:00
Dave Hansen	1f96b1efba	x86/fpu: Add placeholder for 'Processor Trace' XSAVE state There is an XSAVE state component for Intel Processor Trace (PT). But, we do not currently use it. We add a placeholder in the code for it so it is not a mystery and also so we do not need an explicit enum initialization for Protection Keys in a moment. Why don't we use it? We might end up using this at _some_ point in the future. But, this is a "system" state which requires using the currently unsupported XSAVES feature. Unlike all the other XSAVE states, PT state is also not directly tied to a thread. You might context-switch between threads, but not want to change any of the PT state. Or, you might switch between threads, and do want to change PT state, all depending on what is being traced. We currently just manually set some MSRs to do this PT context switching, and it is unclear whether replacing our direct MSR use with XSAVE will be a net win or loss, both in code complexity and performance. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: fenghua.yu@intel.com Cc: linux-mm@kvack.org Cc: yu-cheng.yu@intel.com Link: http://lkml.kernel.org/r/20160212210158.5E4BCAE2@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-16 10:11:13 +01:00
Ingo Molnar	1fe3f29e4a	Merge branches 'x86/fpu', 'x86/mm' and 'x86/asm' into x86/pkeys Provide a stable basis for the pkeys patches, which touches various x86 details. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-16 09:37:37 +01:00
Borislav Petkov	f2cc8e0791	x86/cpufeature: Speed up cpu_feature_enabled() When GCC cannot do constant folding for this macro, it falls back to cpu_has(). But static_cpu_has() is optimal and it works at all times now. So use it and speedup the fallback case. Before we had this: mov 0x99d674(%rip),%rdx # ffffffff81b0d9f4 <boot_cpu_data+0x34> shr $0x2e,%rdx and $0x1,%edx jne ffffffff811704e9 <do_munmap+0x3f9> After alternatives patching, it turns into: jmp 0xffffffff81170390 nopl (%rax) ... callq ffffffff81056e00 <mpx_notify_unmap> ffffffff81170390: mov 0x170(%r12),%rdi Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1455578358-28347-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-16 08:45:15 +01:00
Bjorn Helgaas	ed07247dbf	gpio: Remove unused asm/gpio.h files asm/gpio.h is included only by linux/gpio.h, and then only when the arch selects ARCH_HAVE_CUSTOM_GPIO_H. Only the following arches select it: arm avr32 blackfin m68k (COLDFIRE only) sh unicore32. Remove the unused asm/gpio.h files for the arches that do not select ARCH_HAVE_CUSTOM_GPIO_H. This is a follow-on to `7563bbf89d` ("gpiolib/arches: Centralise bolierplate asm/gpio.h"). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>	2016-02-16 00:20:04 +01:00
Konrad Rzeszutek Wilk	2cfec6a2f9	xen/pcifront: Report the errors better. The messages should be different depending on the type of error. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2016-02-15 14:21:18 +00:00
Borislav Petkov	e2c7698cd6	x86/mm: Fix INVPCID asm constraint So we want to specify the dependency on both @pcid and @addr so that the compiler doesn't reorder accesses to them before the TLB flush. But for that to work, we need to express this properly in the inline asm and deref the whole desc array, not the pointer to it. See clwb() for an example. This fixes the build error on 32-bit: arch/x86/include/asm/tlbflush.h: In function ‘__invpcid’: arch/x86/include/asm/tlbflush.h:26:18: error: memory input 0 is not directly addressable which gcc4.7 caught but 5.x didn't. Which is strange. :-\ Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Michael Matz <matz@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-14 11:19:06 +01:00
Andy Lutomirski	4ecd16ec70	x86/fpu: Fix math emulation in eager fpu mode Systems without an FPU are generally old and therefore use lazy FPU switching. Unsurprisingly, math emulation in eager FPU mode is a bit buggy. Fix it. There were two bugs involving kernel code trying to use the FPU registers in eager mode even if they didn't exist and one BUG_ON() that was incorrect. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/b4b8d112436bd6fab866e1b4011131507e8d7fbe.1453675014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-09 15:42:55 +01:00
Andy Lutomirski	ce1143aa60	x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache() DMI cacheability is very confused on x86. dmi_early_remap() uses early_ioremap(), which uses FIXMAP_PAGE_IO, which is __PAGE_KERNEL_IO, which is __PAGE_KERNEL, which is cached. Don't ask me why this makes any sense. dmi_remap() uses ioremap(), which requests an uncached mapping. However, on non-EFI systems, the DMI data generally lives between 0xf0000 and 0x100000, which is in the legacy ISA range, which triggers a special case in the PAT code that overrides the cache mode requested by ioremap() and forces a WB mapping. On a UEFI boot, however, the DMI table can live at any physical address. On my laptop, it's around 0x77dd0000. That's nowhere near the legacy ISA range, so the ioremap() implicit uncached type is honored and we end up with a UC- mapping. UC- is a very, very slow way to read from main memory, so dmi_walk() is likely to take much longer than necessary. Given that, even on UEFI, we do early cached DMI reads, it seems safe to just ask for cached access. Switch to ioremap_cache(). I haven't tried to benchmark this, but I'd guess it saves several milliseconds of boot time. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean Delvare <jdelvare@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/3147c38e51f439f3c8911db34c7d4ab22d854915.1453791969.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-09 14:36:43 +01:00
Andy Lutomirski	d8bced79af	x86/mm: If INVPCID is available, use it to flush global mappings On my Skylake laptop, INVPCID function 2 (flush absolutely everything) takes about 376ns, whereas saving flags, twiddling CR4.PGE to flush global mappings, and restoring flags takes about 539ns. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/ed0ef62581c0ea9c99b9bf6df726015e96d44743.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-09 13:36:11 +01:00
Andy Lutomirski	060a402a1d	x86/mm: Add INVPCID helpers This adds helpers for each of the four currently-specified INVPCID modes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-09 13:36:10 +01:00
Feng Wu	520040146a	KVM: x86: Use vector-hashing to deliver lowest-priority interrupts Use vector-hashing to deliver lowest-priority interrupts, As an example, modern Intel CPUs in server platform use this method to handle lowest-priority interrupts. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-09 13:24:40 +01:00
Borislav Petkov	5f9c01aa7c	x86/microcode: Untangle from BLK_DEV_INITRD Thomas Voegtle reported that doing oldconfig with a .config which has CONFIG_MICROCODE enabled but BLK_DEV_INITRD disabled prevents the microcode loading mechanism from being built. So untangle it from the BLK_DEV_INITRD dependency so that oldconfig doesn't turn it off and add an explanatory text to its Kconfig help what the supported methods for supplying microcode are. Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.4 Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-09 11:41:15 +01:00
Denys Vlasenko	8dd5032d9c	x86/asm/bitops: Force inlining of test_and_set_bit and friends Sometimes GCC mysteriously doesn't inline very small functions we expect to be inlined, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 Arguably, GCC should do better, but GCC people aren't willing to invest time into it and are asking to use __always_inline instead. With this .config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os here's an example of functions getting deinlined many times: test_and_set_bit (166 copies, ~1260 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f ab 3e lock bts %rdi,(%rsi) 72 04 jb <test_and_set_bit+0xf> 31 c0 xor %eax,%eax eb 05 jmp <test_and_set_bit+0x14> b8 01 00 00 00 mov $0x1,%eax 5d pop %rbp c3 retq test_and_clear_bit (124 copies, ~1000 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f b3 3e lock btr %rdi,(%rsi) 72 04 jb <test_and_clear_bit+0xf> 31 c0 xor %eax,%eax eb 05 jmp <test_and_clear_bit+0x14> b8 01 00 00 00 mov $0x1,%eax 5d pop %rbp c3 retq change_bit (3 copies, 8 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f bb 3e lock btc %rdi,(%rsi) 5d pop %rbp c3 retq clear_bit_unlock (2 copies, 11 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f b3 3e lock btr %rdi,(%rsi) 5d pop %rbp c3 retq This patch works it around via s/inline/__always_inline/. Code size decrease by ~13.5k after the patch: text data bss dec filename 92110727 20826144 36417536 149354407 vmlinux.before 92097234 20826176 36417536 149340946 vmlinux.after Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Graf <tgraf@suug.ch> Link: http://lkml.kernel.org/r/1454881887-1367-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-09 10:31:54 +01:00
Ingo Molnar	2a96fd7417	Linux 4.5-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWt9V+AAoJEHm+PkMAQRiGvqUH/iKVFxRB+MpfwRteRv22gfxJ 8Y5jDuCt2qkePHieN5JoY4FqWIA+hR+UnctHBvOCLC7DNbY79yaYuWeuShfnRVw2 91Kspk2tgnWkjlZDZ273VNKAEveFYUdhHvHFUlItWubVIlzrKRGKdAcUpecQ33YD IERr4VFxdQDgksGPzH2FLJwcAKilWWygQ9ATTsY0xJepTC6PwxAFxrNx5qPiqOoI aHJBzczk50y9hHEPkYeeWloyzIZUmf81BCi/VwOsbB30q4ENwkS2MMnebhOmBjLp g9Bml8m/Fk4N+h2vbcLj6Bu+rE+bYDZDZ2Fmbupy4NbEkng3x3iaTPkG+R2hErQ= =IdcS -----END PGP SIGNATURE----- Merge tag 'v4.5-rc3' into locking/core, to refresh the tree Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-09 10:26:02 +01:00
Ingo Molnar	b349e9a916	Merge branch 'x86/urgent' into x86/mm, to pick up dependent fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-08 12:13:22 +01:00
Dmitry Vyukov	75edb54a1d	x86: Fix KASAN false positives in thread_saved_pc() thread_saved_pc() reads stack of a potentially running task. This can cause false KASAN stack-out-of-bounds reports, because the running task concurrently poisons and unpoisons own stack. The same happens in get_wchan(), and get get_wchan() was fixed by using READ_ONCE_NOCHECK(). Do the same here. Example KASAN report triggered by sysrq-t: BUG: KASAN: out-of-bounds in sched_show_task+0x306/0x3b0 at addr ffff880043c97c18 Read of size 8 by task syz-executor/23839 [...] page dumped because: kasan: bad access detected [...] Call Trace: [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 [<ffffffff813e7a26>] sched_show_task+0x306/0x3b0 [<ffffffff813e7bf4>] show_state_filter+0x124/0x1a0 [<ffffffff82d2ca00>] fn_show_state+0x10/0x20 [<ffffffff82d2cf98>] k_spec+0xa8/0xe0 [<ffffffff82d3354f>] kbd_event+0xb9f/0x4000 [<ffffffff843ca8a7>] input_to_handler+0x3a7/0x4b0 [<ffffffff843d1954>] input_pass_values.part.5+0x554/0x6b0 [<ffffffff843d29bc>] input_handle_event+0x2ac/0x1070 [<ffffffff843d3a47>] input_inject_event+0x237/0x280 [<ffffffff843e8c28>] evdev_write+0x478/0x680 [<ffffffff817ac653>] __vfs_write+0x113/0x480 [<ffffffff817ae0e7>] vfs_write+0x167/0x4a0 [<ffffffff817b13d1>] SyS_write+0x111/0x220 Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: glider@google.com Cc: kasan-dev@googlegroups.com Cc: kcc@google.com Cc: linux-kernel@vger.kernel.org Cc: ryabinin.a.a@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-05 08:41:52 +01:00
Ingo Molnar	03e075b38e	Merge branch 'linus' into efi/core, to refresh the branch and to pick up recent fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-03 11:30:36 +01:00
Aravind Gopalakrishnan	e6c8f1873b	x86/mce/AMD: Set MCAX Enable bit It is required for the OS to acknowledge that it is using the MCAX register set and its associated fields by setting the 'McaXEnable' bit in each bank's MCi_CONFIG register. If it is not set, then all UC errors will cause a system panic. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1453750913-4781-9-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-01 10:53:59 +01:00
Huaitong Han	16aaa53756	x86/cpufeature: Use enum cpuid_leafs instead of magic numbers Most of the magic numbers in x86_capability[] have been converted to 'enum cpuid_leafs', and this patch updates the remaining part. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1453750913-4781-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-01 10:46:48 +01:00
Linus Torvalds	d517be5fcf	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A bit on the largish side due to a series of fixes for a regression in the x86 vector management which was introduced in 4.3. This work was started in December already, but it took some time to fix all corner cases and a couple of older bugs in that area which were detected while at it Aside of that a few platform updates for intel-mid, quark and UV and two fixes for in the mm code: - Use proper types for pgprot values to avoid truncation - Prevent a size truncation in the pageattr code when setting page attributes for large mappings" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/mm/pat: Avoid truncation when converting cpa->numpages to address x86/mm: Fix types used in pgprot cacheability flags translations x86/platform/quark: Print boundaries correctly x86/platform/UV: Remove EFI memmap quirk for UV2+ x86/platform/intel-mid: Join string and fix SoC name x86/platform/intel-mid: Enable 64-bit build x86/irq: Plug vector cleanup race x86/irq: Call irq_force_move_complete with irq descriptor x86/irq: Remove outgoing CPU from vector cleanup mask x86/irq: Remove the cpumask allocation from send_cleanup_vector() x86/irq: Clear move_in_progress before sending cleanup IPI x86/irq: Remove offline cpus from vector cleanup x86/irq: Get rid of code duplication x86/irq: Copy vectormask instead of an AND operation x86/irq: Check vector allocation early x86/irq: Reorganize the search in assign_irq_vector x86/irq: Reorganize the return path in assign_irq_vector x86/irq: Do not use apic_chip_data.old_domain as temporary buffer x86/irq: Validate that irq descriptor is still active x86/irq: Fix a race in x86_vector_free_irqs() ...	2016-01-31 16:17:19 -08:00
Brian Gerst	2476f2fa20	x86/alternatives: Discard dynamic check after init Move the code to do the dynamic check to the altinstr_aux section so that it is discarded after alternatives have run and a static branch has been chosen. This way we're changing the dynamic branch from C code to assembly, which makes it substantially smaller while avoiding a completely unnecessary call to an out of line function. Signed-off-by: Brian Gerst <brgerst@gmail.com> [ Changed it to do TESTB, as hpa suggested. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452972124-7380-1-git-send-email-brgerst@gmail.com Link: http://lkml.kernel.org/r/20160127084525.GC30712@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-30 11:22:22 +01:00
Borislav Petkov	a362bf9f5e	x86/cpufeature: Get rid of the non-asm goto variant I can simply quote hpa from the mail: "Get rid of the non-asm goto variant and just fall back to dynamic if asm goto is unavailable. It doesn't make any sense, really, if it is supposed to be safe, and by now the asm goto-capable gcc is in more wide use. (Originally the gcc 3.x fallback to pure dynamic didn't exist, either.)" Booy, am I lazy. Cleanup the whole CC_HAVE_ASM_GOTO ifdeffery too, while at it. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160127084325.GB30712@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-30 11:22:19 +01:00
Borislav Petkov	bc696ca05f	x86/cpufeature: Replace the old static_cpu_has() with safe variant So the old one didn't work properly before alternatives had run. And it was supposed to provide an optimized JMP because the assumption was that the offset it is jumping to is within a signed byte and thus a two-byte JMP. So I did an x86_64 allyesconfig build and dumped all possible sites where static_cpu_has() was used. The optimization amounted to all in all 12(!) places where static_cpu_has() had generated a 2-byte JMP. Which has saved us a whopping 36 bytes! This clearly is not worth the trouble so we can remove it. The only place where the optimization might count - in __switch_to() - we will handle differently. But that's not subject of this patch. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-6-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-30 11:22:18 +01:00
Borislav Petkov	cd4d09ec6f	x86/cpufeature: Carve out X86_FEATURE_* Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-30 11:22:17 +01:00
Ingo Molnar	78726ee5ff	Merge branch 'x86/cpu' into x86/asm, to avoid conflict Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-30 11:21:40 +01:00
Ingo Molnar	76b36fa896	Linux 4.5-rc1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWpTzxAAoJEHm+PkMAQRiGKJEH/0vq8pgt1F4UYSMZLZ0bot5B iGNq/hPW91xcCVYXf5xfc6LzePd9L1rnKpP0ml+qmTInYw8YaCI/hCY6w32QfhP9 3V3q1052T2eZJALqQQd0UH+F/ylTB8dHAPB+n8PBRxPEqpHb/ox+Ry70xbZefvaQ eOKSNBkZEIOFjURZZfeU0NrIzf8nKti8Dw84utGU2N+OICKGXzUmPLoObR0BiMHn 2Xu54S4OPFKB49yfnW55PGiI+dawbVD+iSNEJtK4vMk5Ue7lxHXZ1njVeOdXd2Ls ggy3PPRt0LhDYLHQvr8Ir9uySLw7vUI6bhpvFm/freN4rxGvgxOZbhoQgtzqG/k= =1oU3 -----END PGP SIGNATURE----- Merge tag 'v4.5-rc1' into x86/asm, to refresh the branch before merging new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-29 09:41:18 +01:00
Michael S. Tsirkin	57d9b1b434	locking/x86: Tweak the comment about use of wmb() for IO On x86, we do still use the non-NOP rmb()/wmb() for IO barriers, but even that is generally questionable. Leave them around as historial unless somebody can point to a case where they care about the performance, but tweak the comment so people don't think they are strictly required in all cases. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization <virtualization@lists.linux-foundation.org> Link: http://lkml.kernel.org/r/1453921746-16178-4-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-29 09:40:10 +01:00
Michael S. Tsirkin	e37cee133c	locking/x86: Drop a comment left over from X86_OOSTORE The comment about wmb being non-NOP to deal with non-Intel CPUs is a left over from before the following commit: `09df7c4c80` ("x86: Remove CONFIG_X86_OOSTORE") It makes no sense now: in particular, wmb() is not a NOP even for regular Intel CPUs because of weird use-cases e.g. dealing with WC memory. Drop this comment. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization <virtualization@lists.linux-foundation.org> Link: http://lkml.kernel.org/r/1453921746-16178-3-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-29 09:40:10 +01:00
Michael S. Tsirkin	bd922477d9	locking/x86: Add cc clobber for ADDL ADDL clobbers flags (such as CF) but barrier.h didn't tell this to GCC. Historically, GCC doesn't need one on x86, and always considers flags clobbered. We are probably missing the cc clobber in a lot of places for this reason. But even if not necessary, it's probably a good thing to add for documentation, and in case GCC semantcs ever change. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization <virtualization@lists.linux-foundation.org> Link: http://lkml.kernel.org/r/1453921746-16178-2-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-29 09:40:10 +01:00
Jan Beulich	3625c2c234	x86/mm: Fix types used in pgprot cacheability flags translations For PAE kernels "unsigned long" is not suitable to hold page protection flags, since _PAGE_NX doesn't fit there. This is the reason for quite a few W+X pages getting reported as insecure during boot (observed namely for the entire initrd range). Fixes: `281d4078be` ("x86: Make page cache mode a real type") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <JGross@suse.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/56A7635602000078000CAFF1@prv-mh.provo.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-01-26 21:05:36 +01:00
Ross Zwisler	3f4a2670de	pmem: add wb_cache_pmem() to the PMEM API __arch_wb_cache_pmem() was already an internal implementation detail of the x86 PMEM API, but this functionality needs to be exported as part of the general PMEM API to handle the fsync/msync case for DAX mmaps. One thing worth noting is that we really do want this to be part of the PMEM API as opposed to a stand-alone function like clflush_cache_range() because of ordering restrictions. By having wb_cache_pmem() as part of the PMEM API we can leave it unordered, call it multiple times to write back large amounts of memory, and then order the multiple calls with a single wmb_pmem(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-22 17:02:18 -08:00
Linus Torvalds	404a47410c	Merge branch 'uaccess' (batched user access infrastructure) Expose an interface to allow users to mark several accesses together as being user space accesses, allowing batching of the surrounding user space access markers (SMAP on x86, PAN on arm64, domain register switching on arm). This is currently only used for the user string lenth and copying functions, where the SMAP overhead on x86 drowned the actual user accesses (only noticeable on newer microarchitectures that support SMAP in the first place, of course). * user access batching branch: Use the new batched user accesses in generic user string handling Add 'unsafe' user access functions for batched accesses x86: reorganize SMAP handling in user space accesses	2016-01-21 13:02:41 -08:00
Linus Torvalds	eae21770b4	Merge branch 'akpm' (patches from Andrew) Merge third patch-bomb from Andrew Morton: "I'm pretty much done for -rc1 now: - the rest of MM, basically - lib/ updates - checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit - cpu_mask simplifications - kexec, rapidio, MAINTAINERS etc, etc. - more dma-mapping cleanups/simplifications from hch" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits) MAINTAINERS: add/fix git URLs for various subsystems mm: memcontrol: add "sock" to cgroup2 memory.stat mm: memcontrol: basic memory statistics in cgroup2 memory controller mm: memcontrol: do not uncharge old page in page cache replacement Documentation: cgroup: add memory.swap.{current,max} description mm: free swap cache aggressively if memcg swap is full mm: vmscan: do not scan anon pages if memcg swap limit is hit swap.h: move memcg related stuff to the end of the file mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online mm: vmscan: pass memcg to get_scan_count() mm: memcontrol: charge swap to cgroup2 mm: memcontrol: clean up alloc, online, offline, free functions mm: memcontrol: flatten struct cg_proto mm: memcontrol: rein in the CONFIG space madness net: drop tcp_memcontrol.c mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM mm: memcontrol: allow to disable kmem accounting for cgroup2 mm: memcontrol: account "kmem" consumers in cgroup2 memory controller mm: memcontrol: move kmem accounting code to CONFIG_MEMCG mm: memcontrol: separate kmem code from legacy tcp accounting code ...	2016-01-21 12:32:08 -08:00
Linus Torvalds	d43421565b	PCI changes for the v4.5 merge window: Enumeration Simplify config space size computation (Bjorn Helgaas) Avoid iterating through ROM outside the resource window (Edward O'Callaghan) Support PCIe devices with short cfg_size (Jason S. McMullan) Add Netronome vendor and device IDs (Jason S. McMullan) Limit config space size for Netronome NFP6000 family (Jason S. McMullan) Add Netronome NFP4000 PF device ID (Simon Horman) Limit config space size for Netronome NFP4000 (Simon Horman) Print warnings for all invalid expansion ROM headers (Vladis Dronov) Resource management Fix minimum allocation address overwrite (Christoph Biedl) PCI device hotplug acpiphp_ibm: Fix null dereferences on null ibm_slot (Colin Ian King) pciehp: Always protect pciehp_disable_slot() with hotplug mutex (Guenter Roeck) shpchp: Constify hpc_ops structure (Julia Lawall) ibmphp: Remove unneeded NULL test (Julia Lawall) Power management Make ASPM sysfs link_state_store() consistent with link_state_show() (Andy Lutomirski) Virtualization Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183 (Tim Sander) MSI Remove empty pci_msi_init_pci_dev() (Bjorn Helgaas) Mark PCIe/PCI (MSI) IRQ cascade handlers as IRQF_NO_THREAD (Grygorii Strashko) Initialize MSI capability for all architectures (Guilherme G. Piccoli) Relax msi_domain_alloc() to support parentless MSI irqdomains (Liu Jiang) ARM Versatile host bridge driver Remove unused pci_sys_data structures (Lorenzo Pieralisi) Broadcom iProc host bridge driver Hide CONFIG_PCIE_IPROC (Arnd Bergmann) Do not use 0x in front of %pap (Dmitry V. Krivenok) Update iProc PCIe device tree binding (Ray Jui) Add PAXC interface support (Ray Jui) Add iProc PCIe MSI device tree binding (Ray Jui) Add iProc PCIe MSI support (Ray Jui) Freescale i.MX6 host bridge driver Use gpio_set_value_cansleep() (Fabio Estevam) Add support for active-low reset GPIO (Petr Štetiar) HiSilicon host bridge driver Add support for HiSilicon Hip06 PCIe host controllers (Gabriele Paoloni) Intel VMD host bridge driver Export irq_domain_set_info() for module use (Keith Busch) x86/PCI: Allow DMA ops specific to a PCI domain (Keith Busch) Use 32 bit PCI domain numbers (Keith Busch) Add driver for Intel Volume Management Device (VMD) (Keith Busch) Qualcomm host bridge driver Document PCIe devicetree bindings (Stanimir Varbanov) Add Qualcomm PCIe controller driver (Stanimir Varbanov) dts: apq8064: add PCIe devicetree node (Stanimir Varbanov) dts: ifc6410: enable PCIe DT node for this board (Stanimir Varbanov) Renesas R-Car host bridge driver Add support for R-Car H3 to pcie-rcar (Harunobu Kurokawa) Allow DT to override default window settings (Phil Edworthy) Convert to DT resource parsing API (Phil Edworthy) Revert "PCI: rcar: Build pcie-rcar.c only on ARM" (Phil Edworthy) Remove unused pci_sys_data struct from pcie-rcar (Phil Edworthy) Add runtime PM support to pcie-rcar (Phil Edworthy) Add Gen2 PHY setup to pcie-rcar (Phil Edworthy) Add gen2 fallback compatibility string for pci-rcar-gen2 (Simon Horman) Add gen2 fallback compatibility string for pcie-rcar (Simon Horman) Synopsys DesignWare host bridge driver Simplify control flow (Bjorn Helgaas) Make config accessor override checking symmetric (Bjorn Helgaas) Ensure ATU is enabled before IO/conf space accesses (Stanimir Varbanov) Miscellaneous Add of_pci_get_host_bridge_resources() stub (Arnd Bergmann) Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask (Bjorn Helgaas) Fix all whitespace issues (Bogicevic Sasa) x86/PCI: Simplify pci_bios_{read,write} (Geliang Tang) Use to_pci_dev() instead of open-coding it (Geliang Tang) Use kobj_to_dev() instead of open-coding it (Geliang Tang) Use list_for_each_entry() to simplify code (Geliang Tang) Fix typos in <linux/msi.h> (Thomas Petazzoni) x86/PCI: Clarify AMD Fam10h config access restrictions comment (Tomasz Nowicki) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWoQIuAAoJEFmIoMA60/r8ckYP/0ZrkANeN1SB5cQVi2k7aceq kQb1Hk6ifxohJvgpJ/iwmVCHoApyeBfUBfrC+fUpIC2f7ncPsE5HNyjqpAWzFzj2 sYWwY029yjBQ9g4mPhkvjBXfha+lNtLthWc+Xxcat5pdcyG63Dg4SfJKWm2ZYnbN 0GJzyRZXIwAMnNf0KIr61Aqru0nXeHvi5wblyJ08UZ7AcNzCtB0wKLmE3S6SeZVF f2fry35zcGu+TFvQ1hAYemfl3XyDBJ87nPiKzJAwYSaKcWPFWt+72PBDPO6X9squ 6prm4nmAgeG2Oo4Zu0fbkDlB2bEsWUc14/xT0i5Wfs35vcwzF+S1zirJAtVqoNir NgC7fSbEHbsS7FZOz0rBOBIvIkbb6NdfLFuZqUFv0X1M5bRFywjo8lZRfAYoGJzK Mmus0uKbklx5m6RT5adf9+Plev1YJT6XZW9XrDpGnxrwRyPjHmyvuTWsYkumxY7Q CE5Wr3p7q2I2+MtrQVv2D9Nzsb+4zQ6BgHrd2vwR/IxTsfdXLU7+B691wkUDX8No UKFxBd0FiVCn+srG96u7lWQvdoUqoNCogTZSVzGR5gFBv3zAN9gi8HS7NbV558Mg Io3Xw+6dcbG33uvWdU6jHEDLMQsohZcp05Q5esCgRQNV4cGJbPxBDtOZEO/ezvW4 FAI7lfgYTFiQK3NzE3Ng =z9mQ -----END PGP SIGNATURE----- Merge tag 'pci-v4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v4.5 merge window: Enumeration: - Simplify config space size computation (Bjorn Helgaas) - Avoid iterating through ROM outside the resource window (Edward O'Callaghan) - Support PCIe devices with short cfg_size (Jason S. McMullan) - Add Netronome vendor and device IDs (Jason S. McMullan) - Limit config space size for Netronome NFP6000 family (Jason S. McMullan) - Add Netronome NFP4000 PF device ID (Simon Horman) - Limit config space size for Netronome NFP4000 (Simon Horman) - Print warnings for all invalid expansion ROM headers (Vladis Dronov) Resource management: - Fix minimum allocation address overwrite (Christoph Biedl) PCI device hotplug: - acpiphp_ibm: Fix null dereferences on null ibm_slot (Colin Ian King) - pciehp: Always protect pciehp_disable_slot() with hotplug mutex (Guenter Roeck) - shpchp: Constify hpc_ops structure (Julia Lawall) - ibmphp: Remove unneeded NULL test (Julia Lawall) Power management: - Make ASPM sysfs link_state_store() consistent with link_state_show() (Andy Lutomirski) Virtualization - Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183 (Tim Sander) MSI: - Remove empty pci_msi_init_pci_dev() (Bjorn Helgaas) - Mark PCIe/PCI (MSI) IRQ cascade handlers as IRQF_NO_THREAD (Grygorii Strashko) - Initialize MSI capability for all architectures (Guilherme G. Piccoli) - Relax msi_domain_alloc() to support parentless MSI irqdomains (Liu Jiang) ARM Versatile host bridge driver: - Remove unused pci_sys_data structures (Lorenzo Pieralisi) Broadcom iProc host bridge driver: - Hide CONFIG_PCIE_IPROC (Arnd Bergmann) - Do not use 0x in front of %pap (Dmitry V. Krivenok) - Update iProc PCIe device tree binding (Ray Jui) - Add PAXC interface support (Ray Jui) - Add iProc PCIe MSI device tree binding (Ray Jui) - Add iProc PCIe MSI support (Ray Jui) Freescale i.MX6 host bridge driver: - Use gpio_set_value_cansleep() (Fabio Estevam) - Add support for active-low reset GPIO (Petr Štetiar) HiSilicon host bridge driver: - Add support for HiSilicon Hip06 PCIe host controllers (Gabriele Paoloni) Intel VMD host bridge driver: - Export irq_domain_set_info() for module use (Keith Busch) - x86/PCI: Allow DMA ops specific to a PCI domain (Keith Busch) - Use 32 bit PCI domain numbers (Keith Busch) - Add driver for Intel Volume Management Device (VMD) (Keith Busch) Qualcomm host bridge driver: - Document PCIe devicetree bindings (Stanimir Varbanov) - Add Qualcomm PCIe controller driver (Stanimir Varbanov) - dts: apq8064: add PCIe devicetree node (Stanimir Varbanov) - dts: ifc6410: enable PCIe DT node for this board (Stanimir Varbanov) Renesas R-Car host bridge driver: - Add support for R-Car H3 to pcie-rcar (Harunobu Kurokawa) - Allow DT to override default window settings (Phil Edworthy) - Convert to DT resource parsing API (Phil Edworthy) - Revert "PCI: rcar: Build pcie-rcar.c only on ARM" (Phil Edworthy) - Remove unused pci_sys_data struct from pcie-rcar (Phil Edworthy) - Add runtime PM support to pcie-rcar (Phil Edworthy) - Add Gen2 PHY setup to pcie-rcar (Phil Edworthy) - Add gen2 fallback compatibility string for pci-rcar-gen2 (Simon Horman) - Add gen2 fallback compatibility string for pcie-rcar (Simon Horman) Synopsys DesignWare host bridge driver: - Simplify control flow (Bjorn Helgaas) - Make config accessor override checking symmetric (Bjorn Helgaas) - Ensure ATU is enabled before IO/conf space accesses (Stanimir Varbanov) Miscellaneous: - Add of_pci_get_host_bridge_resources() stub (Arnd Bergmann) - Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask (Bjorn Helgaas) - Fix all whitespace issues (Bogicevic Sasa) - x86/PCI: Simplify pci_bios_{read,write} (Geliang Tang) - Use to_pci_dev() instead of open-coding it (Geliang Tang) - Use kobj_to_dev() instead of open-coding it (Geliang Tang) - Use list_for_each_entry() to simplify code (Geliang Tang) - Fix typos in <linux/msi.h> (Thomas Petazzoni) - x86/PCI: Clarify AMD Fam10h config access restrictions comment (Tomasz Nowicki)" * tag 'pci-v4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits) PCI: Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183 PCI: Limit config space size for Netronome NFP4000 PCI: Add Netronome NFP4000 PF device ID x86/PCI: Add driver for Intel Volume Management Device (VMD) PCI/AER: Use 32 bit PCI domain numbers x86/PCI: Allow DMA ops specific to a PCI domain irqdomain: Export irq_domain_set_info() for module use PCI: host: Add of_pci_get_host_bridge_resources() stub genirq/MSI: Relax msi_domain_alloc() to support parentless MSI irqdomains PCI: rcar: Add Gen2 PHY setup to pcie-rcar PCI: rcar: Add runtime PM support to pcie-rcar PCI: designware: Make config accessor override checking symmetric PCI: ibmphp: Remove unneeded NULL test ARM: dts: ifc6410: enable PCIe DT node for this board ARM: dts: apq8064: add PCIe devicetree node PCI: hotplug: Use list_for_each_entry() to simplify code PCI: rcar: Remove unused pci_sys_data struct from pcie-rcar PCI: hisi: Add support for HiSilicon Hip06 PCIe host controllers PCI: Avoid iterating through memory outside the resource window PCI: acpiphp_ibm: Fix null dereferences on null ibm_slot ...	2016-01-21 11:52:16 -08:00
Christoph Hellwig	e1c7e32453	dma-mapping: always provide the dma_map_ops based implementation Move the generic implementation to <linux/dma-mapping.h> now that all architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now that everyone supports them. [valentinrothberg@gmail.com: remove leftovers in Kconfig] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Helge Deller <deller@gmx.de> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Steven Miao <realmz6@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-20 17:09:18 -08:00
Andy Lutomirski	7c360572b4	x86/mm: Make kmap_prot into a #define The value (once we initialize it) is a foregone conclusion. Make it a #define to save a tiny amount of text and data size and to make it more comprehensible. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0850eb0213de9da88544ff7fae72dc6d06d2b441.1453239349.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-20 11:39:14 +01:00
Linus Torvalds	2b4015e9fb	platform-drivers-x86 for 4.5-1 Add intel punit and telemetry driver for APL SoCs. Add intel-hid driver for various laptop hotkey support. Add asus-wireless radio control driver. Keyboard backlight support/improvements for ThinkPads, Vaio, and Toshiba. Several hotkey related fixes and improvements for dell and toshiba. Fix oops on dual GPU Macs in apple-gmux. A few new device IDs and quirks. Various minor config related build issues and cleanups. surface pro 4: - fix compare_const_fl.cocci warnings - Add support for Surface Pro 4 Buttons platform/x86: - Add Intel Telemetry Debugfs interfaces - Add Intel telemetry platform device - Add Intel telemetry platform driver - Add Intel Telemetry Core Driver - add NULL check for input parameters - add Intel P-Unit mailbox IPC driver - update acpi resource structure for Punit thinkpad_acpi: - Add support for keyboard backlight dell-wmi: - Process only one event on devices with interface version 0 - Check if Dell WMI descriptor structure is valid - Improve unknown hotkey handling - Use a C99-style array for bios_to_linux_keycode tc1100-wmi: - fix build warning when CONFIG_PM not enabled asus-wireless: - Add ACPI HID ATK4001 - Add Asus Wireless Radio Control driver asus-wmi: - drop to_platform_driver macro intel-hid: - new hid event driver for hotkeys sony-laptop: - Keyboard backlight control for some Vaio Fit models ideapad-laptop: - Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi list apple-gmux: - Assign apple_gmux_data before registering toshiba_acpi: - Add rfkill dependency to ACPI_TOSHIBA entry - Fix keyboard backlight sysfs entries not being updated - Add WWAN RFKill support - Add support for WWAN devices - Fix blank screen at boot if transflective backlight is supported - Propagate the hotkey value via genetlink toshiba_bluetooth: - Add missing newline in toshiba_bluetooth_present function -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWnuSYAAoJEKbMaAwKp364xckH/A/INtwr1LzmSw8fsEIe6IGQ MKEzCHZOUdSPcZSHcTW/fE/J8kt1j4YdR2kkxNbivtxST5O81E6KfAmo37TRrTNx bunVt7Swj7FcPsm6t0m3PQkbWkOy3UMUncRaZjeofzhRl2m+bHEycEIJsHs7OU+e 7UPjbWZjRz6/LahcryKDH/lRhweMEsgZSFzjNlby8Wzz4LRCu8HLwNGUQa0vQfNI 2UW+s7g2ZkFXx9fbmWu9twkyP7LwNoaZ3IDGsEffIYjmXeZKe+yTPy/TueYYzpon aclJSjhxLNnfWUaO0eUt/H4T5VyUag3v3mp6PVSwm2TA3Cpuh1SDYrJcurls6Mw= =b4Wa -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v4.5-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver updates from Darren Hart: "Add intel punit and telemetry driver for APL SoCs. Add intel-hid driver for various laptop hotkey support. Add asus-wireless radio control driver. Keyboard backlight support/improvements for ThinkPads, Vaio, and Toshiba. Several hotkey related fixes and improvements for dell and toshiba. Fix oops on dual GPU Macs in apple-gmux. A few new device IDs and quirks. Various minor config related build issues and cleanups. surface pro 4: - fix compare_const_fl.cocci warnings - Add support for Surface Pro 4 Buttons platform/x86: - Add Intel Telemetry Debugfs interfaces - Add Intel telemetry platform device - Add Intel telemetry platform driver - Add Intel Telemetry Core Driver - add NULL check for input parameters - add Intel P-Unit mailbox IPC driver - update acpi resource structure for Punit thinkpad_acpi: - Add support for keyboard backlight dell-wmi: - Process only one event on devices with interface version 0 - Check if Dell WMI descriptor structure is valid - Improve unknown hotkey handling - Use a C99-style array for bios_to_linux_keycode tc1100-wmi: - fix build warning when CONFIG_PM not enabled asus-wireless: - Add ACPI HID ATK4001 - Add Asus Wireless Radio Control driver asus-wmi: - drop to_platform_driver macro intel-hid: - new hid event driver for hotkeys sony-laptop: - Keyboard backlight control for some Vaio Fit models ideapad-laptop: - Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi list apple-gmux: - Assign apple_gmux_data before registering toshiba_acpi: - Add rfkill dependency to ACPI_TOSHIBA entry - Fix keyboard backlight sysfs entries not being updated - Add WWAN RFKill support - Add support for WWAN devices - Fix blank screen at boot if transflective backlight is supported - Propagate the hotkey value via genetlink toshiba_bluetooth: - Add missing newline in toshiba_bluetooth_present function" * tag 'platform-drivers-x86-v4.5-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (29 commits) surface pro 4: fix compare_const_fl.cocci warnings surface pro 4: Add support for Surface Pro 4 Buttons platform:x86: Add Intel Telemetry Debugfs interfaces platform:x86: Add Intel telemetry platform device platform:x86: Add Intel telemetry platform driver platform/x86: Add Intel Telemetry Core Driver intel_punit_ipc: add NULL check for input parameters thinkpad_acpi: Add support for keyboard backlight dell-wmi: Process only one event on devices with interface version 0 dell-wmi: Check if Dell WMI descriptor structure is valid tc1100-wmi: fix build warning when CONFIG_PM not enabled asus-wireless: Add ACPI HID ATK4001 platform/x86: Add Asus Wireless Radio Control driver asus-wmi: drop to_platform_driver macro intel-hid: new hid event driver for hotkeys Keyboard backlight control for some Vaio Fit models platform/x86: Add rfkill dependency to ACPI_TOSHIBA entry platform:x86: add Intel P-Unit mailbox IPC driver intel_pmc_ipc: update acpi resource structure for Punit ideapad-laptop: Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi list ...	2016-01-19 17:54:15 -08:00
Souvik Kumar Chakravarty	378f956e3f	platform/x86: Add Intel Telemetry Core Driver Intel PM Telemetry is a software mechanism via which various SoC PM and performance related parameters like PM counters, firmware trace verbosity, the status of different devices inside the SoC, etc. can be monitored and analyzed. The different samples that may be monitored can be configured at runtime via exported APIs. This patch adds the telemetry core driver that implements basic exported APIs. Signed-off-by: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>	2016-01-19 17:35:50 -08:00
Qipeng Zha	fdca4f16f5	platform:x86: add Intel P-Unit mailbox IPC driver This driver provides support for P-Unit mailbox IPC on Intel platforms. The heart of the P-Unit is the Foxton microcontroller and its firmware, which provide mailbox interface for power management usage. Signed-off-by: Qipeng Zha <qipeng.zha@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>	2016-01-19 15:49:36 -08:00
Josh Poimboeuf	ec5186557a	x86/asm: Add C versions of frame pointer macros Add C versions of the frame pointer macros which can be used to create a stack frame in inline assembly. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f6786a282bf232ede3e2866414eae3cf02c7d662.1450442274.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-19 12:59:07 +01:00
Josh Poimboeuf	997963edd9	x86/asm: Clean up frame pointer macros The asm macros for setting up and restoring the frame pointer aren't currently being used. However, they will be needed soon to help asm functions to comply with stacktool. Rename FRAME/ENDFRAME to FRAME_BEGIN/FRAME_END for more symmetry. Also make the code more readable and improve the comments. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3f488a8e3bfc8ac7d4d3d350953e664e7182b044.1450442274.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-19 12:59:07 +01:00
Borislav Petkov	a1ff572608	x86/cpufeature: Add AMD AVIC bit CPUID Fn8000_000A_EDX[13] denotes support for AMD's Virtual Interrupt controller, i.e., APIC virtualization. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Kaplan <david.kaplan@amd.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: http://lkml.kernel.org/r/1452938292-12327-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-19 08:29:28 +01:00
Linus Torvalds	a200dcb346	virtio: barrier rework+fixes This adds a new kind of barrier, and reworks virtio and xen to use it. Plus some fixes here and there. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWlU2kAAoJECgfDbjSjVRpZ6IH/Ra19ecG8sCQo9zskr4zo22Z DZXC3u0sJDBYjjBAiw3IY1FKh7wx2Fr1RhUOj1bteBgcFCMCV1zInP5ITiCyzd1H YYh1w9C2tZaj2T4t9L4hIrAdtIF8fGS+oI2IojXPjOuDLEt6pfFBEjHp/sfl3UJq ZmZvw4OXviSNej7jBw8Xni3Uv18yfmLGXvMdkvMSPC1/XL29voGDqTVwhqJwxLVz k/ZLcKFOzIs9N7Nja0Jl1EiZtC2Y9cpItqweicNAzszlpkSL44vQxmCSefB+WyQ4 gt0O3+AxYkLfrxzCBhUA4IpRex3/XPW1b+1e/V1XjfR2n/FlyLe+AIa8uPJElFc= =ukaV -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio barrier rework+fixes from Michael Tsirkin: "This adds a new kind of barrier, and reworks virtio and xen to use it. Plus some fixes here and there" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits) checkpatch: add virt barriers checkpatch: check for __smp outside barrier.h checkpatch.pl: add missing memory barriers virtio: make find_vqs() checkpatch.pl-friendly virtio_balloon: fix race between migration and ballooning virtio_balloon: fix race by fill and leak s390: more efficient smp barriers s390: use generic memory barriers xen/events: use virt_xxx barriers xen/io: use virt_xxx barriers xenbus: use virt_xxx barriers virtio_ring: use virt_store_mb sh: move xchg_cmpxchg to a header by itself sh: support 1 and 2 byte xchg virtio_ring: update weak barriers to use virt_xxx Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" asm-generic: implement virt_xxx memory barriers x86: define __smp_xxx xtensa: define __smp_xxx tile: define __smp_xxx ...	2016-01-18 16:44:24 -08:00
Miroslav Benes	383bf44d1a	livepatch: change the error message in asm/livepatch.h header files If anyone includes asm/livepatch.h when CONFIG_LIVEPATCH=n the build fails with the existing error message. Change it to something saner. [jkosina@suse.cz: fixed changelog typo spotted by Josh] Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-01-18 21:35:43 +01:00
Linus Torvalds	0cbeafb245	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - more MM stuff: - Kirill's page-flags rework - Kirill's now-allegedly-fixed THP rework - MADV_FREE implementation - DAX feature work (msync/fsync). This isn't quite complete but DAX is new and it's good enough and the guys have a handle on what needs to be done - I expect this to be wrapped in the next week or two. - some vsprintf maintenance work - various other misc bits * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (145 commits) printk: change recursion_bug type to bool lib/vsprintf: factor out %pN[F] handler as netdev_bits() lib/vsprintf: refactor duplicate code to special_hex_number() printk-formats.txt: remove unimplemented %pT printk: help pr_debug and pr_devel to optimize out arguments lib/test_printf.c: test dentry printing lib/test_printf.c: add test for large bitmaps lib/test_printf.c: account for kvasprintf tests lib/test_printf.c: add a few number() tests lib/test_printf.c: test precision quirks lib/test_printf.c: check for out-of-bound writes lib/test_printf.c: don't BUG lib/kasprintf.c: add sanity check to kvasprintf lib/vsprintf.c: warn about too large precisions and field widths lib/vsprintf.c: help gcc make number() smaller lib/vsprintf.c: expand field_width to 24 bits lib/vsprintf.c: eliminate potential race in string() lib/vsprintf.c: move string() below widen_string() lib/vsprintf.c: pull out padding code from dentry_name() printk: do cond_resched() between lines while outputting to consoles ...	2016-01-17 12:58:52 -08:00
Linus Torvalds	a016af2e70	sound updates for 4.5-rc1 We've had quite busy weeks in this cycle. Looking at ALSA core, the significant changes are a few fixes wrt timer and sequencer ioctls that have been revealed by fuzzer recently. Other than that, ASoC core got a few updates about DAI link handling, but these are rather straightforward refactoring. In drivers scene, ASoC received quite lots of new drivers in addition to bunch of updates for still ongoing Intel Skylake support and topology API. HD-audio gained a new HDMI/DP hotplug notification via component. FireWire got a pile of code refactoring/updates with SCS.1x driver integration. More highlights are shown below. [NOTE: this contains also many commits for DRM. This is due to the pull of drm stable branch into sound tree, as the base of i915 audio component work for HD-audio. The highlights below don't contain these DRM changes, as these are supposed to be pulled via drm tree in anyway sooner or later.] Core - Handful fixes to harden ALSA timer and sequencer ioctls against races reported by syzkaller fuzzer - Irq description string can be unique to each card; only for HD-audio for now ASoC - Conversion of the array of DAI links to a list for supporting dynamically adding and removing DAI links - Topology API enhancements to make everything more component based and being able to specify PCM links via topology - Some more fixes for the topology code, though it is still not final and ready for enabling in production; we really need to get to the point where that can be done - A pile of changes for Intel SkyLake drivers which hopefully deliver some useful initial functionality for systems with this chipset, though there is more work still to come - Lots of new features and cleanups for the Renesas drivers - ANC support for WM5110 - New drivers: Imagination Technologies IPs, Atmel class D speaker, Cirrus CS47L24 and WM1831, Dialog DA7128, Realtek RT5659 and RT56156, Rockchip RK3036, TI PC3168A, and AMD ACP - Rename PCM1792a driver to be generic pcm179x HD-Audio - Use audio component for i915 HDMI/DP hotplug handling - On-demand binding with i915 driver - bdl_pos_adj parameter adjustment for Baytrail controllers - Enable power_save_node for CX20722; this shouldn't lead to regression, hopefully - Kabylake HDMI/DP codec support - Quirks for Lenovo E50-80, Dell Latitude E-series, and other Dell machines - A few code refactoring FireWire - Lots of code cleanup and refactoring - Integrate the support of SCS.1x devices into snd-oxfw driver; snd-scs1x driver is obsoleted USB-audio - Fix possible NULL dereference at disconnection - A regression fix for Native Instruments devices Misc - A few code cleanups of fm801 driver -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJWmmhNAAoJEGwxgFQ9KSmk/wsP/3eO+giAT9VRPa6qxR6VdT6I dZwTxcp4ZzUrgLxk9k5VYjqey6QL+1xWfl3Abrd+NzXDj1wo4KsDh2XCKG1btO9K UpIZf76Nzt7o91pzHbsU6mrjDeoVNqloZoGbg1utAmmegaXH3owd18p/ZHfE3sz2 BbaHmYW/R8lnaBgBhzqJB97+zRaLJmMWpWHfpHaIPjdfw8/V4j76jtPnpmv2hDZl BHXVHcQXjVGunFRzxdzBLuTC+FmhzUeTAbbAdOT4fEoOCv5MtZqYppNxdhj+b9l5 mrsXe5FBTNmrt9Z5TtfCuzgJPkzoDperFb0aKd7wI1jVMtLzkNCMlanHr9U6B6fr jSrs6l25xrpF1BBfRMfHjNudA5vng/XC5dtW00JofXSrIxtwPNUoDDiqJgw7xVm5 aVWK7KkQIjRbHdCQaeTymv70oHHKei92hbCrXUobXZ7wLeJMXNVPT25ttChWrgAI 7cu5h+K5PjReI/sJFTMPL4aHZ+jAn9quQl7vK8EXiL9E6G8lLiuBiVW6hjGd9At+ Z6UyGV+nCM6O3qZcyParMuLkNtWx9uT7Pcn8oTZAdKPngNhsf8+yl9qmsFkNLDC4 LKPx0+rdCjtMKn2du3krsHhG3EN9pLDrE6g5U3d6Cz83e69Y7fCuSjl31SjD91H0 bZDcM/ejYSbid3yKN4TL =Gvgb -----END PGP SIGNATURE----- Merge tag 'sound-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "We've had quite busy weeks in this cycle. Looking at ALSA core, the significant changes are a few fixes wrt timer and sequencer ioctls that have been revealed by fuzzer recently. Other than that, ASoC core got a few updates about DAI link handling, but these are rather straightforward refactoring. In drivers scene, ASoC received quite lots of new drivers in addition to bunch of updates for still ongoing Intel Skylake support and topology API. HD-audio gained a new HDMI/DP hotplug notification via component. FireWire got a pile of code refactoring/updates with SCS.1x driver integration. More highlights are shown below. [ NOTE: this contains also many commits for DRM. This is due to the pull of drm stable branch into sound tree, as the base of i915 audio component work for HD-audio. The highlights below don't contain these DRM changes, as these are supposed to be pulled via drm tree in anyway sooner or later. ] Core: - Handful fixes to harden ALSA timer and sequencer ioctls against races reported by syzkaller fuzzer - Irq description string can be unique to each card; only for HD-audio for now ASoC: - Conversion of the array of DAI links to a list for supporting dynamically adding and removing DAI links - Topology API enhancements to make everything more component based and being able to specify PCM links via topology - Some more fixes for the topology code, though it is still not final and ready for enabling in production; we really need to get to the point where that can be done - A pile of changes for Intel SkyLake drivers which hopefully deliver some useful initial functionality for systems with this chipset, though there is more work still to come - Lots of new features and cleanups for the Renesas drivers - ANC support for WM5110 - New drivers: Imagination Technologies IPs, Atmel class D speaker, Cirrus CS47L24 and WM1831, Dialog DA7128, Realtek RT5659 and RT56156, Rockchip RK3036, TI PC3168A, and AMD ACP - Rename PCM1792a driver to be generic pcm179x HD-Audio: - Use audio component for i915 HDMI/DP hotplug handling - On-demand binding with i915 driver - bdl_pos_adj parameter adjustment for Baytrail controllers - Enable power_save_node for CX20722; this shouldn't lead to regression, hopefully - Kabylake HDMI/DP codec support - Quirks for Lenovo E50-80, Dell Latitude E-series, and other Dell machines - A few code refactoring FireWire: - Lots of code cleanup and refactoring - Integrate the support of SCS.1x devices into snd-oxfw driver; snd-scs1x driver is obsoleted USB-audio: - Fix possible NULL dereference at disconnection - A regression fix for Native Instruments devices Misc: - A few code cleanups of fm801 driver" * tag 'sound-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (722 commits) ALSA: timer: Code cleanup ALSA: timer: Harden slave timer list handling ALSA: hda - Add fixup for Dell Latitidue E6540 ALSA: timer: Fix race among timer ioctls ALSA: hda - add codec support for Kabylake display audio codec ALSA: timer: Fix double unlink of active_list ALSA: usb-audio: Fix mixer ctl regression of Native Instrument devices ALSA: hda - fix the headset mic detection problem for a Dell laptop ALSA: hda - Fix white noise on Dell Latitude E5550 ALSA: hda_intel: add card number to irq description ALSA: seq: Fix race at timer setup and close ALSA: seq: Fix missing NULL check at remove_events ioctl ALSA: usb-audio: Avoid calling usb_autopm_put_interface() at disconnect ASoC: hdac_hdmi: remove unused hdac_hdmi_query_pin_connlist ASoC: AMD: Add missing include file ALSA: hda - Fixup inverted internal mic for Lenovo E50-80 ALSA: usb: Add native DSD support for Oppo HA-1 ASoC: Make aux_dev more like a generic component ASoC: bcm2835: cleanup includes by ordering them alphabetically ASoC: AMD: Manage ACP 2.x SRAM banks power ...	2016-01-17 12:05:31 -08:00
Dan Williams	3565fce3a6	mm, x86: get_user_pages() for dax mappings A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver has established a devm_memremap_pages() mapping, i.e. when the pfn_t return from ->direct_access() has PFN_DEV and PFN_MAP set. Later, when encountering _PAGE_DEVMAP during a page table walk we lookup and pin a struct dev_pagemap instance to keep the result of pfn_to_page() valid until put_page(). Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Logan Gunthorpe <logang@deltatee.com> Cc: Dave Hansen <dave@sr71.net> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	5c7fb56e5e	mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd A dax-huge-page mapping while it uses some thp helpers is ultimately not a transparent huge page. The distinction is especially important in the get_user_pages() path. pmd_devmap() is used to distinguish dax-pmds from pmd_huge() and pmd_trans_huge() which have slightly different semantics. Explicitly mark the pmd_trans_huge() helpers that dax needs by adding pmd_devmap() checks. [kirill.shutemov@linux.intel.com: fix regression in handling mlocked pages in __split_huge_pmd()] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	f25748e3c3	mm, dax: convert vmf_insert_pfn_pmd() to pfn_t Similar to the conversion of vm_insert_mixed() use pfn_t in the vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVICE when the pfn is backed by a devm_memremap_pages() mapping. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	01c8f1c44b	mm, dax, gpu: convert vm_insert_mixed to pfn_t Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose of evaluating the PFN_MAP and PFN_DEV flags. When both are set it triggers _PAGE_DEVMAP to be set in the resulting pte. There are no functional changes to the gpu drivers as a result of this conversion. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: David Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	69660fd797	x86, mm: introduce _PAGE_DEVMAP _PAGE_DEVMAP is a hardware-unused pte bit that will later be used in the get_user_pages() path to identify pfns backed by the dynamic allocation established by devm_memremap_pages. Upon seeing that bit the gup path will lookup and pin the allocation while the pages are in use. Since the _PAGE_DEVMAP bit is > 32 it must be cast to u64 instead of a pteval_t to allow pmd_flags() usage in the realmode boot code to build. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	52db400fcd	pmem, dax: clean up clear_pmem() To date, we have implemented two I/O usage models for persistent memory, PMEM (a persistent "ram disk") and DAX (mmap persistent memory into userspace). This series adds a third, DAX-GUP, that allows DAX mappings to be the target of direct-i/o. It allows userspace to coordinate DMA/RDMA from/to persistent memory. The implementation leverages the ZONE_DEVICE mm-zone that went into 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned and dynamically mapped by a device driver. The pmem driver, after mapping a persistent memory range into the system memmap via devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus page-backed pmem-pfns via flags in the new pfn_t type. The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the resulting pte(s) inserted into the process page tables with a new _PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys off _PAGE_DEVMAP to pin the device hosting the page range active. Finally, get_page() and put_page() are modified to take references against the device driver established page mapping. Finally, this need for "struct page" for persistent memory requires memory capacity to store the memmap array. Given the memmap array for a large pool of persistent may exhaust available DRAM introduce a mechanism to allocate the memmap from persistent memory. The new "struct vmem_altmap *" parameter to devm_memremap_pages() enables arch_add_memory() to use reserved pmem capacity rather than the page allocator. This patch (of 25): Both __dax_pmd_fault, and clear_pmem() were taking special steps to clear memory a page at a time to take advantage of non-temporal clear_page() implementations. However, x86_64 does not use non-temporal instructions for clear_page(), and arch_clear_pmem() was always incurring the cost of __arch_wb_cache_pmem(). Clean up the assumption that doing clear_pmem() a page at a time is more performant. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: David Airlie <airlied@linux.ie> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Minchan Kim	590a471ce9	arch/x86/include/asm/pgtable.h: add pmd_[dirty\|mkclean] for THP MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE support. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Shaohua Li <shli@kernel.org> Cc: <yalin.wang2010@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jason Evans <je@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttil <mika.penttila@nextfour.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rik van Riel <riel@redhat.com> Cc: Roland Dreier <roland@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Shaohua Li <shli@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Kirill A. Shutemov	1f19617d77	x86, thp: remove infrastructure for handling splitting PMDs With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Bjorn Helgaas	3a6384ba10	Merge branch 'pci/host-vmd' into next * pci/host-vmd: x86/PCI: Add driver for Intel Volume Management Device (VMD) PCI/AER: Use 32 bit PCI domain numbers x86/PCI: Allow DMA ops specific to a PCI domain irqdomain: Export irq_domain_set_info() for module use genirq/MSI: Relax msi_domain_alloc() to support parentless MSI irqdomains	2016-01-15 16:14:39 -06:00
Keith Busch	185a383ada	x86/PCI: Add driver for Intel Volume Management Device (VMD) The Intel Volume Management Device (VMD) is a Root Complex Integrated Endpoint that acts as a host bridge to a secondary PCIe domain. BIOS can reassign one or more Root Ports to appear within a VMD domain instead of the primary domain. The immediate benefit is that additional PCIe domains allow more than 256 buses in a system by letting bus numbers be reused across different domains. VMD domains do not define ACPI _SEG, so to avoid domain clashing with host bridges defining this segment, VMD domains start at 0x10000, which is greater than the highest possible 16-bit ACPI defined _SEG. This driver enumerates and enables the domain using the root bus configuration interface provided by the PCI subsystem. The driver provides configuration space accessor functions (pci_ops), bus and memory resources, an MSI IRQ domain with irq_chip implementation, and DMA operations necessary to use devices through the VMD endpoint's interface. VMD routes I/O as follows: 1) Configuration Space: BAR 0 ("CFGBAR") of VMD provides the base address and size for configuration space register access to VMD-owned root ports. It works similarly to MMCONFIG for extended configuration space. Bus numbering is independent and does not conflict with the primary domain. 2) MMIO Space: BARs 2 and 4 ("MEMBAR1" and "MEMBAR2") of VMD provide the base address, size, and type for MMIO register access. These addresses are not translated by VMD hardware; they are simply reservations to be distributed to root ports' memory base/limit registers and subdivided among devices downstream. 3) DMA: To interact appropriately with an IOMMU, the source ID DMA read and write requests are translated to the bus-device-function of the VMD endpoint. Otherwise, DMA operates normally without VMD-specific address translation. 4) Interrupts: Part of VMD's BAR 4 is reserved for VMD's MSI-X Table and PBA. MSIs from VMD domain devices and ports are remapped to appear as if they were issued using one of VMD's MSI-X table entries. Each MSI and MSI-X address of VMD-owned devices and ports has a special format where the address refers to specific entries in the VMD's MSI-X table. As with DMA, the interrupt source ID is translated to VMD's bus-device-function. The driver provides its own MSI and MSI-X configuration functions specific to how MSI messages are used within the VMD domain, and provides an irq_chip for independent IRQ allocation to relay interrupts from VMD's interrupt handler to the appropriate device driver's handler. 5) Errors: PCIe error message are intercepted by the root ports normally (e.g., AER), except with VMD, system errors (i.e., firmware first) are disabled by default. AER and hotplug interrupts are translated in the same way as endpoint interrupts. 6) VMD does not support INTx interrupts or IO ports. Devices or drivers requiring these features should either not be placed below VMD-owned root ports, or VMD should be disabled by BIOS for such endpoints. [bhelgaas: add VMD BAR #defines, factor out vmd_cfg_addr(), rework VMD resource setup, whitespace, changelog] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> (IRQ-related parts)	2016-01-15 13:54:55 -06:00
Keith Busch	d9c3d6ff22	x86/PCI: Allow DMA ops specific to a PCI domain The Intel Volume Management Device (VMD) is a PCIe endpoint that acts as a host bridge to another PCI domain. When devices below the VMD perform DMA, the VMD replaces their DMA source IDs with its own source ID. Therefore, those devices require special DMA ops. Add interfaces to allow the VMD driver to set up dma_ops for the devices below it. [bhelgaas: remove "extern", add "static", changelog] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2016-01-15 13:54:55 -06:00
Thomas Gleixner	90a2282e23	x86/irq: Call irq_force_move_complete with irq descriptor First of all there is no point in looking up the irq descriptor again, but we also need the descriptor for the final cleanup race fix in the next patch. Make that change seperate. No functional difference. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160107.125211743@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-01-15 13:44:01 +01:00
Linus Torvalds	10a0c0f059	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc changes: - fix lguest bug - fix /proc/meminfo output on certain configs - fix pvclock bug - fix reboot on certain iMacs by adding new reboot quirk - fix bootup crash - fix FPU boot line option parsing - add more x86 self-tests - small cleanups, documentation improvements, etc" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Remove an unneeded condition in srat_detect_node() x86/vdso/pvclock: Protect STABLE check with the seqcount x86/mm: Improve switch_mm() barrier comments selftests/x86: Test __kernel_sigreturn and __kernel_rt_sigreturn x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[] lguest: Map switcher text R/O x86/boot: Hide local labels in verify_cpu() x86/fpu: Disable AVX when eagerfpu is off x86/fpu: Disable MPX when eagerfpu is off x86/fpu: Disable XGETBV1 when no XSAVE x86/fpu: Fix early FPU command-line parsing x86/mm: Use PAGE_ALIGNED instead of IS_ALIGNED selftests/x86: Disable the ldt_gdt_64 test for now x86/mm/pat: Make split_page_count() check for empty levels to fix /proc/meminfo output x86/boot: Double BOOT_HEAP_SIZE to 64KB x86/mm: Add barriers and document switch_mm()-vs-flush synchronization	2016-01-14 11:57:22 -08:00
Andy Lutomirski	4eaffdd5a5	x86/mm: Improve switch_mm() barrier comments My previous comments were still a bit confusing and there was a typo. Fix it up. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: `71b3c126e6` ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization") Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-13 10:42:49 +01:00
Linus Torvalds	67990608c8	Power management and ACPI updates for v4.5-rc1 - Add a debugfs-based interface for interacting with the ACPICA's AML debugger introduced in the previous cycle and a new user space tool for that, fix some bugs related to the AML debugger and clean up the code in question (Lv Zheng, Dan Carpenter, Colin Ian King, Markus Elfring). - Update ACPICA to upstream revision 20151218 including a number of fixes and cleanups in the ACPICA core (Bob Moore, Lv Zheng, Labbe Corentin, Prarit Bhargava, Colin Ian King, David E Box, Rafael Wysocki). In particular, the previously added erroneous support for the _SUB object is dropped, the concatenate operator will support all ACPI objects now, the Debug Object handling is improved, the SuperName handling of parameters being control methods is fixed, the ObjectType operator handling is updated to follow ACPI 5.0A and the handling of CondRefOf and RefOf is updated accordingly, module-level code will be executed after loading each ACPI table now (instead of being run once after all tables containing AML have been loaded), the Operation Region handlers management is updated to fix some reported problems and a the ACPICA code in the kernel is more in line with the upstream now. - Update the ACPI backlight driver to provide information on whether or not it will generate key-presses for brightness change hotkeys and update some platform drivers (dell-wmi, thinkpad_acpi) to use that information to avoid sending double key-events to users pace for these, add new ACPI backlight quirks (Hans de Goede, Aaron Lu, Adrien Schildknecht). - Improve the ACPI handling of interrupt GPIOs (Christophe Ricard). - Fix the handling of the list of device IDs of device objects found in the ACPI namespace and add a helper for checking if there is a device object for a given device ID (Lukas Wunner). - Change the logic in the ACPI namespace scanning code to create struct acpi_device objects for all ACPI device objects found in the namespace even if _STA fails for them which helps to avoid device enumeration problems on Microsoft Surface 3 (Aaron Lu). - Add support for the APM X-Gene ACPI I2C device to the ACPI driver for AMD SoCs (Loc Ho). - Fix the long-standing issue with the DMA controller on Intel SoCs where ACPI tables have no power management support for the DMA controller itself, but it can be powered off automatically when the last (other) device on the SoC is powered off via ACPI and clean up the ACPI driver for Intel SoCs (acpi-lpss) after previous attempts to fix that problem (Andy Shevchenko). - Assorted ACPI fixes and cleanups (Andy Lutomirski, Colin Ian King, Javier Martinez Canillas, Ken Xue, Mathias Krause, Rafael Wysocki, Sinan Kaya). - Update the device properties framework for better handling of built-in properties, add support for built-in properties to the platform bus type, update the MFD subsystem's handling of device properties and add support for passing default configuration data as device properties to the intel-lpss MFD drivers, convert the designware I2C driver to use the unified device properties API and add a fallback mechanism for using default built-in properties if the platform firmware fails to provide the properties as expected by drivers (Andy Shevchenko, Mika Westerberg, Heikki Krogerus, Andrew Morton). - Add new Device Tree bindings to the Operating Performance Points (OPP) framework and update the exynos4412 DT binding accordingly, introduce debugfs support for the OPP framework (Viresh Kumar, Bartlomiej Zolnierkiewicz). - Migrate the mt8173 cpufreq driver to the new OPP bindings (Pi-Cheng Chen). - Update the cpufreq core to make the handling of governors more efficient, especially on systems where policy objects are shared between multiple CPUs (Viresh Kumar, Rafael Wysocki). - Fix cpufreq governor handling on configurations with CONFIG_HZ_PERIODIC set (Chen Yu). - Clean up the cpufreq core code related to the boost sysfs knob support and update the ACPI cpufreq driver accordingly (Rafael Wysocki). - Add a new cpufreq driver for ST platforms and corresponding Device Tree bindings (Lee Jones). - Update the intel_pstate driver to allow the P-state selection algorithm used by it to depend on the CPU ID of the processor it is running on, make it use a special P-state selection algorithm (with an IO wait time compensation tweak) on Atom CPUs based on the Airmont and Silvermont cores so as to reduce their energy consumption and improve intel_pstate documentation (Philippe Longepe, Srinivas Pandruvada). - Update the cpufreq-dt driver to support registering cooling devices that use the (P * V^2 * f) dynamic power draw formula where V is the voltage, f is the frequency and P is a constant coefficient provided by Device Tree and update the arm_big_little cpufreq driver to use that support (Punit Agrawal). - Assorted cpufreq driver (cpufreq-dt, qoriq, pcc-cpufreq, blackfin-cpufreq) updates (Andrzej Hajda, Hongtao Jia, Jacob Tanenbaum, Markus Elfring). - cpuidle core tweaks related to polling and measured_us calculation (Rik van Riel). - Removal of modularity from a few cpuidle drivers (clps711x, ux500, exynos) that cannot be built as modules in practice (Paul Gortmaker). - PM core update to prevent devices from being probed during system suspend/resume which is generally problematic and may lead to inconsistent behavior (Grygorii Strashko). - Assorted updates of the PM core and related code (Julia Lawall, Manuel Pégourié-Gonnard, Maruthi Bayyavarapu, Rafael Wysocki, Ulf Hansson). - PNP bus type updates (Christophe Le Roy, Heiner Kallweit). - PCI PM code cleanups (Jarkko Nikula, Julia Lawall). - cpupower tool updates (Jacob Tanenbaum, Thomas Renninger). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJWlZOmAAoJEILEb/54YlRxxtEP/ioR0xMOJQcWd5F6Oyj1PZsx vJeXsmL3fXFAlr6riaE966QqclhUTDhhex3kbFmNQvM8WukxOmBWy5UMSjRg2UmM PHrogc/KrrE+xb8hjGZPgqVr+/L9O3C6lZmM+AUciT0hWZJckYgRh5TpHb1xN/Kx MptvtSXRBM62LWytug+EwA4SHt7OFS0yJ/CI1pKvODVtLaYDIPI5k+4ilPU7y6Be vfoysvmUozNTEYxgPOPXfoQqW2P5t2df32Re31uKtLenLXbc8KW0wIYm24DXgSK6 V/TyDVZTNaZk6OpTqWrjqFbedpGvcBpViwYEY7yv33GDCpXGdHQl3ga+Jy6PAUem 7oGDZtA+5Di/8szhH/wSdpXwSaKEeUdFiaj6Uw2MAwiY4wzv5+WmLRcuIjQFDAxT elrTbQhAgaMlMsUkQ9NV4GC7ByUeeQX2NpCielsHngOQgKdYRQHyYUgGXc2Wgjdq UnVrIWRHzXSED0RtPI7IT0Y4PSxkM9UoSEiVUwt3srCue2CFzuENs23qaDgAzeDa 5uwnDl4RhI2BrLVT1WhioIFgFE5Yh5Xx6dSGC+jcU2ss8r2oN6DdUbqOzWAa1iR4 sFhgwwwizpCCfB6pSqEuDdg8W56HjvE9kQY9kcTPPNPbktL0VImC+iiSN/CgZJv9 MH9NbQM8uHkfNcpjsN7V =OlYA -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.5-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull oower management and ACPI updates from Rafael Wysocki: "As far as the number of commits goes, ACPICA takes the lead this time, followed by cpufreq and the device properties framework changes. The most significant new feature is the debugfs-based interface to the ACPICA's AML debugger added in the previous cycle and a new user space tool for accessing it. On the cpufreq front, the core is updated to handle governors more efficiently, particularly on systems where a single cpufreq policy object is shared between multiple CPUs, and there are quite a few changes in drivers (intel_pstate, cpufreq-dt etc). The device properties framework is updated to handle built-in (ie included in the kernel itself) device properties better, among other things by adding a fallback mechanism that will allow drivers to provide default properties to be used in case the plaform firmware doesn't provide the properties expected by them. The Operating Performance Points (OPP) framework gets new DT bindings and debugfs support. A new cpufreq driver for ST platforms is added and the ACPI driver for AMD SoCs will now support the APM X-Gene ACPI I2C device. The rest is mostly fixes and cleanups all over. Specifics: - Add a debugfs-based interface for interacting with the ACPICA's AML debugger introduced in the previous cycle and a new user space tool for that, fix some bugs related to the AML debugger and clean up the code in question (Lv Zheng, Dan Carpenter, Colin Ian King, Markus Elfring). - Update ACPICA to upstream revision 20151218 including a number of fixes and cleanups in the ACPICA core (Bob Moore, Lv Zheng, Labbe Corentin, Prarit Bhargava, Colin Ian King, David E Box, Rafael Wysocki). In particular, the previously added erroneous support for the _SUB object is dropped, the concatenate operator will support all ACPI objects now, the Debug Object handling is improved, the SuperName handling of parameters being control methods is fixed, the ObjectType operator handling is updated to follow ACPI 5.0A and the handling of CondRefOf and RefOf is updated accordingly, module- level code will be executed after loading each ACPI table now (instead of being run once after all tables containing AML have been loaded), the Operation Region handlers management is updated to fix some reported problems and a the ACPICA code in the kernel is more in line with the upstream now. - Update the ACPI backlight driver to provide information on whether or not it will generate key-presses for brightness change hotkeys and update some platform drivers (dell-wmi, thinkpad_acpi) to use that information to avoid sending double key-events to users pace for these, add new ACPI backlight quirks (Hans de Goede, Aaron Lu, Adrien Schildknecht). - Improve the ACPI handling of interrupt GPIOs (Christophe Ricard). - Fix the handling of the list of device IDs of device objects found in the ACPI namespace and add a helper for checking if there is a device object for a given device ID (Lukas Wunner). - Change the logic in the ACPI namespace scanning code to create struct acpi_device objects for all ACPI device objects found in the namespace even if _STA fails for them which helps to avoid device enumeration problems on Microsoft Surface 3 (Aaron Lu). - Add support for the APM X-Gene ACPI I2C device to the ACPI driver for AMD SoCs (Loc Ho). - Fix the long-standing issue with the DMA controller on Intel SoCs where ACPI tables have no power management support for the DMA controller itself, but it can be powered off automatically when the last (other) device on the SoC is powered off via ACPI and clean up the ACPI driver for Intel SoCs (acpi-lpss) after previous attempts to fix that problem (Andy Shevchenko). - Assorted ACPI fixes and cleanups (Andy Lutomirski, Colin Ian King, Javier Martinez Canillas, Ken Xue, Mathias Krause, Rafael Wysocki, Sinan Kaya). - Update the device properties framework for better handling of built-in properties, add support for built-in properties to the platform bus type, update the MFD subsystem's handling of device properties and add support for passing default configuration data as device properties to the intel-lpss MFD drivers, convert the designware I2C driver to use the unified device properties API and add a fallback mechanism for using default built-in properties if the platform firmware fails to provide the properties as expected by drivers (Andy Shevchenko, Mika Westerberg, Heikki Krogerus, Andrew Morton). - Add new Device Tree bindings to the Operating Performance Points (OPP) framework and update the exynos4412 DT binding accordingly, introduce debugfs support for the OPP framework (Viresh Kumar, Bartlomiej Zolnierkiewicz). - Migrate the mt8173 cpufreq driver to the new OPP bindings (Pi-Cheng Chen). - Update the cpufreq core to make the handling of governors more efficient, especially on systems where policy objects are shared between multiple CPUs (Viresh Kumar, Rafael Wysocki). - Fix cpufreq governor handling on configurations with CONFIG_HZ_PERIODIC set (Chen Yu). - Clean up the cpufreq core code related to the boost sysfs knob support and update the ACPI cpufreq driver accordingly (Rafael Wysocki). - Add a new cpufreq driver for ST platforms and corresponding Device Tree bindings (Lee Jones). - Update the intel_pstate driver to allow the P-state selection algorithm used by it to depend on the CPU ID of the processor it is running on, make it use a special P-state selection algorithm (with an IO wait time compensation tweak) on Atom CPUs based on the Airmont and Silvermont cores so as to reduce their energy consumption and improve intel_pstate documentation (Philippe Longepe, Srinivas Pandruvada). - Update the cpufreq-dt driver to support registering cooling devices that use the (P * V^2 * f) dynamic power draw formula where V is the voltage, f is the frequency and P is a constant coefficient provided by Device Tree and update the arm_big_little cpufreq driver to use that support (Punit Agrawal). - Assorted cpufreq driver (cpufreq-dt, qoriq, pcc-cpufreq, blackfin-cpufreq) updates (Andrzej Hajda, Hongtao Jia, Jacob Tanenbaum, Markus Elfring). - cpuidle core tweaks related to polling and measured_us calculation (Rik van Riel). - Removal of modularity from a few cpuidle drivers (clps711x, ux500, exynos) that cannot be built as modules in practice (Paul Gortmaker). - PM core update to prevent devices from being probed during system suspend/resume which is generally problematic and may lead to inconsistent behavior (Grygorii Strashko). - Assorted updates of the PM core and related code (Julia Lawall, Manuel Pégourié-Gonnard, Maruthi Bayyavarapu, Rafael Wysocki, Ulf Hansson). - PNP bus type updates (Christophe Le Roy, Heiner Kallweit). - PCI PM code cleanups (Jarkko Nikula, Julia Lawall). - cpupower tool updates (Jacob Tanenbaum, Thomas Renninger)" * tag 'pm+acpi-4.5-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (177 commits) PM / clk: don't leave clocks enabled when driver not bound i2c: dw: Add APM X-Gene ACPI I2C device support ACPI / APD: Add APM X-Gene ACPI I2C device support ACPI / LPSS: change 'does not have' to 'has' in comment Revert "dmaengine: dw: platform: provide platform data for Intel" dmaengine: dw: return immediately from IRQ when DMA isn't in use dmaengine: dw: platform: power on device on shutdown ACPI / LPSS: override power state for LPSS DMA device PM / OPP: Use snprintf() instead of sprintf() Documentation: cpufreq: intel_pstate: enhance documentation ACPI, PCI, irq: remove redundant check for null string pointer ACPI / video: driver must be registered before checking for keypresses cpufreq-dt: fix handling regulator_get_voltage() result cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC PM / sleep: Add support for read-only sysfs attributes ACPI: Fix white space in a structure definition ACPI / SBS: fix inconsistent indenting inside if statement PNP: respect PNP_DRIVER_RES_DO_NOT_CHANGE when detaching ACPI / PNP: constify device IDs ACPI / PCI: Simplify acpi_penalize_isa_irq() ...	2016-01-12 20:25:09 -08:00
Linus Torvalds	1baa5efbeb	* s390: Support for runtime instrumentation within guests, support of 248 VCPUs. * ARM: rewrite of the arm64 world switch in C, support for 16-bit VM identifiers. Performance counter virtualization missed the boat. * x86: Support for more Hyper-V features (synthetic interrupt controller), MMU cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJWlSKwAAoJEL/70l94x66DY0UIAK5vp4zfQoQOJC4KP4Xgxwdu kpnK2Boz3/74o1b0y5+eJZoUZCsXCVLtmP5uhmMxUYWDgByFG2X8ZDhPFwB5FYLT 2dN+Lr4tsolgIfRdHZtrT6Svp9SDL039bWTdscnbR6l37/j9FRWvpKdhI3orloFD /i4CSW2dVIq1/9Xctwu/rtcOEesEx4Cad+6YV3/530eVAXFzE908nXfmqJNZTocY YCGcmrMVCOu0ng5QM4xSzmmYjKMLUcRs+QzZWkVBzdJtTgwZUr09yj7I2dZ1yj/i cxYrJy6shSwE74XkXsmvG+au3C5u3vX4tnXjBFErnPJ99oqzHatVnFWNRhj4dLQ= =PIj1 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "PPC changes will come next week. - s390: Support for runtime instrumentation within guests, support of 248 VCPUs. - ARM: rewrite of the arm64 world switch in C, support for 16-bit VM identifiers. Performance counter virtualization missed the boat. - x86: Support for more Hyper-V features (synthetic interrupt controller), MMU cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (115 commits) kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL kvm/x86: Hyper-V SynIC timers tracepoints kvm/x86: Hyper-V SynIC tracepoints kvm/x86: Update SynIC timers on guest entry only kvm/x86: Skip SynIC vector check for QEMU side kvm/x86: Hyper-V fix SynIC timer disabling condition kvm/x86: Reorg stimer_expiration() to better control timer restart kvm/x86: Hyper-V unify stimer_start() and stimer_restart() kvm/x86: Drop stimer_stop() function kvm/x86: Hyper-V timers fix incorrect logical operation KVM: move architecture-dependent requests to arch/ KVM: renumber vcpu->request bits KVM: document which architecture uses each request bit KVM: Remove unused KVM_REQ_KICK to save a bit in vcpu->requests kvm: x86: Check kvm_write_guest return value in kvm_write_wall_clock KVM: s390: implement the RI support of guest kvm/s390: drop unpaired smp_mb kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table() arm/arm64: KVM: Detect vGIC presence at runtime ...	2016-01-12 13:22:12 -08:00
Linus Torvalds	c9bed1cf51	xen: features and fixes for 4.5-rc0 - Stolen ticks and PV wallclock support for arm/arm64. - Add grant copy ioctl to gntdev device. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWk5IUAAoJEFxbo/MsZsTRLxwH/1BDcrbQDRc5hxUOG9JEYSUt H/lMjvZRShPkzweijdNon95ywAXhcSbkS9IV2Mp0+CZV7VyeymW7QIW/g4+G6iRg +LnoV77PAhPv/cmsr1pENXqRCclvemlxQOf7UyWLezuKhB71LC+oNaEnpk/tPIZS et/qef+m/SgSP5R91nO0Esv2KfP7za0UrgJf3Ee4GzjSeDkya0Hko06Cy3yc1/RT 082kHpQ1/KFcHHh2qhdCQwyzhq/cwFkuDA6ksKYJoxC6YAVC2mvvkuIOZYbloHDL c/dzuP9qjjxOZ7Gblv2cmg+RE4UqRfBhxmMycxSCcwW/Mt5LaftCpAxpBQKq2/8= =6F/q -----END PGP SIGNATURE----- Merge tag 'for-linus-4.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: "Xen features and fixes for 4.5-rc0: - Stolen ticks and PV wallclock support for arm/arm64 - Add grant copy ioctl to gntdev device" * tag 'for-linus-4.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/gntdev: add ioctl for grant copy x86/xen: don't reset vcpu_info on a cancelled suspend xen/gntdev: constify mmu_notifier_ops structures xen/grant-table: constify gnttab_ops structure xen/time: use READ_ONCE xen/x86: convert remaining timespec to timespec64 in xen_pvclock_gtod_notify xen/x86: support XENPF_settime64 xen/arm: set the system time in Xen via the XENPF_settime64 hypercall xen/arm: introduce xen_read_wallclock arm: extend pvclock_wall_clock with sec_hi xen: introduce XENPF_settime64 xen/arm: introduce HYPERVISOR_platform_op on arm and arm64 xen: rename dom0_op to platform_op xen/arm: account for stolen ticks arm64: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops arm: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops missing include asm/paravirt.h in cputime.c xen: move xen_setup_runstate_info and get_runstate_snapshot to drivers/xen/time.c	2016-01-12 13:05:36 -08:00
Michael S. Tsirkin	1638fb7207	x86: define __smp_xxx This defines __smp_xxx barriers for x86, for use by virtualization. smp_xxx barriers are removed as they are defined correctly by asm-generic/barriers.h Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2016-01-12 20:46:59 +02:00
Michael S. Tsirkin	300b06d455	x86: reuse asm-generic/barrier.h As on most architectures, on x86 read_barrier_depends and smp_read_barrier_depends are empty. Drop the local definitions and pull the generic ones from asm-generic/barrier.h instead: they are identical. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2016-01-12 20:46:52 +02:00
Rusty Russell	e27d90e8be	lguest: Map switcher text R/O Pavel noted that lguest maps the switcher code executable and read-write. This is a bad idea for any kernel text, but particularly for text mapped at a fixed address. Create two vmas, one for the text (PAGE_KERNEL_RX) and another for the stacks (PAGE_KERNEL). Use VM_NO_GUARD to map them adjacent (as expected by the rest of the code). Reported-by: Pavel Machek <pavel@ucw.cz> Tested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-12 12:17:28 +01:00
Andy Lutomirski	bd902c5362	x86/vdso: Disallow vvar access to vclock IO for never-used vclocks It makes me uncomfortable that even modern systems grant every process direct read access to the HPET. While fixing this for real without regressing anything is a mess (unmapping the HPET is tricky because we don't adequately track all the mappings), we can do almost as well by tracking which vclocks have ever been used and only allowing pages associated with used vclocks to be faulted in. This will cause rogue programs that try to peek at the HPET to get SIGBUS instead on most systems. We can't restrict faults to vclock pages that are associated with the currently selected vclock due to a race: a process could start to access the HPET for the first time and race against a switch away from the HPET as the current clocksource. We can't segfault the process trying to peek at the HPET in this case, even though the process isn't going to do anything useful with the data. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e79d06295625c02512277737ab55085a498ac5d8.1451446564.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-12 11:59:35 +01:00
Andy Lutomirski	05ef76b20f	x86/vdso: Use .fault for the vDSO text mapping The old scheme for mapping the vDSO text is rather complicated. vdso2c generates a struct vm_special_mapping and a blank .pages array of the correct size for each vdso image. Init code in vdso/vma.c populates the .pages array for each vDSO image, and the mapping code selects the appropriate struct vm_special_mapping. With .fault, we can use a less roundabout approach: vdso_fault() just returns the appropriate page for the selected vDSO image. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f886954c186bafd74e1b967c8931d852ae199aa2.1451446564.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-12 11:59:34 +01:00
Andy Lutomirski	352b78c62f	x86/vdso: Track each mm's loaded vDSO image as well as its base As we start to do more intelligent things with the vDSO at runtime (as opposed to just at mm initialization time), we'll need to know which vDSO is in use. In principle, we could guess based on the mm type, but that's over-complicated and error-prone. Instead, just track it in the mmu context. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/c99ac48681bad709ca7ad5ee899d9042a3af6b00.1451446564.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-12 11:59:34 +01:00
yu-cheng yu	394db20ca2	x86/fpu: Disable AVX when eagerfpu is off When "eagerfpu=off" is given as a command-line input, the kernel should disable AVX support. The Task Switched bit used for lazy context switching does not support AVX. If AVX is enabled without eagerfpu context switching, one task's AVX state could become corrupted or leak to other tasks. This is a bug and has bad security implications. This only affects systems that have AVX/AVX2/AVX512 and this issue will be found only when one actually uses AVX/AVX2/AVX512 _AND_ does eagerfpu=off. Reference: Intel Software Developer's Manual Vol. 3A Sec. 2.5 Control Registers: TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task. Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87 FPU and SSE State When the TS flag is set, the processor monitors the instruction stream for x87 FPU, MMX, SSE instructions. When the processor detects one of these instructions, it raises a device-not-available exeception (#NM) prior to executing the instruction. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-12 11:51:21 +01:00
yu-cheng yu	a5fe93a549	x86/fpu: Disable MPX when eagerfpu is off This issue is a fallout from the command-line parsing move. When "eagerfpu=off" is given as a command-line input, the kernel should disable MPX support. The decision for turning off MPX was made in fpu__init_system_ctx_switch(), which is after the selection of the XSAVE format. This patch fixes it by getting that decision done earlier in fpu__init_system_xstate(). Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-12 11:51:21 +01:00
Ingo Molnar	c0c57019a6	Merge commit 'linus' into x86/urgent, to pick up recent x86 changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-12 11:08:13 +01:00
Linus Torvalds	ae8a52185e	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "Two changes: - one to quirk-save/restore certain system MSRs across suspend/resume, to make certain Intel systems work better (Chen Yu) - and also to constify a read only structure (Julia Lawall)" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/calgary: Constify cal_chipset_ops structures x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume	2016-01-11 17:45:32 -08:00
Linus Torvalds	6896d9f7e7	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Ingo Molnar: "This cleans up the FPU fault handling methods to be more robust, and moves eligible variables to .init.data" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Put a few variables in .init.data x86/fpu: Get rid of xstate_fault() x86/fpu: Add an XSTATE_OP() macro	2016-01-11 16:56:38 -08:00
Linus Torvalds	671d5532aa	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: "The main changes in this cycle were: - Improved CPU ID handling code and related enhancements (Borislav Petkov) - RDRAND fix (Len Brown)" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Replace RDRAND forced-reseed with simple sanity check x86/MSR: Chop off lower 32-bit value x86/cpu: Fix MSR value truncation issue x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR kvm: Add accessors for guest CPU's family, model, stepping x86/cpu: Unify CPU family, model, stepping calculation	2016-01-11 16:46:20 -08:00
Linus Torvalds	67c707e451	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "The main changes in this cycle were: - code patching and cpu_has cleanups (Borislav Petkov) - paravirt cleanups (Juergen Gross) - TSC cleanup (Thomas Gleixner) - ptrace cleanup (Chen Gang)" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch/x86/kernel/ptrace.c: Remove unused arg_offs_table x86/mm: Align macro defines x86/cpu: Provide a config option to disable static_cpu_has x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros x86/cpufeature: Cleanup get_cpu_cap() x86/cpufeature: Move some of the scattered feature bits to x86_capability x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer x86/paravirt: Remove unused pv_apic_ops structure x86/tsc: Remove unused tsc_pre_init() hook x86: Remove unused function cpu_has_ht_siblings() x86/paravirt: Kill some unused patching functions	2016-01-11 16:26:03 -08:00
Rafael J. Wysocki	1e3f28a552	Merge branch 'acpi-soc' * acpi-soc: PM / clk: don't leave clocks enabled when driver not bound i2c: dw: Add APM X-Gene ACPI I2C device support ACPI / APD: Add APM X-Gene ACPI I2C device support ACPI / LPSS: change 'does not have' to 'has' in comment Revert "dmaengine: dw: platform: provide platform data for Intel" dmaengine: dw: return immediately from IRQ when DMA isn't in use dmaengine: dw: platform: power on device on shutdown ACPI / LPSS: override power state for LPSS DMA device ACPI / LPSS: power on when probe() and otherwise when remove() ACPI / LPSS: do delay for all LPSS devices when D3->D0 ACPI / LPSS: allow to use specific PM domain during ->probe() Revert "ACPI / LPSS: allow to use specific PM domain during ->probe()" device core: add BUS_NOTIFY_DRIVER_NOT_BOUND notification x86/platform/iosf_mbi: Remove duplicate definitions Conflicts: drivers/i2c/busses/i2c-designware-platdrv.c	2016-01-12 01:08:47 +01:00
Linus Torvalds	88cbfd0711	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this cycle were: - vDSO and asm entry improvements (Andy Lutomirski) - Xen paravirt entry enhancements (Boris Ostrovsky) - asm entry labels enhancement (Borislav Petkov) - and other misc changes (Thomas Gleixner, me)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks" x86/entry/64_compat: Make labels local x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog() x86/vdso: Enable vdso pvclock access on all vdso variants x86/vdso: Remove pvclock fixmap machinery x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader x86/kvm: On KVM re-enable (e.g. after suspend), update clocks x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots x86/asm: Add asm macros for static keys/jump labels x86/asm: Error out if asm/jump_label.h is included inappropriately context_tracking: Switch to new static_branch API x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op x86/paravirt: Remove the unused irq_enable_sysexit pv op x86/xen: Avoid fast syscall path for Xen PV guests	2016-01-11 15:58:16 -08:00
Linus Torvalds	4f19b8803b	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "The main changes in this cycle were: - introduce optimized single IPI sending methods on modern APICs (Linus Torvalds, Thomas Gleixner) - kexec/crash APIC handling fixes and enhancements (Hidehiro Kawai) - extend lapic vector saving/restoring to the CMCI (MCE) vector as well (Juergen Gross) - various fixes and enhancements (Jake Oshins, Len Brown)" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/irq: Export functions to allow MSI domains in modules Documentation: Document kernel.panic_on_io_nmi sysctl x86/nmi: Save regs in crash dump on external NMI x86/apic: Introduce apic_extnmi command line parameter kexec: Fix race between panic() and crash_kexec() panic, x86: Allow CPUs to save registers even if looping in NMI context panic, x86: Fix re-entrance problem due to panic on NMI x86/apic: Fix the saving and restoring of lapic vectors during suspend/resume x86/smpboot: Re-enable init_udelay=0 by default on modern CPUs x86/smp: Remove single IPI wrapper x86/apic: Use default send single IPI wrapper x86/apic: Provide default send single IPI wrapper x86/apic: Implement single IPI for apic_noop x86/apic: Wire up single IPI for apic_numachip x86/apic: Wire up single IPI for x2apic_uv x86/apic: Implement single IPI for x2apic_phys x86/apic: Wire up single IPI for bigsmp_apic x86/apic: Remove pointless indirections from bigsmp_apic x86/apic: Wire up single IPI for apic_physflat x86/apic: Remove pointless indirections from apic_physflat ...	2016-01-11 15:37:06 -08:00
Linus Torvalds	4bd20db2c0	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Various x86 MCE fixes and small enhancements" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Make usable address checks Intel-only x86/mce: Add the missing memory error check on AMD x86/RAS: Remove mce.usable_addr x86/mce: Do not enter deferred errors into the generic pool twice	2016-01-11 15:07:19 -08:00
Linus Torvalds	5cb52b5e16	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Intel Knights Landing support. (Harish Chegondi) - Intel Broadwell-EP uncore PMU support. (Kan Liang) - Core code improvements. (Peter Zijlstra.) - Event filter, LBR and PEBS fixes. (Stephane Eranian) - Enable cycles:pp on Intel Atom. (Stephane Eranian) - Add cycles:ppp support for Skylake. (Andi Kleen) - Various x86 NMI overhead optimizations. (Andi Kleen) - Intel PT enhancements. (Takao Indoh) - AMD cache events fix. (Vince Weaver) Tons of tooling changes: - Show random perf tool tips in the 'perf report' bottom line (Namhyung Kim) - perf report now defaults to --group if the perf.data file has grouped events, try it with: # perf record -e '{cycles,instructions}' -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ] # perf report # Samples: 1K of event 'anon group { cycles, instructions }' # Event count (approx.): 1955219195 # # Overhead Command Shared Object Symbol 2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle 1.05% 0.33% firefox libxul.so [.] js::SetObjectElement 1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno 0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab 0.65% 0.86% firefox libxul.so [.] js::ValueToId<(js::AllowGC)1> 0.64% 0.23% JS Helper libxul.so [.] js::SplayTree<js::jit::LiveRange, js::jit::LiveRange>::splay 0.62% 1.27% firefox libxul.so [.] js::GetIterator 0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty 0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining - Introduce the 'perf stat record/report' workflow: Generate perf.data files from 'perf stat', to tap into the scripting capabilities perf has instead of defining a 'perf stat' specific scripting support to calculate event ratios, etc. Simple example: $ perf stat record -e cycles usleep 1 Performance counter stats for 'usleep 1': 1,134,996 cycles 0.000670644 seconds time elapsed $ perf stat report Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1': 1,134,996 cycles 0.000670644 seconds time elapsed $ It generates PERF_RECORD_ userspace records to store the details: $ perf report -D \| grep PERF_RECORD 0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637 0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535 0x12a [0x40]: PERF_RECORD_STAT_CONFIG 0x16a [0x30]: PERF_RECORD_STAT -1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text 0x1da [0x18]: PERF_RECORD_STAT_ROUND [acme@ssdandy linux]$ An effort was made to make perf.data files generated like this to not generate cryptic messages when processed by older tools. The 'perf script' bits need rebasing, will go up later. - Make command line options always available, even when they depend on some feature being enabled, warning the user about use of such options (Wang Nan) - Support hw breakpoint events (mem:0xAddress) in the default output mode in 'perf script' (Wang Nan) - Fixes and improvements for supporting annotating ARM binaries, support ARM call and jump instructions, more work needed to have arch specific stuff separated into tools/perf/arch//annotate/ (Russell King) - Add initial 'perf config' command, for now just with a --list command to the contents of the configuration file in use and a basic man page describing its format, commands for doing edits and detailed documentation are being reviewed and proof-read. (Taeung Song) - Allows BPF scriptlets specify arguments to be fetched using DWARF info, using a prologue generated at compile/build time (He Kuang, Wang Nan) - Allow attaching BPF scriptlets to module symbols (Wang Nan) - Allow attaching BPF scriptlets to userspace code using uprobe (Wang Nan) - BPF programs now can specify 'perf probe' tunables via its section name, separating key=val values using semicolons (Wang Nan) Testing some of these new BPF features: Use case: get callchains when receiving SSL packets, filter then in the kernel, at arbitrary place. # cat ssl.bpf.c #define SEC(NAME) __attribute__((section(NAME), used)) struct pt_regs; SEC("func=__inet_lookup_established hnum") int func(struct pt_regs ctx, int err, unsigned short port) { return err == 0 && port == 443; } char _license[] SEC("license") = "GPL"; int _version SEC("version") = LINUX_VERSION_CODE; # # perf record -a -g -e ssl.bpf.c ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ] # perf script \| head -30 swapper 0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux) 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux) 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux) 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux) 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux) 8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux) 856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux) 2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux) 2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux) 96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux) 969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux) 2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux) 95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux) 1163ffa start_kernel ([kernel.vmlinux].init.text) 11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text) 1163623 x86_64_start_kernel ([kernel.vmlinux].init.text) qemu-system-x86 9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux) 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux) 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux) 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux) 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux) 856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux) 8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux) 430a br_handle_frame_finish ([bridge]) 48bc br_handle_frame ([bridge]) 855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux) 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux) # - Use 'perf probe' various options to list functions, see what variables can be collected at any given point, experiment first collecting without a filter, then filter, use it together with 'perf trace', 'perf top', with or without callchains, if it explodes, please tell us! - Introduce a new callchain mode: "folded", that will list per line representations of all callchains for a give histogram entry, facilitating 'perf report' output processing by other tools, such as Brendan Gregg's flamegraph tools (Namhyung Kim) E.g: # perf report \| grep -v ^# \| head 18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry \| ---cpu_startup_entry \| \|--12.07%--start_secondary \| --6.30%--rest_init start_kernel x86_64_start_reservations x86_64_start_kernel # Becomes, in "folded" mode: # perf report -g folded \| grep -v ^# \| head -5 18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry 12.07% cpu_startup_entry;start_secondary 6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 16.90% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle 11.23% call_cpuidle;cpu_startup_entry;start_secondary 5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 16.90% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter 11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 15.12% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state # The user can also select one of "count", "period" or "percent" as the first column. ... and lots of infrastructure enhancements, plus fixes and other changes, features I failed to list - see the shortlog and the git log for details" 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits) perf evlist: Add --trace-fields option to show trace fields perf record: Store data mmaps for dwarf unwind perf libdw: Check for mmaps also in MAP__VARIABLE tree perf unwind: Check for mmaps also in MAP__VARIABLE tree perf unwind: Use find_map function in access_dso_mem perf evlist: Remove perf_evlist__(enable\|disable)_event functions perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does) perf report: Show random usage tip on the help line perf hists: Export a couple of hist functions perf diff: Use perf_hpp__register_sort_field interface perf tools: Add overhead/overhead_children keys defaults via string perf tools: Remove list entry from struct sort_entry perf tools: Include all tools/lib directory for tags/cscope/TAGS targets perf script: Align event name properly perf tools: Add missing headers in perf's MANIFEST perf tools: Do not show trace command if it's not compiled in perf report: Change default to use event group view perf top: Decay periods in callchains tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/ tools lib: Sync tools/lib/find_bit.c with the kernel ...	2016-01-11 14:39:17 -08:00
Linus Torvalds	24af98c4cf	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "So we have a laundry list of locking subsystem changes: - continuing barrier API and code improvements - futex enhancements - atomics API improvements - pvqspinlock enhancements: in particular lock stealing and adaptive spinning - qspinlock micro-enhancements" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op futex: Cleanup the goto confusion in requeue_pi() futex: Remove pointless put_pi_state calls in requeue() futex: Document pi_state refcounting in requeue code futex: Rename free_pi_state() to put_pi_state() futex: Drop refcount if requeue_pi() acquired the rtmutex locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation lcoking/barriers, arch: Use smp barriers in smp_store_release() locking/cmpxchg, arch: Remove tas() definitions locking/pvqspinlock: Queue node adaptive spinning locking/pvqspinlock: Allow limited lock stealing locking/pvqspinlock: Collect slowpath lock statistics sched/core, locking: Document Program-Order guarantees locking, sched: Introduce smp_cond_acquire() and use it locking/pvqspinlock, x86: Optimize the PV unlock code path locking/qspinlock: Avoid redundant read of next pointer locking/qspinlock: Prefetch the next node cacheline locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg() atomics: Add test for atomic operations with _relaxed variants	2016-01-11 14:18:38 -08:00
H.J. Lu	8c31902cff	x86/boot: Double BOOT_HEAP_SIZE to 64KB When decompressing kernel image during x86 bootup, malloc memory for ELF program headers may run out of heap space, which leads to system halt. This patch doubles BOOT_HEAP_SIZE to 64KB. Tested with 32-bit kernel which failed to boot without this patch. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-11 12:30:50 +01:00
Andy Lutomirski	71b3c126e6	x86/mm: Add barriers and document switch_mm()-vs-flush synchronization When switch_mm() activates a new PGD, it also sets a bit that tells other CPUs that the PGD is in use so that TLB flush IPIs will be sent. In order for that to work correctly, the bit needs to be visible prior to loading the PGD and therefore starting to fill the local TLB. Document all the barriers that make this work correctly and add a couple that were missing. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-11 12:03:15 +01:00
Paolo Bonzini	2860c4b167	KVM: move architecture-dependent requests to arch/ Since the numbers now overlap, it makes sense to enumerate them in asm/kvm_host.h rather than linux/kvm_host.h. Functions that refer to architecture-specific requests are also moved to arch/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-01-08 19:04:36 +01:00
Andy Shevchenko	eebb3e8d8a	ACPI / LPSS: override power state for LPSS DMA device This is a third approach to workaround long standing issue with LPSS on BayTrail. First one [1] was reverted since it didn't resolve the issue comprehensively. Second one [2] was rejected by internal review. The LPSS DMA controller does not have neither _PS0 nor _PS3 method. Moreover it can be powered off automatically whenever the last LPSS device goes down. In case of no power any access to the DMA controller will hang the system. The behaviour is reproduced on some HP laptops based on Intel BayTrail [3,4] as well as on ASuS T100TA transformer. Power on the LPSS island through the registers accessible in a specific way. [1] http://www.spinics.net/lists/linux-acpi/msg53963.html [2] https://bugzilla.redhat.com/attachment.cgi?id=1066779&action=diff [3] https://bugzilla.redhat.com/show_bug.cgi?id=1184273 [4] http://www.spinics.net/lists/dmaengine/msg01514.html Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-01-07 14:11:32 +01:00
Paolo Bonzini	def840ede3	KVM/ARM changes for Linux v4.5 - Complete rewrite of the arm64 world switch in C, hopefully paving the way for more sharing with the 32bit code, better maintainability and easier integration of new features. Also smaller and slightly faster in some cases... - Support for 16bit VM identifiers - Various cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWe80IAAoJECPQ0LrRPXpDdt8P/ittxzklIT7rsZxdOiIY6vLQ i2hWGo1KdZR+8rsNyQEeGyg2Ocdja0Vld9ciBKgXKeKtc6x6AHfq0x6eyGRbF2jJ Wdkd2a9lLJVJJIf1LBhOQuwjNiEvAgvqO5nwXL77s/rqx3Ur5OlyohSvRFBy7Pqp 8cdnCV/43I/y7k0iGhitFVrEC9TL0cfeJmM7YhXkt8IcpkcpCDfgdI7wgIb0ntvv dqvrRmfp+Q3/hJ6SVRsy6uzOrjFjRE8hIIG6TiqfRd/FgI5x2xvGkSAE3Wx2YdRM myPDiAuY2wOyALZpn9zD7qFMOfI2wX8kaX5S/ctnbvLelkmQblI39/zfYuxJ36xC Mo2yMKcvT37AIMz2fxx3mGnIR7NZBNXVQGJmv/1p9vMQ8RbUXUhT0w6hP5SH9S7m RDoOkfd37wQugQ7bgI7cqg4hodMRlybGPq8QaKp80y0Ej3cPblM+y0fbR153SSbS 6nCwYceFLdWJEV+tTFpKD5cvxOGeYfoC/8LcVRYRcWg6nlr4+qo61MHevyCe/Qxw Pw+z9wFpoVKumRT72KmzFUxFQqRnTshE3KJqNJdsqMPM8ZuW5TJ/MtUa+JdAWmSH dEAqd7Hy5W1CqgGEk+los+QlKw+5uFepcZ7SOuQ2ehME/Ae3xI8s9+UE6yqeHx6Y PV2s5lyJRCPfm3qOu7TS =wuir -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next KVM/ARM changes for Linux v4.5 - Complete rewrite of the arm64 world switch in C, hopefully paving the way for more sharing with the 32bit code, better maintainability and easier integration of new features. Also smaller and slightly faster in some cases... - Support for 16bit VM identifiers - Various cleanups	2016-01-07 11:00:57 +01:00
Andy Lutomirski	8705d603ed	x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n arch/x86/built-in.o: In function `arch_setup_additional_pages': (.text+0x587): undefined reference to `pvclock_pvti_cpu0_va' KVM_GUEST selects PARAVIRT_CLOCK, so we can make pvclock_pvti_cpu0_va depend on KVM_GUEST. Signed-off-by: Andy Lutomirski <luto@kernel.org> Tested-by: Borislav Petkov <bp@alien8.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/444d38a9bcba832685740ea1401b569861d09a72.1451446564.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-01-06 10:49:53 +01:00
Mark Brown	b9546d09b1	Merge remote-tracking branches 'asoc/topic/fsl-spdif', 'asoc/topic/img' and 'asoc/topic/intel' into asoc-next	2015-12-23 00:23:43 +00:00
Stefano Stabellini	cfafae9403	xen: rename dom0_op to platform_op The dom0_op hypercall has been renamed to platform_op since Xen 3.2, which is ancient, and modern upstream Linux kernels cannot run as dom0 and it anymore anyway. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2015-12-21 14:40:55 +00:00
Jake Oshins	c8f3e518d3	x86/irq: Export functions to allow MSI domains in modules The Linux kernel already has the concept of IRQ domain, wherein a component can expose a set of IRQs which are managed by a particular interrupt controller chip or other subsystem. The PCI driver exposes the notion of an IRQ domain for Message-Signaled Interrupts (MSI) from PCI Express devices. This patch exposes the functions which are necessary for creating a MSI IRQ domain within a module. [ tglx: Split it into x86 and core irq parts ] Signed-off-by: Jake Oshins <jakeo@microsoft.com> Cc: gregkh@linuxfoundation.org Cc: kys@microsoft.com Cc: devel@linuxdriverproject.org Cc: olaf@aepfle.de Cc: apw@canonical.com Cc: vkuznets@redhat.com Cc: haiyangz@microsoft.com Cc: marc.zyngier@arm.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1449769983-12948-4-git-send-email-jakeo@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-20 12:40:49 +01:00
David Vrabel	d8c98a1d14	x86/paravirt: Prevent rtc_cmos platform device init on PV guests Adding the rtc platform device in non-privileged Xen PV guests causes an IRQ conflict because these guests do not have legacy PIC and may allocate irqs in the legacy range. In a single VCPU Xen PV guest we should have: /proc/interrupts: CPU0 0: 4934 xen-percpu-virq timer0 1: 0 xen-percpu-ipi spinlock0 2: 0 xen-percpu-ipi resched0 3: 0 xen-percpu-ipi callfunc0 4: 0 xen-percpu-virq debug0 5: 0 xen-percpu-ipi callfuncsingle0 6: 0 xen-percpu-ipi irqwork0 7: 321 xen-dyn-event xenbus 8: 90 xen-dyn-event hvc_console ... But hvc_console cannot get its interrupt because it is already in use by rtc0 and the console does not work. genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0) We can avoid this problem by realizing that unprivileged PV guests (both Xen and lguests) are not supposed to have rtc_cmos device and so adding it is not necessary. Privileged guests (i.e. Xen's dom0) do use it but they should not have irq conflicts since they allocate irqs above legacy range (above gsi_top, in fact). Instead of explicitly testing whether the guest is privileged we can extend pv_info structure to include information about guest's RTC support. Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: vkuznets@redhat.com Cc: xen-devel@lists.xenproject.org Cc: konrad.wilk@oracle.com Cc: stable@vger.kernel.org # 4.2+ Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-19 21:35:13 +01:00
Pierre-Louis Bossart	8788f83929	ASoc: Intel: Atom: add deep buffer definitions for atom platforms Add definitions for MERR_DPCM_DEEP_BUFFER AND PIPE_MEDIA3_IN Add relevant cpu-dai and dai link names Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>	2015-12-19 11:49:56 +00:00
Borislav Petkov	4baf7fe407	x86/mm: Align macro defines Bring PAGE_{SHIFT,SIZE,MASK} to the same indentation level as the rest of the header. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1449480268-26583-1-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-19 11:53:40 +01:00
Borislav Petkov	6e1315fe82	x86/cpu: Provide a config option to disable static_cpu_has This brings .text savings of about ~1.6K when building a tinyconfig. It is off by default so nothing changes for the default. Kconfig help text from Josh. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/1449481182-27541-5-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-19 11:49:55 +01:00
Borislav Petkov	362f924b64	x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros Those are stupid and code should use static_cpu_has_safe() or boot_cpu_has() instead. Kill the least used and unused ones. The remaining ones need more careful inspection before a conversion can happen. On the TODO. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de Cc: David Sterba <dsterba@suse.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Matt Mackall <mpm@selenic.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-19 11:49:55 +01:00
Borislav Petkov	39c06df4dc	x86/cpufeature: Cleanup get_cpu_cap() Add an enum for the ->x86_capability array indices and cleanup get_cpu_cap() by killing some redundant local vars. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1449481182-27541-3-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-19 11:49:54 +01:00
Borislav Petkov	2ccd71f1b2	x86/cpufeature: Move some of the scattered feature bits to x86_capability Turn the CPUID leafs which are proper CPUID feature bit leafs into separate ->x86_capability words. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1449481182-27541-2-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-19 11:49:53 +01:00
Thomas Gleixner	0fa85119cd	Merge branch 'linus' into x86/cleanups Pull in upstream changes so we can apply depending patches.	2015-12-19 11:49:13 +01:00
Hidehiro Kawai	b279d67df8	x86/nmi: Save regs in crash dump on external NMI Now, multiple CPUs can receive an external NMI simultaneously by specifying the "apic_extnmi=all" command line parameter. When we take a crash dump by using external NMI with this option, we fail to save registers into the crash dump. This happens as follows: CPU 0 CPU 1 ================================ ============================= receive an external NMI default_do_nmi() receive an external NMI spin_lock(&nmi_reason_lock) default_do_nmi() io_check_error() spin_lock(&nmi_reason_lock) panic() busy loop ... kdump_nmi_shootdown_cpus() issue NMI IPI -----------> blocked until IRET busy loop... Here, since CPU 1 is in NMI context, an additional NMI from CPU 0 remains unhandled until CPU 1 IRETs. However, CPU 1 will never execute IRET so the NMI is not handled and the callback function to save registers is never called. To solve this issue, we check if the IPI for crash dumping was issued while waiting for nmi_reason_lock to be released, and if so, call its callback function directly. If the IPI is not issued (e.g. kdump is disabled), the actual behavior doesn't change. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: kexec@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20151210065245.4587.39316.stgit@softrs Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-19 11:07:01 +01:00
Hidehiro Kawai	b7c4948e98	x86/apic: Introduce apic_extnmi command line parameter This patch introduces a command line parameter apic_extnmi: apic_extnmi=( bsp\|all\|none ) The default value is "bsp" and this is the current behavior: only the Boot-Strapping Processor receives an external NMI. "all" allows external NMIs to be broadcast to all CPUs. This would raise the success rate of panic on NMI when BSP hangs in NMI context or the external NMI is swallowed by other NMI handlers on the BSP. If you specify "none", no CPUs receive external NMIs. This is useful for the dump capture kernel so that it cannot be shot down by accidentally pressing the external NMI button (on platforms which have it) while saving a crash dump. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bandan Das <bsd@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: kexec@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20151210014632.25437.43778.stgit@softrs Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-19 11:07:01 +01:00
Thomas Gleixner	d267b8d6c6	Merge branch 'linus' into x86/apic Pull in update changes so we can apply conflicting patches	2015-12-19 11:03:18 +01:00
Boris Ostrovsky	91e2eea98f	x86/xen: Avoid fast syscall path for Xen PV guests After 32-bit syscall rewrite, and specifically after commit: `5f310f739b` ("x86/entry/32: Re-implement SYSENTER using the new C path") ... the stack frame that is passed to xen_sysexit is no longer a "standard" one (i.e. it's not pt_regs). Since we end up calling xen_iret from xen_sysexit we don't need to fix up the stack and instead follow entry_SYSENTER_32's IRET path directly to xen_iret. We can do the same thing for compat mode even though stack does not need to be fixed. This will allow us to drop usergs_sysret32 paravirt op (in the subsequent patch) Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-2-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-12-19 09:55:52 +01:00
Linus Torvalds	5b24a7a2aa	Add 'unsafe' user access functions for batched accesses The naming is meant to discourage random use: the helper functions are not really any more "unsafe" than the traditional double-underscore functions (which need the address range checking), but they do need even more infrastructure around them, and should not be used willy-nilly. In addition to checking the access range, these user access functions require that you wrap the user access with a "user_acess_{begin,end}()" around it. That allows architectures that implement kernel user access control (x86: SMAP, arm64: PAN) to do the user access control in the wrapping user_access_begin/end part, and then batch up the actual user space accesses using the new interfaces. The main (and hopefully only) use for these are for core generic access helpers, initially just the generic user string functions (strnlen_user() and strncpy_from_user()). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-12-17 09:57:27 -08:00
Linus Torvalds	11f1a4b975	x86: reorganize SMAP handling in user space accesses This reorganizes how we do the stac/clac instructions in the user access code. Instead of adding the instructions directly to the same inline asm that does the actual user level access and exception handling, add them at a higher level. This is mainly preparation for the next step, where we will expose an interface to allow users to mark several accesses together as being user space accesses, but it does already clean up some code: - the inlined trivial cases of copy_in_user() now do stac/clac just once over the accesses: they used to do one pair around the user space read, and another pair around the write-back. - the {get,put}_user_ex() macros that are used with the catch/try handling don't do any stac/clac at all, because that happens in the try/catch surrounding them. Other than those two cleanups that happened naturally from the re-organization, this should not make any difference. Yet. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-12-17 09:45:09 -08:00
Andrey Smetanin	1f4b34f825	kvm/x86: Hyper-V SynIC timers Per Hyper-V specification (and as required by Hyper-V-aware guests), SynIC provides 4 per-vCPU timers. Each timer is programmed via a pair of MSRs, and signals expiration by delivering a special format message to the configured SynIC message slot and triggering the corresponding synthetic interrupt. Note: as implemented by this patch, all periodic timers are "lazy" (i.e. if the vCPU wasn't scheduled for more than the timer period the timer events are lost), regardless of the corresponding configuration MSR. If deemed necessary, the "catch up" mode (the timer period is shortened until the timer catches up) will be implemented later. Changes v2: * Use remainder to calculate periodic timer expiration time Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-12-16 18:49:45 +01:00
Andrey Smetanin	c71acc4c74	drivers/hv: Move struct hv_timer_message_payload into UAPI Hyper-V x86 header This struct is required for Hyper-V SynIC timers implementation inside KVM and for upcoming Hyper-V VMBus support by userspace(QEMU). So place it into Hyper-V UAPI header. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-12-16 18:49:41 +01:00
Andrey Smetanin	5b423efe11	drivers/hv: Move struct hv_message into UAPI Hyper-V x86 header This struct is required for Hyper-V SynIC timers implementation inside KVM and for upcoming Hyper-V VMBus support by userspace(QEMU). So place it into Hyper-V UAPI header. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-12-16 18:49:40 +01:00
Andrey Smetanin	4f39bcfd1c	drivers/hv: Move HV_SYNIC_STIMER_COUNT into Hyper-V UAPI x86 header This constant is required for Hyper-V SynIC timers MSR's support by userspace(QEMU). Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-12-16 18:49:40 +01:00
Ingo Molnar	057032e457	Linux 4.4-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWbh6rAAoJEHm+PkMAQRiG2L4H/AmgfAPX8vYQTKXAnpxt7cLx 2W9C+fyKRcHzqBCsUB4wxPwYwDGPmEZHo4Y/evaSdUbPhngRRRfEVZpzbTUqheEm A7h/1mCl5gcaGRzDNTAST3vfmNmwOHAWC+Ch3ZuuzxH+brtY0Ynb32CNa1XnmW9K 7qknzpgyE3ZNQgwKzZ7F/+TscGcslalKRoAxPa7fumb1srW/Z04aGXYZdEQxOhow 6Oc2op0IijTse5TdfW/MsbpvbH2uBLnQcYHvKXJ0wRmnQGeowguLSgyW356EAOQ9 7L7xCvXyX5dFakZ9EApT3wsSP+cS7jUInDLDAxf14gyODrnl65KO3LU8vmZbJmI= =l5Rq -----END PGP SIGNATURE----- Merge tag 'v4.4-rc5' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-12-14 09:31:23 +01:00
Andy Lutomirski	cc1e24fdb0	x86/vdso: Remove pvclock fixmap machinery Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/4933029991103ae44672c82b97a20035f5c1fe4f.1449702533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-12-11 08:56:03 +01:00
Andy Lutomirski	dac16fba6f	x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f97186859284.1449702533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-12-11 08:56:03 +01:00
Tomasz Nowicki	21461775f3	x86/PCI: Clarify AMD Fam10h config access restrictions comment Clarify the comment about AMD Fam10h config access restrictions, fix typos, and add a reference to the specification. [bhelgaas: streamline] Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>	2015-12-10 19:38:06 -06:00
Andy Shevchenko	4077a387b7	x86/platform/iosf_mbi: Remove duplicate definitions The read and write opcodes are global for all units on SoC and even across Intel SoCs. Remove duplication of corresponding constants. At the same time convert all current users. No functional change. Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Boon Leong Ong <boon.leong.ong@intel.com> Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-12-09 01:18:34 +01:00
Andi Kleen	7f47d8cc03	x86, tracing, perf: Add trace point for MSR accesses For debugging low level code interacting with the CPU it is often useful to trace the MSR read/writes. This gives a concise summary of PMU and other operations. perf has an ad-hoc way to do this using trace_printk, but it's somewhat limited (and also now spews ugly boot messages when enabled) Instead define real trace points for all MSR accesses. This adds three new trace points: read_msr and write_msr and rdpmc. They also report if the access faulted (if _safe is used) This allows filtering and triggering on specific MSR values, which allows various more advanced debugging techniques. All the values are well defined in the CPU documentation. The trace can be post processed with Documentation/trace/postprocess/decode_msr.py to add symbolic MSR names to the trace. I only added it to native MSR accesses in C, not paravirtualized or in entry.S (which is not too interesting) Originally the patch kit moved the MSRs out of line. This uses an alternative approach recommended by Steven Rostedt of only moving the trace calls out of line, but open coding the access to the jump label. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1449018060-1742-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-12-06 12:56:10 +01:00
Andi Kleen	153a4334c4	x86/headers: Don't include asm/processor.h in asm/atomic.h asm/atomic.h doesn't really need asm/processor.h anymore. Everything it uses has moved to other header files. So remove that include. processor.h is a nasty header that includes lots of other headers and makes it prone to include loops. Removing the include here makes asm/atomic.h a "leaf" header that can be safely included in most other headers. The only fallout is in the lib/atomic tester which relied on this implicit include. Give it an explicit include. (the include is in ifdef because the user is also in ifdef) Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/1449018060-1742-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-12-06 12:56:03 +01:00
Kirill A. Shutemov	70f1528747	x86/mm: Fix regression with huge pages on PAE Recent PAT patchset has caused issue on 32-bit PAE machines: page:eea45000 count:0 mapcount:-128 mapping: (null) index:0x0 flags: 0x40000000() page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) < 0) ------------[ cut here ]------------ kernel BUG at /home/build/linux-boris/mm/huge_memory.c:1485! invalid opcode: 0000 [#1] SMP [...] Call Trace: unmap_single_vma ? __wake_up unmap_vmas unmap_region do_munmap vm_munmap SyS_munmap do_fast_syscall_32 ? __do_page_fault sysenter_past_esp Code: ... EIP: [<c11bde80>] zap_huge_pmd+0x240/0x260 SS:ESP 0068:f6459d98 The problem is in pmd_pfn_mask() and pmd_flags_mask(). These helpers use PMD_PAGE_MASK to calculate resulting mask. PMD_PAGE_MASK is 'unsigned long', not 'unsigned long long' as phys_addr_t is on 32-bit PAE (ARCH_PHYS_ADDR_T_64BIT). As a result, the upper bits of resulting mask get truncated. pud_pfn_mask() and pud_flags_mask() aren't problematic since we don't have PUD page table level on 32-bit systems, but it's reasonable to keep them consistent with PMD counterpart. Introduce PHYSICAL_PMD_PAGE_MASK and PHYSICAL_PUD_PAGE_MASK in addition to existing PHYSICAL_PAGE_MASK and reworks helpers to use them. Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [ Fix -Woverflow warnings from the realmode code. ] Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jürgen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: elliott@hpe.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Fixes: `f70abb0fc3` ("x86/asm: Fix pud/pmd interfaces to handle large PAT bit") Link: http://lkml.kernel.org/r/1448878233-11390-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-12-04 09:14:27 +01:00
Matt Fleming	67a9108ed4	x86/efi: Build our own page table structures With commit `e1a58320a3` ("x86/mm: Warn on W^X mappings") all users booting on 64-bit UEFI machines see the following warning, ------------[ cut here ]------------ WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780() x86/mm: Found insecure W+X mapping at address ffff88000005f000/0xffff88000005f000 ... x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found. ... This is caused by mapping EFI regions with RWX permissions. There isn't much we can do to restrict the permissions for these regions due to the way the firmware toolchains mix code and data, but we can at least isolate these mappings so that they do not appear in the regular kernel page tables. In commit `d2f7cbe7b2` ("x86/efi: Runtime services virtual mapping") we started using 'trampoline_pgd' to map the EFI regions because there was an existing identity mapping there which we use during the SetVirtualAddressMap() call and for broken firmware that accesses those addresses. But 'trampoline_pgd' shares some PGD entries with 'swapper_pg_dir' and does not provide the isolation we require. Notably the virtual address for __START_KERNEL_map and MODULES_START are mapped by the same PGD entry so we need to be more careful when copying changes over in efi_sync_low_kernel_mappings(). This patch doesn't go the full mile, we still want to share some PGD entries with 'swapper_pg_dir'. Having completely separate page tables brings its own issues such as synchronising new mappings after memory hotplug and module loading. Sharing also keeps memory usage down. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1448658575-17029-6-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-29 09:15:42 +01:00
Matt Fleming	c9f2a9a65e	x86/efi: Hoist page table switching code into efi_call_virt() This change is a prerequisite for pending patches that switch to a dedicated EFI page table, instead of using 'trampoline_pgd' which shares PGD entries with 'swapper_pg_dir'. The pending patches make it impossible to dereference the runtime service function pointer without first switching %cr3. It's true that we now have duplicated switching code in efi_call_virt() and efi_call_phys_{prolog,epilog}() but we are sacrificing code duplication for a little more clarity and the ease of writing the page table switching code in C instead of asm. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1448658575-17029-5-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-29 09:15:42 +01:00
Julia Lawall	d6b56b0bc6	x86/platform/calgary: Constify cal_chipset_ops structures The cal_chipset_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jon D. Mason <jdmason@kudzu.us> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448726295-10959-1-git-send-email-Julia.Lawall@lip6.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-29 08:50:58 +01:00
Chen Yu	7a9c2dd08e	x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume A bug was reported that on certain Broadwell platforms, after resuming from S3, the CPU is running at an anomalously low speed. It turns out that the BIOS has modified the value of the THERM_CONTROL register during S3, and changed it from 0 to 0x10, thus enabled clock modulation(bit4), but with undefined CPU Duty Cycle(bit1:3) - which causes the problem. Here is a simple scenario to reproduce the issue: 1. Boot up the system 2. Get MSR 0x19a, it should be 0 3. Put the system into sleep, then wake it up 4. Get MSR 0x19a, it shows 0x10, while it should be 0 Although some BIOSen want to change the CPU Duty Cycle during S3, in our case we don't want the BIOS to do any modification. Fix this issue by introducing a more generic x86 framework to save/restore specified MSR registers(THERM_CONTROL in this case) for suspend/resume. This allows us to fix similar bugs in a much simpler way in the future. When the kernel wants to protect certain MSRs during suspending, we simply add a quirk entry in msr_save_dmi_table, and customize the MSR registers inside the quirk callback, for example: u32 msr_id_need_to_save[] = {MSR_ID0, MSR_ID1, MSR_ID2...}; and the quirk mechanism ensures that, once resumed from suspend, the MSRs indicated by these IDs will be restored to their original, pre-suspend values. Since both 64-bit and 32-bit kernels are affected, this patch covers the common 64/32-bit suspend/resume code path. And because the MSRs specified by the user might not be available or readable in any situation, we use rdmsrl_safe() to safely save these MSRs. Reported-and-tested-by: Marcin Kaszewski <marcin.kaszewski@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: len.brown@intel.com Cc: linux@horizon.com Cc: luto@kernel.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/c9abdcbc173dd2f57e8990e304376f19287e92ba.1448382971.git.yu.c.chen@intel.com [ More edits to the naming of data structures. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-26 10:04:53 +01:00
Juergen Gross	d6ccc3ec95	x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer pte_update_defer can be removed as it is always set to the same function as pte_update. So any usage of pte_update_defer() can be replaced by pte_update(). pmd_update and pmd_update_defer are always set to paravirt_nop, so they can just be nuked. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: jeremy@goop.org Cc: chrisw@sous-sol.org Cc: akataria@vmware.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xen.org Cc: konrad.wilk@oracle.com Cc: david.vrabel@citrix.com Cc: boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/1447771879-1806-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-11-25 23:08:37 +01:00
Takuya Yoshikawa	018aabb56d	KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct New struct kvm_rmap_head makes the code type-safe to some extent. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-25 17:25:44 +01:00
Andrey Smetanin	db3975717a	kvm/x86: Hyper-V kvm exit A new vcpu exit is introduced to notify the userspace of the changes in Hyper-V SynIC configuration triggered by guest writing to the corresponding MSRs. Changes v4: * exit into userspace only if guest writes into SynIC MSR's Changes v3: * added KVM_EXIT_HYPERV types and structs notes into docs Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-25 17:24:22 +01:00
Andrey Smetanin	5c919412fe	kvm/x86: Hyper-V synthetic interrupt controller SynIC (synthetic interrupt controller) is a lapic extension, which is controlled via MSRs and maintains for each vCPU - 16 synthetic interrupt "lines" (SINT's); each can be configured to trigger a specific interrupt vector optionally with auto-EOI semantics - a message page in the guest memory with 16 256-byte per-SINT message slots - an event flag page in the guest memory with 16 2048-bit per-SINT event flag areas The host triggers a SINT whenever it delivers a new message to the corresponding slot or flips an event flag bit in the corresponding area. The guest informs the host that it can try delivering a message by explicitly asserting EOI in lapic or writing to End-Of-Message (EOM) MSR. The userspace (qemu) triggers interrupts and receives EOM notifications via irqfd with resampler; for that, a GSI is allocated for each configured SINT, and irq_routing api is extended to support GSI-SINT mapping. Changes v4: * added activation of SynIC by vcpu KVM_ENABLE_CAP * added per SynIC active flag * added deactivation of APICv upon SynIC activation Changes v3: * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into docs Changes v2: * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors * add Hyper-V SynIC vectors into EOI exit bitmap * Hyper-V SyniIC SINT msr write logic simplified Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-25 17:24:22 +01:00
Andrey Smetanin	d62caabb41	kvm/x86: per-vcpu apicv deactivation support The decision on whether to use hardware APIC virtualization used to be taken globally, based on the availability of the feature in the CPU and the value of a module parameter. However, under certain circumstances we want to control it on per-vcpu basis. In particular, when the userspace activates HyperV synthetic interrupt controller (SynIC), APICv has to be disabled as it's incompatible with SynIC auto-EOI behavior. To achieve that, introduce 'apicv_active' flag on struct kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv off. The flag is initialized based on the module parameter and CPU capability, and consulted whenever an APICv-specific action is performed. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-25 17:24:21 +01:00
Andrey Smetanin	6308630bd3	kvm/x86: split ioapic-handled and EOI exit bitmaps The function to determine if the vector is handled by ioapic used to rely on the fact that only ioapic-handled vectors were set up to cause vmexits when virtual apic was in use. We're going to break this assumption when introducing Hyper-V synthetic interrupts: they may need to cause vmexits too. To achieve that, introduce a new bitmap dedicated specifically for ioapic-handled vectors, and populate EOI exit bitmap from it for now. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-25 17:24:21 +01:00
Andy Lutomirski	2671c3e4fe	x86/asm: Add asm macros for static keys/jump labels Unfortunately, we can only do this if HAVE_JUMP_LABEL. In principle, we could do some serious surgery on the core jump label infrastructure to keep the patch infrastructure available on x86 on all builds, but that's probably not worth it. Implementing the macros using a conditional branch as a fallback seems like a bad idea: we'd have to clobber flags. This limitation can't cause silent failures -- trying to include asm/jump_label.h at all on a non-HAVE_JUMP_LABEL kernel will error out. The macro's users are responsible for handling this issue themselves. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/63aa45c4b692e8469e1876d6ccbb5da707972990.1447361906.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-24 09:56:44 +01:00
Andy Lutomirski	c28454332f	x86/asm: Error out if asm/jump_label.h is included inappropriately Rather than potentially generating incorrect code on a non-HAVE_JUMP_LABEL kernel if someone includes asm/jump_label.h, error out. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/99407f0ac7fa3ab03a3d31ce076d47b5c2f44795.1447361906.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-24 09:56:43 +01:00
Borislav Petkov	b7106fa0f2	x86/fpu: Get rid of xstate_fault() Add macros for the alternative XSAVE/XRSTOR operations which contain the fault handling and use them. Kill xstate_fault(). Also, copy_xregs_to_kernel() didn't have the extended state as memory reference in the asm. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447932326-4371-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-24 09:52:52 +01:00
Borislav Petkov	b74a0cf1b3	x86/fpu: Add an XSTATE_OP() macro Add an XSTATE_OP() macro which contains the XSAVE* fault handling and replace all non-alternatives users of xstate_fault() with it. This fixes also the buglet in copy_xregs_to_user() and copy_user_to_xregs() where the inline asm didn't have @xstate as memory reference and thus potentially causing unwanted reordering of accesses to the extended state. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447932326-4371-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-24 09:52:52 +01:00
Borislav Petkov	679bcea857	x86/MSR: Chop off lower 32-bit value sparse complains that the cast truncates the high bits. But here we really do know what we're doing and we need the lower 32 bits only as the @low argument. So make that explicit. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448273546-2567-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-24 09:15:55 +01:00
Borislav Petkov	ae8b787543	x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR The kernel accesses IC_CFG MSR (0xc0011021) on AMD because it checks whether the way access filter is enabled on some F15h models, and, if so, disables it. kvm doesn't handle that MSR access and complains about it, which can get really noisy in dmesg when one starts kvm guests all the time for testing. And it is useless anyway - guest kernel shouldn't be doing such changes anyway so tell it that that filter is disabled. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448273546-2567-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-24 09:15:54 +01:00
Borislav Petkov	99f925ce92	x86/cpu: Unify CPU family, model, stepping calculation Add generic functions which calc family, model and stepping from the CPUID_1.EAX leaf and stick them into the library we have. Rename those which do call CPUID with the prefix "x86_cpuid" as suggested by Paolo Bonzini. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448273546-2567-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-24 09:15:54 +01:00
Borislav Petkov	c0ec382e19	x86/RAS: Remove mce.usable_addr It is useless and we can use the function instead. Besides, mcelog(8) hasn't managed to make use of it yet. So kill it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448350880-5573-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-24 09:12:35 +01:00
Boris Ostrovsky	75ef82190d	x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op As result of commit "x86/xen: Avoid fast syscall path for Xen PV guests", usergs_sysret32 pv op is not called by Xen PV guests anymore and since they were the only ones who used it we can safely remove it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-4-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-23 10:48:16 +01:00
Boris Ostrovsky	88c15ec90f	x86/paravirt: Remove the unused irq_enable_sysexit pv op As result of commit "x86/xen: Avoid fast syscall path for Xen PV guests", the irq_enable_sysexit pv op is not called by Xen PV guests anymore and since they were the only ones who used it we can safely remove it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-3-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-23 10:48:16 +01:00
Boris Ostrovsky	5fdf5d37f4	x86/xen: Avoid fast syscall path for Xen PV guests After 32-bit syscall rewrite, and specifically after commit: `5f310f739b` ("x86/entry/32: Re-implement SYSENTER using the new C path") ... the stack frame that is passed to xen_sysexit is no longer a "standard" one (i.e. it's not pt_regs). Since we end up calling xen_iret from xen_sysexit we don't need to fix up the stack and instead follow entry_SYSENTER_32's IRET path directly to xen_iret. We can do the same thing for compat mode even though stack does not need to be fixed. This will allow us to drop usergs_sysret32 paravirt op (in the subsequent patch) Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-2-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-23 10:48:16 +01:00
Waiman Long	d78045306c	locking/pvqspinlock, x86: Optimize the PV unlock code path The unlock function in queued spinlocks was optimized for better performance on bare metal systems at the expense of virtualized guests. For x86-64 systems, the unlock call needs to go through a PV_CALLEE_SAVE_REGS_THUNK() which saves and restores 8 64-bit registers before calling the real __pv_queued_spin_unlock() function. The thunk code may also be in a separate cacheline from __pv_queued_spin_unlock(). This patch optimizes the PV unlock code path by: 1) Moving the unlock slowpath code from the fastpath into a separate __pv_queued_spin_unlock_slowpath() function to make the fastpath as simple as possible.. 2) For x86-64, hand-coded an assembly function to combine the register saving thunk code with the fastpath code. Only registers that are used in the fastpath will be saved and restored. If the fastpath fails, the slowpath function will be called via another PV_CALLEE_SAVE_REGS_THUNK(). For 32-bit, it falls back to the C __pv_queued_spin_unlock() code as the thunk saves and restores only one 32-bit register. With a microbenchmark of 5M lock-unlock loop, the table below shows the execution times before and after the patch with different number of threads in a VM running on a 32-core Westmere-EX box with x86-64 4.2-rc1 based kernels: Threads Before patch After patch % Change ------- ------------ ----------- -------- 1 134.1 ms 119.3 ms -11% 2 1286 ms 953 ms -26% 3 3715 ms 3480 ms -6.3% 4 4092 ms 3764 ms -8.0% Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447114167-47185-5-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-23 10:02:02 +01:00
Takao Indoh	24cc12b176	perf/x86/intel/pt: Add interface to stop Intel PT logging This patch add a function for external components to stop Intel PT. Basically this function is used when kernel panic occurs. When it is called, the intel_pt driver disables Intel PT and saves its registers using pt_event_stop(), which is also used by pmu.stop handler. This function stops Intel PT on the CPU where it is working, therefore users of it need to call it for each CPU to stop all logging. Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H.Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/1446614553-6072-2-git-send-email-indou.takao@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-23 09:58:26 +01:00
Andi Kleen	10013ebb5d	x86: Add an inlined __copy_from_user_nmi() variant Add a inlined __ variant of copy_from_user_nmi. The inlined variant allows the user to: - batch the access_ok() check for multiple accesses - avoid having a pagefault_disable/enable() on every access if the caller already ensures disabled page faults due to its context. - get all the optimizations in copy_*_user() for small constant sized transfers It is just a define to __copy_from_user_inatomic(). Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1445551641-13379-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-23 09:58:24 +01:00
Juergen Gross	4609586592	x86/paravirt: Remove unused pv_apic_ops structure The only member of that structure is startup_ipi_hook which is always set to paravirt_nop. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: jeremy@goop.org Cc: chrisw@sous-sol.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xen.org Cc: konrad.wilk@oracle.com Cc: boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/1447767872-16730-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-11-19 11:03:58 +01:00
Thomas Gleixner	2f7a3f8e87	x86/tsc: Remove unused tsc_pre_init() hook No more users. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@suse.de>	2015-11-19 11:03:13 +01:00
Juergen Gross	ed29210cd6	x86: Remove unused function cpu_has_ht_siblings() It is used nowhere. Signed-off-by: Juergen Gross <jgross@suse.com> Link: http://lkml.kernel.org/r/1447761943-770-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-11-17 15:40:17 +01:00
Rafael J. Wysocki	d9f67dbc0f	Merge branch 'pm-tools' * pm-tools: x86: remove unused definition of MSR_NHM_PLATFORM_INFO tools/power turbostat: use new name for MSR_PLATFORM_INFO	2015-11-16 22:57:02 +01:00
Linus Torvalds	bba072dfd7	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A couple of fixes and updates related to x86: - Fix the W+X check regression on XEN - The real fix for the low identity map trainwreck - Probe legacy PIC early instead of unconditionally allocating legacy irqs - Add cpu verification to long mode entry - Adjust the cache topology to AMD Fam17H systems - Let Merrifield use the TSC across S3" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Call verify_cpu() after having entered long mode too x86/setup: Fix low identity map for >= 2GB kernel range x86/mm: Skip the hypervisor range when walking PGD x86/AMD: Fix last level cache topology for AMD Fam17h systems x86/irq: Probe for PIC presence before allocating descs for legacy IRQs x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield	2015-11-15 09:32:59 -08:00
Len Brown	5369a21e3f	x86: remove unused definition of MSR_NHM_PLATFORM_INFO MSR_NHM_PLATFORM_INFO has been replaced by... MSR_PLATFORM_INFO Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-11-13 23:28:01 +01:00
Linus Torvalds	3370b69eb0	Four changes: - x86: work around two nasty cases where a benign exception occurs while another is being delivered. The endless stream of exceptions causes an infinite loop in the processor, which not even NMIs or SMIs can interrupt; in the virt case, there is no possibility to exit to the host either. - x86: support for Skylake per-guest TSC rate. Long supported by AMD, the patches mostly move things from there to common arch/x86/kvm/ code. - generic: remove local_irq_save/restore from the guest entry and exit paths when context tracking is enabled. The patches are a few months old, but we discussed them again at kernel summit. Andy will pick up from here and, in 4.5, try to remove it from the user entry/exit paths. - PPC: Two bug fixes, see merge commit `370289756b` for details. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJWRFb0AAoJEL/70l94x66DjjMH/31jr8d119MW0uv2x+03+wRq 6dbJ8tjQ8grvBRExKvLsUVjDmHlhCa1BQl5qjCsyYhX9UeAf4NQOmoEFpq+YTLxh Ctveyn+yiZWC7qxbQDmauiQ4JCOp+W9ial782iqw5+ouQMajGOffq5WrojCa2ZNF jI278JgdHJLrKj/uie//WBu3V7MJY5Apc3p4zatnSYFSQ3MA0sxl4r4zIrwOa5qs 23ZeeoqbP4sHh4X5wL/30Y6XFSCHj0qoYHHyAgzLi0PCMvBdt4DrAFUPDG/Rhlv6 o1WB/kcUfcz3DtBX85wfSOMuw0nF6patWhWv07R/3EIbYoz3dKvp9d6ORYgXqlY= =Um9M -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull second batch of kvm updates from Paolo Bonzini: "Four changes: - x86: work around two nasty cases where a benign exception occurs while another is being delivered. The endless stream of exceptions causes an infinite loop in the processor, which not even NMIs or SMIs can interrupt; in the virt case, there is no possibility to exit to the host either. - x86: support for Skylake per-guest TSC rate. Long supported by AMD, the patches mostly move things from there to common arch/x86/kvm/ code. - generic: remove local_irq_save/restore from the guest entry and exit paths when context tracking is enabled. The patches are a few months old, but we discussed them again at kernel summit. Andy will pick up from here and, in 4.5, try to remove it from the user entry/exit paths. - PPC: Two bug fixes, see merge commit `370289756b` for details" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: x86: rename update_db_bp_intercept to update_bp_intercept KVM: svm: unconditionally intercept #DB KVM: x86: work around infinite loop in microcode when #AC is delivered context_tracking: avoid irq_save/irq_restore on guest entry and exit context_tracking: remove duplicate enabled check KVM: VMX: Dump TSC multiplier in dump_vmcs() KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded KVM: VMX: Enable and initialize VMX TSC scaling KVM: x86: Use the correct vcpu's TSC rate to compute time scale KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc() KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset() KVM: x86: Replace call-back compute_tsc_offset() with a common function KVM: x86: Replace call-back set_tsc_khz() with a common function KVM: x86: Add a common TSC scaling function KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch KVM: x86: Collect information for setting TSC scaling ratio KVM: x86: declare a few variables as __read_mostly KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common KVM: PPC: Book3S HV: Don't dynamically split core when already split ...	2015-11-12 14:34:06 -08:00
Paolo Bonzini	a96036b8ef	KVM: x86: rename update_db_bp_intercept to update_bp_intercept Because #DB is now intercepted unconditionally, this callback only operates on #BP for both VMX and SVM. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:25 +01:00
Eric Northup	54a20552e1	KVM: x86: work around infinite loop in microcode when #AC is delivered It was found that a guest can DoS a host by triggering an infinite stream of "alignment check" (#AC) exceptions. This causes the microcode to enter an infinite loop where the core never receives another interrupt. The host kernel panics pretty quickly due to the effects (CVE-2015-5307). Signed-off-by: Eric Northup <digitaleric@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:24 +01:00
Haozhong Zhang	64903d6195	KVM: VMX: Enable and initialize VMX TSC scaling This patch exhances kvm-intel module to enable VMX TSC scaling and collects information of TSC scaling ratio during initialization. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:19 +01:00
Haozhong Zhang	4ba76538dd	KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc() Both VMX and SVM scales the host TSC in the same way in call-back read_l1_tsc(), so this patch moves the scaling logic from call-back read_l1_tsc() to a common function kvm_read_l1_tsc(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:18 +01:00
Haozhong Zhang	58ea676787	KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset() For both VMX and SVM, if the 2nd argument of call-back adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale it first. This patch moves this common TSC scaling logic to its caller adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to adjust_tsc_offset_guest(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:17 +01:00
Haozhong Zhang	07c1419a32	KVM: x86: Replace call-back compute_tsc_offset() with a common function Both VMX and SVM calculate the tsc-offset in the same way, so this patch removes the call-back compute_tsc_offset() and replaces it with a common function kvm_compute_tsc_offset(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:16 +01:00
Haozhong Zhang	381d585c80	KVM: x86: Replace call-back set_tsc_khz() with a common function Both VMX and SVM propagate virtual_tsc_khz in the same way, so this patch removes the call-back set_tsc_khz() and replaces it with a common function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:16 +01:00
Haozhong Zhang	35181e86df	KVM: x86: Add a common TSC scaling function VMX and SVM calculate the TSC scaling ratio in a similar logic, so this patch generalizes it to a common TSC scaling function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> [Inline the multiplication and shift steps into mul_u64_u64_shr. Remove BUG_ON. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:15 +01:00
Haozhong Zhang	ad721883e9	KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch This patch moves the field of TSC scaling ratio from the architecture struct vcpu_svm to the common struct kvm_vcpu_arch. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:14 +01:00
Haozhong Zhang	bc9b961b35	KVM: x86: Collect information for setting TSC scaling ratio The number of bits of the fractional part of the 64-bit TSC scaling ratio in VMX and SVM is different. This patch makes the architecture code to collect the number of fractional bits and other related information into variables that can be accessed in the common code. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:14 +01:00
Paolo Bonzini	893590c734	KVM: x86: declare a few variables as __read_mostly These include module parameters and variables that are set by kvm_x86_ops->hardware_setup. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-10 12:06:13 +01:00
Nicolas Pitre	77c5b5da02	kmap_atomic_to_page() has no users, remove it Removal started in commit `5bbeed12bd` ("sparc32: drop unused kmap_atomic_to_page"). Let's do it across the whole tree. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-09 15:11:24 -08:00
Borislav Petkov	79f1d83692	x86/paravirt: Kill some unused patching functions paravirt_patch_ignore() is completely unused and paravirt_patch_nop() doesn't do a whole lot. Remove them both. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1446542329-32037-1-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-11-07 18:09:35 +01:00
Vitaly Kuznetsov	8c058b0b9c	x86/irq: Probe for PIC presence before allocating descs for legacy IRQs Commit `d32932d02e` ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") brought a regression for Hyper-V Gen2 instances. These instances don't have i8259 legacy PIC but they use legacy IRQs for serial port, rtc, and acpi. With this commit included we end up with these IRQs not initialized. Earlier, there was a special workaround for legacy IRQs in mp_map_pin_to_irq() doing mp_irqdomain_map() without looking at nr_legacy_irqs() and now we fail in __irq_domain_alloc_irqs() when irq_domain_alloc_descs() returns -EEXIST. The essence of the issue seems to be that early_irq_init() calls arch_probe_nr_irqs() to figure out the number of legacy IRQs before we probe for i8259 and gets 16. Later when init_8259A() is called we switch to NULL legacy PIC and nr_legacy_irqs() starts to return 0 but we already have 16 descs allocated. Solve the issue by separating i8259 probe from init and calling it in arch_probe_nr_irqs() before we actually use nr_legacy_irqs() information. Fixes: `d32932d02e` ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1446543614-3621-1-git-send-email-vkuznets@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-11-07 10:37:37 +01:00
Linus Torvalds	933425fb00	s390: A bunch of fixes and optimizations for interrupt and time handling. PPC: Mostly bug fixes. ARM: No big features, but many small fixes and prerequisites including: - a number of fixes for the arch-timer - introducing proper level-triggered semantics for the arch-timers - a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding) - some tracepoint improvements - a tweak for the EL2 panic handlers - some more VGIC cleanups getting rid of redundant state x86: quite a few changes: - support for VT-d posted interrupts (i.e. PCI devices can inject interrupts directly into vCPUs). This introduces a new component (in virt/lib/) that connects VFIO and KVM together. The same infrastructure will be used for ARM interrupt forwarding as well. - more Hyper-V features, though the main one Hyper-V synthetic interrupt controller will have to wait for 4.5. These will let KVM expose Hyper-V devices. - nested virtualization now supports VPID (same as PCID but for vCPUs) which makes it quite a bit faster - for future hardware that supports NVDIMM, there is support for clflushopt, clwb, pcommit - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in userspace, which reduces the attack surface of the hypervisor - obligatory smattering of SMM fixes - on the guest side, stable scheduler clock support was rewritten to not require help from the hypervisor. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc= =iv2z -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.4. s390: A bunch of fixes and optimizations for interrupt and time handling. PPC: Mostly bug fixes. ARM: No big features, but many small fixes and prerequisites including: - a number of fixes for the arch-timer - introducing proper level-triggered semantics for the arch-timers - a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding) - some tracepoint improvements - a tweak for the EL2 panic handlers - some more VGIC cleanups getting rid of redundant state x86: Quite a few changes: - support for VT-d posted interrupts (i.e. PCI devices can inject interrupts directly into vCPUs). This introduces a new component (in virt/lib/) that connects VFIO and KVM together. The same infrastructure will be used for ARM interrupt forwarding as well. - more Hyper-V features, though the main one Hyper-V synthetic interrupt controller will have to wait for 4.5. These will let KVM expose Hyper-V devices. - nested virtualization now supports VPID (same as PCID but for vCPUs) which makes it quite a bit faster - for future hardware that supports NVDIMM, there is support for clflushopt, clwb, pcommit - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in userspace, which reduces the attack surface of the hypervisor - obligatory smattering of SMM fixes - on the guest side, stable scheduler clock support was rewritten to not require help from the hypervisor" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits) KVM: VMX: Fix commit which broke PML KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0() KVM: x86: allow RSM from 64-bit mode KVM: VMX: fix SMEP and SMAP without EPT KVM: x86: move kvm_set_irq_inatomic to legacy device assignment KVM: device assignment: remove pointless #ifdefs KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic KVM: x86: zero apic_arb_prio on reset drivers/hv: share Hyper-V SynIC constants with userspace KVM: x86: handle SMBASE as physical address in RSM KVM: x86: add read_phys to x86_emulate_ops KVM: x86: removing unused variable KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr() KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings KVM: arm/arm64: Optimize away redundant LR tracking KVM: s390: use simple switch statement as multiplexer KVM: s390: drop useless newline in debugging data KVM: s390: SCA must not cross page boundaries KVM: arm: Do not indent the arguments of DECLARE_BITMAP ...	2015-11-05 16:26:26 -08:00
Thomas Gleixner	7e29393b20	x86/apic: Provide default send single IPI wrapper Instead of doing the wrapping in the smp code we can provide a default wrapper for those APICs which insist on cpumasks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Travis <travis@sgi.com> Cc: Daniel J Blueman <daniel@numascale.com> Link: http://lkml.kernel.org/r/20151104220849.631111846@linutronix.de	2015-11-05 13:07:53 +01:00
Thomas Gleixner	53be0fac8b	x86/apic: Implement default single target IPI function apic_physflat and bigsmp_apic can share that implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Travis <travis@sgi.com> Cc: Daniel J Blueman <daniel@numascale.com> Link: http://lkml.kernel.org/r/20151104220848.898543767@linutronix.de	2015-11-05 13:07:52 +01:00
Linus Torvalds	539da78772	x86/apic: Add a single-target IPI function to the apic We still fall back on the "send mask" versions if an apic definition doesn't have the single-target version, but at least this allows the (trivial) case for the common clustered x2apic case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Travis <travis@sgi.com> Cc: Daniel J Blueman <daniel@numascale.com> Link: http://lkml.kernel.org/r/20151104220848.737120838@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-11-05 13:07:51 +01:00
Linus Torvalds	0d51ce9ca1	Power management and ACPI updates for v4.4-rc1 - ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng). The most significant change is to allow the AML debugger to be built into the kernel. On top of that there is an update related to the NFIT table (the ACPI persistent memory interface) and a few fixes and cleanups. - ACPI CPPC2 (Collaborative Processor Performance Control v2) support along with a cpufreq frontend (Ashwin Chaugule). This can only be enabled on ARM64 at this point. - New ACPI infrastructure for the early probing of IRQ chips and clock sources (Marc Zyngier). - Support for a new hierarchical properties extension of the ACPI _DSD (Device Specific Data) device configuration object allowing the kernel to handle hierarchical properties (provided by the platform firmware this way) automatically and make them available to device drivers via the generic device properties interface (Rafael Wysocki). - Generic device properties API extension to obtain an index of certain string value in an array of strings, along the lines of of_property_match_string(), but working for all of the supported firmware node types, and support for the "dma-names" device property based on it (Mika Westerberg). - ACPI core fix to parse the MADT (Multiple APIC Description Table) entries in the order expected by platform firmware (and mandated by the specification) to avoid confusion on systems with more than 255 logical CPUs (Lukasz Anaczkowski). - Consolidation of the ACPI-based handling of PCI host bridges on x86 and ia64 (Jiang Liu). - ACPI core fixes to ensure that the correct IRQ number is used to represent the SCI (System Control Interrupt) in the cases when it has been re-mapped (Chen Yu). - New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede). - ACPI EC driver fixes (Lv Zheng). - Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri Kosina, Rami Rosen, Rasmus Villemoes). - New mechanism in the PM core allowing drivers to check if the platform firmware is going to be involved in the upcoming system suspend or if it has been involved in the suspend the system is resuming from at the moment (Rafael Wysocki). This should allow drivers to optimize their suspend/resume handling in some cases and the changes include a couple of users of it (the i8042 input driver, PCI PM). - PCI PM fix to prevent runtime-suspended devices with PME enabled from being resumed during system suspend even if they aren't configured to wake up the system from sleep (Rafael Wysocki). - New mechanism to report the number of a wakeup IRQ that woke up the system from sleep last time (Alexandra Yates). - Removal of unused interfaces from the generic power domains framework and fixes related to latency measurements in that code (Ulf Hansson, Daniel Lezcano). - cpufreq core sysfs interface rework to make it handle CPUs that share performance scaling settings (represented by a common cpufreq policy object) more symmetrically (Viresh Kumar). This should help to simplify the CPU offline/online handling among other things. - cpufreq core fixes and cleanups (Viresh Kumar). - intel_pstate fixes related to the Turbo Activation Ratio (TAR) mechanism on client platforms which causes the turbo P-states range to vary depending on platform firmware settings (Srinivas Pandruvada). - intel_pstate sysfs interface fix (Prarit Bhargava). - Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G Bhat, Luis de Bethencourt). - cpuidle mvebu driver cleanups (Russell King). - OPP (Operating Performance Points) framework code reorganization to make it more maintainable (Viresh Kumar). - Intel Broxton support for the RAPL (Running Average Power Limits) power capping driver (Amy Wiles). - Assorted power management code fixes and cleanups (Dan Carpenter, Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus Villemoes). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJWOC9oAAoJEILEb/54YlRx/c8P/joflwoFsISwJccG62YTQMuc bMQKM4Kw0vl5La8+pkLpe5t6+mW7l81UFtYF6Dzd8LOKlD9sszD34z1lHmCeT/oR wn0uZpHagRyLMUfoyiEtlU/VRU6WQNNtS3EgjwUi7xgFz9Q0pjcCZ9OQ6vKov1j5 +6j40ODif5sgo+2vl+rztJiV0SIMkYdkgNqgfN1FE9bdLA2Zkk+PxxJbtGQORuDu O/K+XhQT2xWquVWi/1p+VtQxs5glBS1oKm0kogV5bElCvNTRNIVABUNcjogITQwo QSAKgoCKIoaIl5jtDT6u5dc0y67q/dMtqOY9fOCcOz1Z7jbWQzR8D7mpFWIsJUPK K2LClI3t85ynpN6Jref246A6+C9nwB8JMAiAR11oBw7WbBlkd6tbRgcT5B+iz8UE FuCCif7pha/Fs+Jt1YRazscIqteQ2bAhhxikuIPMfw2M6M67MNfVNeKA1bAoSM34 dH7JsilblitvV7shrwJHwXPXCOF2jEPoK8I4/q2+TR5qUxEpRJjelQxXGSaQScMZ iNnjeTgv8H8q+rY5Yjzsl4pxP0Fvf7IuqkptWOJbgepg4cQc9pS87wOpY3uEeQzr H7ruaQJFCnLO4aXbPNClsiJARhrBk+qMlsh4vBEyCJ2T0ucb+nIUcN4BTi8t85yl X97BfHHUiDoUrnIsNids =1gaH -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "Quite a new features are included this time. First off, the Collaborative Processor Performance Control interface (version 2) defined by ACPI will now be supported on ARM64 along with a cpufreq frontend for CPU performance scaling. Second, ACPI gets a new infrastructure for the early probing of IRQ chips and clock sources (along the lines of the existing similar mechanism for DT). Next, the ACPI core and the generic device properties API will now support a recently introduced hierarchical properties extension of the _DSD (Device Specific Data) ACPI device configuration object. If the ACPI platform firmware uses that extension to organize device properties in a hierarchical way, the kernel will automatically handle it and make those properties available to device drivers via the generic device properties API. It also will be possible to build the ACPICA's AML interpreter debugger into the kernel now and use that to diagnose AML-related problems more efficiently. In the future, this should make it possible to single-step AML execution and do similar things. Interesting stuff, although somewhat experimental at this point. Finally, the PM core gets a new mechanism that can be used by device drivers to distinguish between suspend-to-RAM (based on platform firmware support) and suspend-to-idle (or other variants of system suspend the platform firmware is not involved in) and possibly optimize their device suspend/resume handling accordingly. In addition to that, some existing features are re-organized quite substantially. First, the ACPI-based handling of PCI host bridges on x86 and ia64 is unified and the common code goes into the ACPI core (so as to reduce code duplication and eliminate non-essential differences between the two architectures in that area). Second, the Operating Performance Points (OPP) framework is reorganized to make the code easier to find and follow. Next, the cpufreq core's sysfs interface is reorganized to get rid of the "primary CPU" concept for configurations in which the same performance scaling settings are shared between multiple CPUs. Finally, some interfaces that aren't necessary any more are dropped from the generic power domains framework. On top of the above we have some minor extensions, cleanups and bug fixes in multiple places, as usual. Specifics: - ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng). The most significant change is to allow the AML debugger to be built into the kernel. On top of that there is an update related to the NFIT table (the ACPI persistent memory interface) and a few fixes and cleanups. - ACPI CPPC2 (Collaborative Processor Performance Control v2) support along with a cpufreq frontend (Ashwin Chaugule). This can only be enabled on ARM64 at this point. - New ACPI infrastructure for the early probing of IRQ chips and clock sources (Marc Zyngier). - Support for a new hierarchical properties extension of the ACPI _DSD (Device Specific Data) device configuration object allowing the kernel to handle hierarchical properties (provided by the platform firmware this way) automatically and make them available to device drivers via the generic device properties interface (Rafael Wysocki). - Generic device properties API extension to obtain an index of certain string value in an array of strings, along the lines of of_property_match_string(), but working for all of the supported firmware node types, and support for the "dma-names" device property based on it (Mika Westerberg). - ACPI core fix to parse the MADT (Multiple APIC Description Table) entries in the order expected by platform firmware (and mandated by the specification) to avoid confusion on systems with more than 255 logical CPUs (Lukasz Anaczkowski). - Consolidation of the ACPI-based handling of PCI host bridges on x86 and ia64 (Jiang Liu). - ACPI core fixes to ensure that the correct IRQ number is used to represent the SCI (System Control Interrupt) in the cases when it has been re-mapped (Chen Yu). - New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede). - ACPI EC driver fixes (Lv Zheng). - Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri Kosina, Rami Rosen, Rasmus Villemoes). - New mechanism in the PM core allowing drivers to check if the platform firmware is going to be involved in the upcoming system suspend or if it has been involved in the suspend the system is resuming from at the moment (Rafael Wysocki). This should allow drivers to optimize their suspend/resume handling in some cases and the changes include a couple of users of it (the i8042 input driver, PCI PM). - PCI PM fix to prevent runtime-suspended devices with PME enabled from being resumed during system suspend even if they aren't configured to wake up the system from sleep (Rafael Wysocki). - New mechanism to report the number of a wakeup IRQ that woke up the system from sleep last time (Alexandra Yates). - Removal of unused interfaces from the generic power domains framework and fixes related to latency measurements in that code (Ulf Hansson, Daniel Lezcano). - cpufreq core sysfs interface rework to make it handle CPUs that share performance scaling settings (represented by a common cpufreq policy object) more symmetrically (Viresh Kumar). This should help to simplify the CPU offline/online handling among other things. - cpufreq core fixes and cleanups (Viresh Kumar). - intel_pstate fixes related to the Turbo Activation Ratio (TAR) mechanism on client platforms which causes the turbo P-states range to vary depending on platform firmware settings (Srinivas Pandruvada). - intel_pstate sysfs interface fix (Prarit Bhargava). - Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G Bhat, Luis de Bethencourt). - cpuidle mvebu driver cleanups (Russell King). - OPP (Operating Performance Points) framework code reorganization to make it more maintainable (Viresh Kumar). - Intel Broxton support for the RAPL (Running Average Power Limits) power capping driver (Amy Wiles). - Assorted power management code fixes and cleanups (Dan Carpenter, Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus Villemoes)" * tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (108 commits) cpufreq: postfix policy directory with the first CPU in related_cpus cpufreq: create cpu/cpufreq/policyX directories cpufreq: remove cpufreq_sysfs_{create\|remove}_file() cpufreq: create cpu/cpufreq at boot time cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate() PM / Domains: Merge measurements for PM QoS device latencies PM / Domains: Don't measure ->start\|stop() latency in system PM callbacks PM / clk: Fix broken build due to non-matching code and header #ifdefs ACPI / Documentation: add copy_dsdt to ACPI format options ACPI / sysfs: correctly check failing memory allocation ACPI / video: Add a quirk to force native backlight on Lenovo IdeaPad S405 ACPI / CPPC: Fix potential memory leak ACPI / CPPC: signedness bug in register_pcc_channel() ACPI / PAD: power_saving_thread() is not freezable ACPI / PM: Fix incorrect wakeup IRQ setting during suspend-to-idle ACPI: Using correct irq when waiting for events ACPI: Use correct IRQ when uninstalling ACPI interrupt handler cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver cpuidle: mvebu: clean up multiple platform drivers ...	2015-11-04 18:10:13 -08:00
Linus Torvalds	41ecf1404b	xen: features for 4.4-rc0 - Improve balloon driver memory hotplug placement. - Use unpopulated hotplugged memory for foreign pages (if supported/enabled). - Support 64 KiB guest pages on arm64. - CPU hotplug support on arm/arm64. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWOeSkAAoJEFxbo/MsZsTRph0H/0nE8Tx0GyGtOyCYfBdInTvI WgjvL8VR1XrweZMVis3668MzhLSYg6b5lvJsoi+L3jlzYRyze43iHXsKfvp+8p0o TVUhFnlHEHF8ASEtPydAi6HgS7Dn9OQ9LaZ45R1Gk0rHnwJjIQonhTn2jB0yS9Am Hf4aZXP2NVZphjYcloqNsLH0G6mGLtgq8cS0uKcVO2YIrR4Dr3sfj9qfq9mflf8n sA/5ifoHRfOUD1vJzYs4YmIBUv270jSsprWK/Mi2oXIxUTBpKRAV1RVCAPW6GFci HIZjIJkjEPWLsvxWEs0dUFJQGp3jel5h8vFPkDWBYs3+9rILU2DnLWpKGNDHx3k= =vUfa -----END PGP SIGNATURE----- Merge tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: - Improve balloon driver memory hotplug placement. - Use unpopulated hotplugged memory for foreign pages (if supported/enabled). - Support 64 KiB guest pages on arm64. - CPU hotplug support on arm/arm64. * tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (44 commits) xen: fix the check of e_pfn in xen_find_pfn_range x86/xen: add reschedule point when mapping foreign GFNs xen/arm: don't try to re-register vcpu_info on cpu_hotplug. xen, cpu_hotplug: call device_offline instead of cpu_down xen/arm: Enable cpu_hotplug.c xenbus: Support multiple grants ring with 64KB xen/grant-table: Add an helper to iterate over a specific number of grants xen/xenbus: Rename RING_PAGE to RING_GRANT xen/arm: correct comment in enlighten.c xen/gntdev: use types from linux/types.h in userspace headers xen/gntalloc: use types from linux/types.h in userspace headers xen/balloon: Use the correct sizeof when declaring frame_list xen/swiotlb: Add support for 64KB page granularity xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb arm/xen: Add support for 64KB page granularity xen/privcmd: Add support for Linux 64KB page granularity net/xen-netback: Make it running on 64KB page granularity net/xen-netfront: Make it running on 64KB page granularity block/xen-blkback: Make it running on 64KB page granularity block/xen-blkfront: Make it running on 64KB page granularity ...	2015-11-04 17:32:42 -08:00
Linus Torvalds	e627078a0c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "There is only one new feature in this pull for the 4.4 merge window, most of it is small enhancements, cleanup and bug fixes: - Add the s390 backend for the software dirty bit tracking. This adds two new pgtable functions pte_clear_soft_dirty and pmd_clear_soft_dirty which is why there is a hit to arch/x86/include/asm/pgtable.h in this pull request. - A series of cleanup patches for the AP bus, this includes the removal of the support for two outdated crypto cards (PCICC and PCICA). - The irq handling / signaling on buffer full in the runtime instrumentation code is dropped. - Some micro optimizations: remove unnecessary memory barriers for a couple of functions: [smb_]rmb, [smb_]wmb, atomics, bitops, and for spin_unlock. Use the builtin bswap if available and make test_and_set_bit_lock more cache friendly. - Statistics and a tracepoint for the diagnose calls to the hypervisor. - The CPU measurement facility support to sample KVM guests is improved. - The vector instructions are now always enabled for user space processes if the hardware has the vector facility. This simplifies the FPU handling code. The fpu-internal.h header is split into fpu internals, api and types just like x86. - Cleanup and improvements for the common I/O layer. - Rework udelay to solve a problem with kprobe. udelay has busy loop semantics but still uses an idle processor state for the wait" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits) s390: remove runtime instrumentation interrupts s390/cio: de-duplicate subchannel validation s390/css: unneeded initialization in for_each_subchannel s390/Kconfig: use builtin bswap s390/dasd: fix disconnected device with valid path mask s390/dasd: fix invalid PAV assignment after suspend/resume s390/dasd: fix double free in dasd_eckd_read_conf s390/kernel: fix ptrace peek/poke for floating point registers s390/cio: move ccw_device_stlck functions s390/cio: move ccw_device_call_handler s390/topology: reduce per_cpu() invocations s390/nmi: reduce size of percpu variable s390/nmi: fix terminology s390/nmi: remove casts s390/nmi: remove pointless error strings s390: don't store registers on disabled wait anymore s390: get rid of __set_psw_mask() s390/fpu: split fpu-internal.h into fpu internals, api, and type headers s390/dasd: fix list_del corruption after lcu changes s390/spinlock: remove unneeded serializations at unlock ...	2015-11-04 11:31:31 -08:00
Andrey Smetanin	c75efa974e	drivers/hv: share Hyper-V SynIC constants with userspace Moved Hyper-V synic contants from guest Hyper-V drivers private header into x86 arch uapi Hyper-V header. Added Hyper-V synic msr's flags into x86 arch uapi Hyper-V header. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-04 16:24:33 +01:00
Radim Krčmář	7a036a6f67	KVM: x86: add read_phys to x86_emulate_ops We want to read the physical memory when emulating RSM. X86EMUL_IO_NEEDED is returned on all errors for consistency with other helpers. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-11-04 16:24:31 +01:00
Paolo Bonzini	197a4f4b06	KVM/ARM Changes for v4.4-rc1 Includes a number of fixes for the arch-timer, introducing proper level-triggered semantics for the arch-timers, a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups getting rid of redundant state, and finally a stylistic change that gets rid of some ctags warnings. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWOhgAAAoJEEtpOizt6ddyKS8H/2ZHMTPjo6yChnrusNWy4Qbr 6laPDlzL+g45oMQRwNL7GnM1deRftaxvT2Wi+X84D/6Y/BD6MPds4HgtBfuWcSZ1 CyRJ0Ot/zrxenucSuJuOjq+a9gdizdAczkbB1MfYDULJH8fb6D+7RYLo3zgh4Xo4 pla3L9U6gSWe+YopBjZtZH43m3fwiwSM/v+uHOTIcXrsbR+fEgx/EFSKmA/DUCuo P5cFO/ceUGu7nATCexu5V82TgR2hvurrsR7mqfwY8YcF6HRM+NEOoS29xWC77v5S u/F08TKuKQLv0YTEFTyLETI/oEeuC0cHtrRQBNf4+9kXEOzKyXaae0wR/I6X2Ss= =GMNk -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.4-rc1 Includes a number of fixes for the arch-timer, introducing proper level-triggered semantics for the arch-timers, a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups getting rid of redundant state, and finally a stylistic change that gets rid of some ctags warnings. Conflicts: arch/x86/include/asm/kvm_host.h	2015-11-04 16:24:17 +01:00
Linus Torvalds	66ef3493d4	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: "Misc updates to the Intel MID and SGI UV platforms" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel-mid: Make intel_mid_ops static arch/x86/intel-mid: Use kmemdup rather than duplicating its implementation x86/platform/uv: Implement simple dump failover if kdump fails x86/platform/uv: Insert per_cpu accessor function on uv_hub_nmi	2015-11-03 21:33:18 -08:00
Linus Torvalds	639ab3eb38	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "The main changes are: continued PAT work by Toshi Kani, plus a new boot time warning about insecure RWX kernel mappings, by Stephen Smalley. The new CONFIG_DEBUG_WX=y warning is marked default-y if CONFIG_DEBUG_RODATA=y is already eanbled, as a special exception, as these bugs are hard to notice and this check already found several live bugs" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Warn on W^X mappings x86/mm: Fix no-change case in try_preserve_large_page() x86/mm: Fix __split_large_page() to handle large PAT bit x86/mm: Fix try_preserve_large_page() to handle large PAT bit x86/mm: Fix gup_huge_p?d() to handle large PAT bit x86/mm: Fix slow_virt_to_phys() to handle large PAT bit x86/mm: Fix page table dump to show PAT bit x86/asm: Add pud_pgprot() and pmd_pgprot() x86/asm: Fix pud/pmd interfaces to handle large PAT bit x86/asm: Add pud/pmd mask interfaces to handle large PAT bit x86/asm: Move PUD_PAGE macros to page_types.h x86/vdso32: Define PGTABLE_LEVELS to 32bit VDSO	2015-11-03 21:23:56 -08:00
Linus Torvalds	4302d506d5	Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 sigcontext header cleanups from Ingo Molnar: "This series reorganizes and cleans up various aspects of the main sigcontext UAPI headers, such as unifying the data structures and updating/adding lots of comments to explain all the ABI details and quirks. The headers can now also be built in user-space standalone" * 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/headers: Clean up too long lines x86/headers: Remove <asm/sigcontext.h> references on the kernel side x86/headers: Remove direct sigcontext32.h uses x86/headers: Convert sigcontext_ia32 uses to sigcontext_32 x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32' x86/headers: Make sigcontext pointers bit independent x86/headers: Move the 'struct sigcontext' definitions into the UAPI header x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32 x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate x86/headers: Unify register type definitions between 32-bit compat and i386 x86/headers: Use ABI types consistently in sigcontext*.h x86/headers: Separate out legacy user-space structure definitions x86/headers: Clean up and better document uapi/asm/sigcontext.h x86/headers: Clean up uapi/asm/sigcontext32.h x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h	2015-11-03 21:05:40 -08:00
Linus Torvalds	ce4d72fac1	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu changes from Ingo Molnar: "There are two main areas of changes: - Rework of the extended FPU state code to robustify the kernel's usage of cpuid provided xstate sizes - and related changes (Dave Hansen)" - math emulation enhancements: new modern FPU instructions support, with testcases, plus cleanups (Denys Vlasnko)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/fpu: Fixup uninitialized feature_name warning x86/fpu/math-emu: Add support for FISTTP instructions x86/fpu/math-emu, selftests: Add test for FISTTP instructions x86/fpu/math-emu: Add support for FCMOVcc insns x86/fpu/math-emu: Add support for F[U]COMI[P] insns x86/fpu/math-emu: Remove define layer for undocumented opcodes x86/fpu/math-emu, selftests: Add tests for FCMOV and FCOMI insns x86/fpu/math-emu: Remove !NO_UNDOC_CODE x86/fpu: Check CPU-provided sizes against struct declarations x86/fpu: Check to ensure increasing-offset xstate offsets x86/fpu: Correct and check XSAVE xstate size calculations x86/fpu: Add C structures for AVX-512 state components x86/fpu: Rework YMM definition x86/fpu/mpx: Rework MPX 'xstate' types x86/fpu: Add xfeature_enabled() helper instead of test_bit() x86/fpu: Remove 'xfeature_nr' x86/fpu: Rework XSTATE_* macros to remove magic '2' x86/fpu: Rename XFEATURES_NR_MAX x86/fpu: Rename XSAVE macros x86/fpu: Remove partial LWP support definitions ...	2015-11-03 20:50:26 -08:00
Linus Torvalds	f323c49b30	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu changes from Ingo Molnar: "Two changes in this cycle: a Kconfig help text enhancement, and an AMD CLZERO instruction capability detection and enumeration" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add CLZERO detection x86/Kconfig/cpus: Fix/complete CPU type help texts	2015-11-03 19:39:42 -08:00
Linus Torvalds	378e4e9825	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot cleanup from Ingo Molnar: "A single commit: remove an obsolete kcrash boot flag" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kexec: Remove obsolete 'in_crash_kexec' flag	2015-11-03 19:28:37 -08:00
Linus Torvalds	a75a3f6fc9	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "The main change in this cycle is another step in the big x86 system call interface rework by Andy Lutomirski, which moves most of the low level x86 entry code from assembly to C, for all syscall entries except native 64-bit system calls: arch/x86/entry/entry_32.S \| 182 ++++------ arch/x86/entry/entry_64_compat.S \| 547 ++++++++----------------------- 194 insertions(+), 535 deletions(-) ... our hope is that the final remaining step (converting native 64-bit system calls) will be less painful as all the previous steps, given that most of the legacies and quirks are concentrated around native 32-bit and compat environments" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/entry/32: Fix FS and GS restore in opportunistic SYSEXIT x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on um/x86: Fix build after x86 syscall changes x86/asm: Remove the xyz_cfi macros from dwarf2.h selftests/x86: Style fixes for the 'unwind_vdso' test x86/entry/64/compat: Document sysenter_fix_flags's reason for existence x86/entry: Split and inline syscall_return_slowpath() x86/entry: Split and inline prepare_exit_to_usermode() x86/entry: Use pt_regs_to_thread_info() in syscall entry tracing x86/entry: Hide two syscall entry assertions behind CONFIG_DEBUG_ENTRY x86/entry: Micro-optimize compat fast syscall arg fetch x86/entry: Force inlining of 32-bit syscall code x86/entry: Make irqs_disabled checks in exit code depend on lockdep x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscalls x86/asm: Remove thread_info.sysenter_return x86/entry/32: Re-implement SYSENTER using the new C path x86/entry/32: Switch INT80 to the new C syscall path x86/entry/32: Open-code return tracking from fork and kthreads x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls x86/vdso/compat: Wire up SYSENTER and SYSCSALL for compat userspace ...	2015-11-03 18:59:10 -08:00
Linus Torvalds	d2bea739f8	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic changes from Ingo Molnar: "The main changes in this cycle were: - Numachip updates: new hardware support, fixes and cleanups. (Daniel J Blueman) - misc smaller cleanups and fixlets" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/io_apic: Make eoi_ioapic_pin() static x86/irq: Drop unlikely before IS_ERR_OR_NULL x86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC x86/apic: Deinline various functions x86/numachip: Fix timer build conflict x86/numachip: Introduce Numachip2 timer mechanisms x86/numachip: Add Numachip IPI optimisations x86/numachip: Add Numachip2 APIC support x86/numachip: Cleanup Numachip support	2015-11-03 18:33:15 -08:00
Linus Torvalds	53528695ff	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle were: - sched/fair load tracking fixes and cleanups (Byungchul Park) - Make load tracking frequency scale invariant (Dietmar Eggemann) - sched/deadline updates (Juri Lelli) - stop machine fixes, cleanups and enhancements for bugs triggered by CPU hotplug stress testing (Oleg Nesterov) - scheduler preemption code rework: remove PREEMPT_ACTIVE and related cleanups (Peter Zijlstra) - Rework the sched_info::run_delay code to fix races (Peter Zijlstra) - Optimize per entity utilization tracking (Peter Zijlstra) - ... misc other fixes, cleanups and smaller updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop() sched: Start stopper early stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark() stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() stop_machine: Ensure that a queued callback will be called before cpu_stop_park() sched/x86: Fix typo in __switch_to() comments sched/core: Remove a parameter in the migrate_task_rq() function sched/core: Drop unlikely behind BUG_ON() sched/core: Fix task and run queue sched_info::run_delay inconsistencies sched/numa: Fix task_tick_fair() from disabling numa_balancing sched/core: Add preempt_count invariant check sched/core: More notrace annotations sched/core: Kill PREEMPT_ACTIVE sched/core, sched/x86: Kill thread_info::saved_preempt_count sched/core: Simplify preempt_count tests sched/core: Robustify preemption leak checks sched/core: Stop setting PREEMPT_ACTIVE ...	2015-11-03 18:03:50 -08:00
Linus Torvalds	b831ef2cad	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS changes from Ingo Molnar: "The main system reliability related changes were from x86, but also some generic RAS changes: - AMD MCE error injection subsystem enhancements. (Aravind Gopalakrishnan) - Fix MCE and CPU hotplug interaction bug. (Ashok Raj) - kcrash bootup robustness fix. (Baoquan He) - kcrash cleanups. (Borislav Petkov) - x86 microcode driver rework: simplify it by unmodularizing it and other cleanups. (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init() x86/mce: Add a Scalable MCA vendor flags bit MAINTAINERS: Unify the microcode driver section x86/microcode/intel: Move #ifdef DEBUG inside the function x86/microcode/amd: Remove maintainers from comments x86/microcode: Remove modularization leftovers x86/microcode: Merge the early microcode loader x86/microcode: Unmodularize the microcode driver x86/mce: Fix thermal throttling reporting after kexec kexec/crash: Say which char is the unrecognized x86/setup/crash: Check memblock_reserve() retval x86/setup/crash: Cleanup some more x86/setup/crash: Remove alignment variable x86/setup: Cleanup crashkernel reservation functions x86/amd_nb, EDAC: Rename amd_get_node_id() x86/setup: Do not reserve crashkernel high memory if low reservation failed x86/microcode/amd: Do not overwrite final patch levels x86/microcode/amd: Extract current patch level read to a function x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts ...	2015-11-03 17:51:33 -08:00
Linus Torvalds	d63a978865	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking changes from Ingo Molnar: "The main changes in this cycle were: - More gradual enhancements to atomic ops: new atomic_read_ctrl() ops, synchronize atomic_{read,set}() ordering requirements between architectures, add atomic_long_t bitops. (Peter Zijlstra) - Add _{relaxed\|acquire\|release}() variants for inc/dec atomics and use them in various locking primitives: mutex, rtmutex, mcs, rwsem. This enables weakly ordered architectures (such as arm64) to make use of more locking related optimizations. (Davidlohr Bueso) - Implement atomic[64]_{inc,dec}_relaxed() on ARM. (Will Deacon) - Futex kernel data cache footprint micro-optimization. (Rasmus Villemoes) - pvqspinlock runtime overhead micro-optimization. (Waiman Long) - misc smaller fixlets" 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ARM, locking/atomics: Implement _relaxed variants of atomic[64]_{inc,dec} locking/rwsem: Use acquire/release semantics locking/mcs: Use acquire/release semantics locking/rtmutex: Use acquire/release semantics locking/mutex: Use acquire/release semantics locking/asm-generic: Add _{relaxed\|acquire\|release}() variants for inc/dec atomics atomic: Implement atomic_read_ctrl() atomic, arch: Audit atomic_{read,set}() atomic: Add atomic_long_t bitops futex: Force hot variables into a single cache line locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VAL locking/osq: Relax atomic semantics locking/qrwlock: Rename ->lock to ->wait_lock locking/Documentation/lockstat: Fix typo - lokcing -> locking locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h	2015-11-03 16:10:43 -08:00
Linus Torvalds	f5a8160c1e	Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI changes from Ingo Molnar: "The main changes in this cycle were: - further EFI code generalization to make it more workable for ARM64 - various extensions, such as 64-bit framebuffer address support, UEFI v2.5 EFI_PROPERTIES_TABLE support - code modularization simplifications and cleanups - new debugging parameters - various fixes and smaller additions" * 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) efi: Fix warning of int-to-pointer-cast on x86 32-bit builds efi: Use correct type for struct efi_memory_map::phys_map x86/efi: Fix kernel panic when CONFIG_DEBUG_VIRTUAL is enabled efi: Add "efi_fake_mem" boot option x86/efi: Rename print_efi_memmap() to efi_print_memmap() efi: Auto-load the efi-pstore module efi: Introduce EFI_NX_PE_DATA bit and set it from properties table efi: Add support for UEFIv2.5 Properties table efi: Add EFI_MEMORY_MORE_RELIABLE support to efi_md_typeattr_format() efifb: Add support for 64-bit frame buffer addresses efi/arm64: Clean up efi_get_fdt_params() interface arm64: Use core efi=debug instead of uefi_debug command line parameter efi/x86: Move efi=debug option parsing to core drivers/firmware: Make efi/esrt.c driver explicitly non-modular efi: Use the generic efi.memmap instead of 'memmap' acpi/apei: Use appropriate pgprot_t to map GHES memory arm64, acpi/apei: Implement arch_apei_get_mem_attributes() arm64/mm: Add PROT_DEVICE_nGnRnE and PROT_NORMAL_WT acpi, x86: Implement arch_apei_get_mem_attributes() efi, x86: Rearrange efi_mem_attributes() ...	2015-11-03 15:05:52 -08:00
Linus Torvalds	7b2a4306f9	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timer departement provides: - More y2038 work in the area of ntp and pps. - Optimization of posix cpu timers - New time related selftests - Some new clocksource drivers - The usual pile of fixes, cleanups and improvements" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) timeconst: Update path in comment timers/x86/hpet: Type adjustments clocksource/drivers/armada-370-xp: Implement ARM delay timer clocksource/drivers/tango_xtal: Add new timer for Tango SoCs clocksource/drivers/imx: Allow timer irq affinity change clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr() clocksource/drivers/h8300_*: Remove unneeded memset()s clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup() clocksource/drivers/em_sti: Remove unneeded memset()s clocksource/drivers/mediatek: Use GPT as sched clock source clockevents/drivers/mtk: Fix spurious interrupt leading to crash posix_cpu_timer: Reduce unnecessary sighand lock contention posix_cpu_timer: Convert cputimer->running to bool posix_cpu_timer: Check thread timers only when there are active thread timers posix_cpu_timer: Optimize fastpath_timer_check() timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments timers: Use __fls in apply_slack() clocksource: Remove return statement from void functions net: sfc: avoid using timespec ntp/pps: use y2038 safe types in pps_event_time ...	2015-11-03 14:13:41 -08:00
Rafael J. Wysocki	69f8947b8c	Merge branches 'pm-cpufreq' and 'pm-cpuidle' * pm-cpufreq: cpufreq: postfix policy directory with the first CPU in related_cpus cpufreq: create cpu/cpufreq/policyX directories cpufreq: remove cpufreq_sysfs_{create\|remove}_file() cpufreq: create cpu/cpufreq at boot time cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate() cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value cpufreq: intel_pstate: Avoid calculation for max/min Documentation: kernel_parameters for Intel P state driver cpufreq: intel_pstate: Use ACPI perf configuration cpufreq: intel-pstate: Use separate max pstate for scaling cpufreq: intel_pstate: get P1 from TAR when available cpufreq: Drop redundant check for inactive policies cpufreq : powernv: Report Pmax throttling if capped below nominal frequency cpufreq: imx: update the clock switch flow to support imx6ul cpufreq: tegra20: remove superfluous CONFIG_PM ifdefs cpufreq: conservative: remove 'enable' field cpufreq: integrator: Fix module autoload for OF platform driver * pm-cpuidle: cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver cpuidle: mvebu: clean up multiple platform drivers	2015-11-02 00:54:10 +01:00
Wan Zongshun	2167ceabf3	x86/cpu: Add CLZERO detection AMD Fam17h processors introduce support for the CLZERO instruction. It zeroes out the 64 byte cache line specified in RAX. Add the bit here to allow /proc/cpuinfo to list the feature. Boris: we're adding this as a separate ->x86_capability leaf because CPUID_80000008_EBX is going to contain more feature bits and it will fill out with time. Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com> Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> [ Wrap code in patch form, fix comments. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1446207099-24948-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-01 11:26:23 +01:00
Aravind Gopalakrishnan	c7f54d21fb	x86/mce: Add a Scalable MCA vendor flags bit Scalable MCA (SMCA) is a new feature in AMD Fam17h processors which indicates presence of MCA extensions. MCA extensions expands existing register space for the MCE banks and also introduces a new MSR range to accommodate new banks. Add the detection bit. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Reformat mce_vendor_flags definitions and save indentation levels. Improve comments. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1446207099-24948-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-11-01 11:26:13 +01:00
Linus Torvalds	0386729247	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: two KASAN fixes, two EFI boot fixes, two boot-delay optimization fixes, and a fix for a IRQ handling hang observed on virtual platforms" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm, kasan: Silence KASAN warnings in get_wchan() compiler, atomics, kasan: Provide READ_ONCE_NOCHECK() x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels x86/smpboot: Fix CPU #1 boot timeout x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior x86/ioapic: Disable interrupts when re-routing legacy IRQs x86/setup: Extend low identity map to cover whole kernel range x86/efi: Fix multiple GOP device support	2015-10-23 22:34:32 +09:00
Stefano Stabellini	a314e3eb84	xen/arm: Enable cpu_hotplug.c Build cpu_hotplug for ARM and ARM64 guests. Rename arch_(un)register_cpu to xen_(un)register_cpu and provide an empty implementation on ARM and ARM64. On x86 just call arch_(un)register_cpu as we are already doing. Initialize cpu_hotplug on ARM. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2015-10-23 14:20:47 +01:00
Julien Grall	291be10fd7	xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb With 64KB page granularity support, the frame number will be different. It will be easier to modify the behavior in a single place rather than in each caller. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-10-23 14:20:43 +01:00
Julien Grall	008c320a96	xen/grant: Introduce helpers to split a page into grant Currently, a grant is always based on the Xen page granularity (i.e 4KB). When Linux is using a different page granularity, a single page will be split between multiple grants. The new helpers will be in charge of splitting the Linux page into grants and call a function given by the caller on each grant. Also provide an helper to count the number of grants within a given contiguous region. Note that the x86/include/asm/xen/page.h is now including xen/interface/grant_table.h rather than xen/grant_table.h. It's necessary because xen/grant_table.h depends on asm/xen/page.h and will break the compilation. Furthermore, only definition in interface/grant_table.h is required. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-10-23 14:20:33 +01:00
David Vrabel	8edfcf882e	x86/xen: export xen_alloc_p2m_entry() Rename alloc_p2m() to xen_alloc_p2m_entry() and export it. This is useful for ensuring that a p2m entry is allocated (i.e., not a shared missing or identity entry) so that subsequent set_phys_to_machine() calls will require no further allocations. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> --- v3: - Make xen_alloc_p2m_entry() a nop on auto-xlate guests.	2015-10-23 14:20:28 +01:00
Christoffer Dall	3217f7c25b	KVM: Add kvm_arch_vcpu_{un}blocking callbacks Some times it is useful for architecture implementations of KVM to know when the VCPU thread is about to block or when it comes back from blocking (arm/arm64 needs to know this to properly implement timers, for example). Therefore provide a generic architecture callback function in line with what we do elsewhere for KVM generic-arch interactions. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-22 23:01:41 +02:00
Borislav Petkov	6b26e1bf66	x86/microcode: Remove modularization leftovers Remove the remaining module functionality leftovers. Make "dis_ucode_ldr" an early_param and make it static again. Drop module aliases, autoloading table, description, etc. Bump version number, while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1445334889-300-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-21 11:22:12 +02:00
Borislav Petkov	fe055896c0	x86/microcode: Merge the early microcode loader Merge the early loader functionality into the driver proper. The diff is huge but logically, it is simply moving code from the _early.c files into the main driver. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-21 11:22:12 +02:00
Borislav Petkov	9a2bc335f1	x86/microcode: Unmodularize the microcode driver Make CONFIG_MICROCODE a bool. It was practically a bool already anyway, since early loader was forcing it to =y. Regardless, there's no real reason to have something be a module which gets built-in on the majority of installations out there. And its not like there's noticeable change in functionality - we still can load late microcode - just the module glue disappears. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1445334889-300-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-21 11:22:11 +02:00
Jan Beulich	3d45ac4b35	timers/x86/hpet: Type adjustments Standardize on bool instead of an inconsistent mixture of u8 and plain 'int'. Also use u32 or 'unsigned int' instead of 'unsigned long' when a 32-bit type suffices, generating slightly better code on x86-64. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5624E3A002000078000AC49A@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-21 11:17:32 +02:00
Aravind Gopalakrishnan	1a6775c1a2	x86/amd_nb, EDAC: Rename amd_get_node_id() This function doesn't give us the "Node ID" as the function name suggests. Rather, it receives a PCI device as argument, checks the available F3 PCI device IDs in the system and returns the index of the matching Bus/Device IDs. Rename it to amd_pci_dev_to_node_id(). No functional change is introduced. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-21 11:10:55 +02:00
Ingo Molnar	6af597de62	Merge branch 'sched/urgent' into sched/core, to pick up fixes and resolve conflicts Conflicts: kernel/sched/fair.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-20 10:18:16 +02:00
Ingo Molnar	a1a2ab2ff7	Linux 4.3-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWJCaDAAoJEHm+PkMAQRiGvJQH/jGlxawu+mnNFjIRXmk8Zucq cxDVTPnQ4S1DJMyRSQTqe6ccDgDM/TEISJPZ0Gwc2SQLda27zdQiz05upbqHSPSH /FUHMKiMRhhIBOzdfEGR4Ry2YZID+DlG5EWCgRKf/GlgY3STZSpjIBPYd3Rl3zsU qragNjtbCE6eUpnLwkv40Q2mzzR3/fZxJ0drgE/QiSF8P0VmvVbrcQMF58vnUQgy 9URp48mLhmKj+IrElSnvXMG0XLrKn1tRYXGXd3w+x1Nlr//SHYsIuQHUcp6SAVQh p2HXLL7eoy8rs45MCiOfXG7VgNWF6HFjVMtrNUcNfgBsLCx0qFMVrKIILrGtKh0= =ETze -----END PGP SIGNATURE----- Merge tag 'v4.3-rc6' into locking/core, to pick up fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-20 10:16:46 +02:00
Andrey Ryabinin	a75ca545e8	x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels Declaration of memcpy() is hidden under #ifndef CONFIG_KMEMCHECK. In asm/efi.h under #ifdef CONFIG_KASAN we #undef memcpy(), due to which the following happens: In file included from arch/x86/kernel/setup.c:96:0: ./arch/x86/include/asm/desc.h: In function ‘native_write_idt_entry’: ./arch/x86/include/asm/desc.h:122:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] memcpy(&idt[entry], gate, sizeof(gate)); ^ cc1: some warnings being treated as errors make[2]: ** [arch/x86/kernel/setup.o] Error 1 We will get rid of that #undef in asm/efi.h eventually. But in the meanwhile move memcpy() declaration out of #ifdefs to fix the build. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1444994933-28328-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-19 10:07:23 +02:00
Rafael J. Wysocki	7855e10294	Merge back earlier cpufreq material for v4.4.	2015-10-16 22:12:02 +02:00
Wanpeng Li	99b83ac893	KVM: nVMX: emulate the INVVPID instruction Add the INVVPID instruction emulation. Reviewed-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:30:24 +02:00
Srinivas Pandruvada	6a35fc2d6c	cpufreq: intel_pstate: get P1 from TAR when available After Ivybridge, the max non turbo ratio obtained from platform info msr is not always guaranteed P1 on client platforms. The max non turbo activation ratio (TAR), determines the max for the current level of TDP. The ratio in platform info is physical max. The TAR MSR can be locked, so updating this value is not possible on all platforms. This change gets this ratio from MSR TURBO_ACTIVATION_RATIO if available, but also do some sanity checking to make sure that this value is correct. The sanity check involves reading the TDP ratio for the current tdp control value when platform has configurable TDP present and matching TAC with this. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-10-15 01:53:18 +02:00
Linus Torvalds	cfed1e3de4	Bug fixes for system management mode emulation. The first two patches fix SMM emulation on Nehalem processors. The others fix some cases that became apparent as work progressed on the firmware side. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJWHms0AAoJEL/70l94x66DfQkIAIpya6c/1UAthxSTqJ1wFOf8 ZKp3GCMjUjtm9k88kk6JGOlPiAvWz7CG9BVbptpkJGpgoDzquvr6ZKGG2BV88F17 MnkZCid4IBW6VeKYy7R2otkKw7+Pw8DTHRQks+VI6BN/KkeaZLzh5J8+FAl4ZaWk YX/VulRce6SfZPYuUTRkkK8aebsopZNVG8mwWIGuBYwyH54R3KH1k/euX2joUPwm oopzmQLgEWW7e3RsO67T36rIRgEorJLZaiiexvj1djI+e0kEEudvhJ9nC6eB52qa oZ9nR0nkkmBmrBF8gldKDZBC+Y/ci1cJLAaoi7tdsp0wVCebPxubwbPOXxKwD8g= =ij8Q -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "Bug fixes for system management mode emulation. The first two patches fix SMM emulation on Nehalem processors. The others fix some cases that became apparent as work progressed on the firmware side" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix RSM into 64-bit protected mode KVM: x86: fix previous commit for 32-bit KVM: x86: fix SMI to halted VCPU KVM: x86: clean up kvm_arch_vcpu_runnable KVM: x86: map/unmap private slots in __x86_set_memory_region KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region	2015-10-14 10:01:32 -07:00
Andy Lutomirski	af22aa7c76	x86/asm: Remove the xyz_cfi macros from dwarf2.h They are currently unused, and I don't think that anyone was ever particularly happy with them. They had the unfortunate property that they made it easy to CFI-annotate things without thinking about them -- when pushing, do you want to just update the CFA offset, or do you also want to update the saved location of the register being pushed? Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447bfbd10bb268b4593b32534ecefa1f4df287e.1444696194.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:56:28 +02:00
Ingo Molnar	790a2ee242	* Make the EFI System Resource Table (ESRT) driver explicitly non-modular by ripping out the module_* code since Kconfig doesn't allow it to be built as a module anyway - Paul Gortmaker * Make the x86 efi=debug kernel parameter, which enables EFI debug code and output, generic and usable by arm64 - Leif Lindholm * Add support to the x86 EFI boot stub for 64-bit Graphics Output Protocol frame buffer addresses - Matt Fleming * Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled in the firmware and set an efi.flags bit so the kernel knows when it can apply more strict runtime mapping attributes - Ard Biesheuvel * Auto-load the efi-pstore module on EFI systems, just like we currently do for the efivars module - Ben Hutchings * Add "efi_fake_mem" kernel parameter which allows the system's EFI memory map to be updated with additional attributes for specific memory ranges. This is useful for testing the kernel code that handles the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware doesn't include support - Taku Izumi -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm +OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN 2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM 0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07 pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G 8xb/rET4JrhCG4gFMUZ7 =1Oee -----END PGP SIGNATURE----- Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi Pull v4.4 EFI updates from Matt Fleming: - Make the EFI System Resource Table (ESRT) driver explicitly non-modular by ripping out the module_* code since Kconfig doesn't allow it to be built as a module anyway. (Paul Gortmaker) - Make the x86 efi=debug kernel parameter, which enables EFI debug code and output, generic and usable by arm64. (Leif Lindholm) - Add support to the x86 EFI boot stub for 64-bit Graphics Output Protocol frame buffer addresses. (Matt Fleming) - Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled in the firmware and set an efi.flags bit so the kernel knows when it can apply more strict runtime mapping attributes - Ard Biesheuvel - Auto-load the efi-pstore module on EFI systems, just like we currently do for the efivars module. (Ben Hutchings) - Add "efi_fake_mem" kernel parameter which allows the system's EFI memory map to be updated with additional attributes for specific memory ranges. This is useful for testing the kernel code that handles the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware doesn't include support. (Taku Izumi) Note: there is a semantic conflict between the following two commits: `8a53554e12` ("x86/efi: Fix multiple GOP device support") `ae2ee627dc` ("efifb: Add support for 64-bit frame buffer addresses") I fixed up the interaction in the merge commit, changing the type of current_fb_base from u32 to u64. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:51:34 +02:00
Ingo Molnar	c7d77a7980	Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:05:18 +02:00
Martin Schwidefsky	a7b7617493	mm: add architecture primitives for software dirty bit clearing There are primitives to create and query the software dirty bits in a pte or pmd. But the clearing of the software dirty bits is done in common code with x86 specific page table functions. Add the missing architecture primitives to clear the software dirty bits to allow the feature to be used on non-x86 systems, e.g. the s390 architecture. Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-10-14 14:32:05 +02:00
Paolo Bonzini	58f800d5ac	Merge branch 'kvm-master' into HEAD This merge brings in a couple important SMM fixes, which makes it easier to test latest KVM with unrestricted_guest=0 and to test the in-progress work on SMM support in the firmware. Conflicts: arch/x86/kvm/x86.c	2015-10-13 21:32:50 +02:00
Paolo Bonzini	1d8007bdee	KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region The next patch will make x86_set_memory_region fill the userspace_addr. Since the struct is not used untouched anymore, it makes sense to build it in x86_set_memory_region directly; it also simplifies the callers. Reported-by: Alexandre DERUMIER <aderumier@odiso.com> Cc: stable@vger.kernel.org Fixes: `9da0e4d5ac` Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-13 18:28:46 +02:00
Borislav Petkov	0399f73299	x86/microcode/amd: Do not overwrite final patch levels A certain number of patch levels of applied microcode should not be overwritten by the microcode loader, otherwise bad things will happen. Check those and abort update if the current core has one of those final patch levels applied by the BIOS. 32-bit needs special handling, of course. See https://bugzilla.suse.com/show_bug.cgi?id=913996 for more info. Tested-by: Peter Kirchgeßner <pkirchgessner@t-online.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1444641762-9437-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-12 16:15:48 +02:00
Borislav Petkov	2eff73c0a1	x86/microcode/amd: Extract current patch level read to a function Pave the way for checking the current patch level of the microcode in a core. We want to be able to do stuff depending on the patch level - in this case decide whether to update or not. But that will be added in a later patch. Drop unused local var uci assignment, while at it. Integrate a fix for 32-bit and CONFIG_PARAVIRT from Takashi Iwai: Use native_rdmsr() in check_current_patch_level() because with CONFIG_PARAVIRT enabled and on 32-bit, where we run before paging has been enabled, we cannot deref pv_info yet. Or we could, but we'd need to access its physical address. This way of fixing it is simpler. See: https://bugzilla.suse.com/show_bug.cgi?id=943179 for the background. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Takashi Iwai <tiwai@suse.com>: Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1444641762-9437-6-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-12 16:15:48 +02:00
Taku Izumi	0bbea1ce98	x86/efi: Rename print_efi_memmap() to efi_print_memmap() This patch renames print_efi_memmap() to efi_print_memmap() and make it global function so that we can invoke it outside of arch/x86/platform/efi/efi.c Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2015-10-12 14:20:08 +01:00
Minfei Huang	e9c40d257f	x86/kexec: Remove obsolete 'in_crash_kexec' flag Previously, UV NMI used the 'in_crash_kexec' flag to determine whether we are in a kdump kernel or not: `5edd19af18` ("x86, UV: Make kdump avoid stack dumps") But this flags was removed in the following commit: `9c48f1c629` ("x86, nmi: Wire up NMI handlers to new routines") Since it isn't used any more, remove it. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cpw@sgi.com Cc: kexec@lists.infradead.org Cc: mhuang@redhat.com Link: http://lkml.kernel.org/r/1444070155-17934-1-git-send-email-mhuang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-12 09:43:11 +02:00
Gabriel Laskar	7e0abcd6b7	x86/mce: Include linux/ioctl.h in uapi mce header asm/ioctls.h contains definition for termios, not just the _IO* macros. This error was found with a tool in development used to generate automated pretty-printing functions for ioctl decoding in strace. Signed-off-by: Gabriel Laskar <gabriel@lse.epita.fr> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1444141657-14898-2-git-send-email-gabriel@lse.epita.fr Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-10-11 21:24:27 +02:00
Andy Lutomirski	487e3bf4f7	x86/asm: Remove thread_info.sysenter_return It's no longer needed. We could reinstate something like it as an optimization, which would remove two cachelines from the fast syscall entry working set. I benchmarked it, and it makes no difference whatsoever to the performance of cache-hot compat syscalls on Sandy Bridge. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/f08cc0cff30201afe9bb565c47134c0a6c1a96a2.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-09 09:41:11 +02:00
Andy Lutomirski	eb974c6256	x86/syscalls: Give sys_call_ptr_t a useful type Syscalls are asmlinkage functions (on 32-bit kernels), take six args of type unsigned long, and return long. Note that uml could probably be slightly cleaned up on top of this patch. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/4d3ecc4a169388d47009175408b2961961744e6f.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-09 09:41:08 +02:00
Andy Lutomirski	034042cc1e	x86/entry/syscalls: Move syscall table declarations into asm/syscalls.h The header was missing some compat declarations. Also make sys_call_ptr_t have a consistent type. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/3166aaff0fb43897998fcb6ef92991533f8c5c6c.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-09 09:41:08 +02:00
Andy Lutomirski	8242c6c84a	x86/vdso/32: Save extra registers in the INT80 vsyscall path The goal is to integrate the SYSENTER and SYSCALL32 entry paths with the INT80 path. SYSENTER clobbers ESP and EIP. SYSCALL32 clobbers ECX (and, invisibly, R11). SYSRETL (long mode to compat mode) clobbers ECX and, invisibly, R11. SYSEXIT (which we only need for native 32-bit) clobbers ECX and EDX. This means that we'll need to provide ESP to the kernel in a register (I chose ECX, since it's only needed for SYSENTER) and we need to provide the args that normally live in ECX and EDX in memory. The epilogue needs to restore ECX and EDX, since user code relies on regs being preserved. We don't need to do anything special about EIP, since the kernel already knows where we are. The kernel will eventually need to know where int $0x80 lands, so add a vdso_image entry for it. The only user-visible effect of this code is that ptrace-induced changes to ECX and EDX during fast syscalls will be lost. This is already the case for the SYSENTER path. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/b860925adbee2d2627a0671fbfe23a7fd04127f8.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-09 09:41:06 +02:00
Andy Lutomirski	7bcdea4d05	x86/elf/64: Clear more registers in elf_common_init() Before we start calling execve in contexts that honor the full pt_regs, we need to teach it to initialize all registers. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/65a38a9edee61a1158cfd230800c61dbd963dac5.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-09 09:41:06 +02:00
Andy Lutomirski	f24f910884	x86/vdso: Define BUILD_VDSO while building and emit .eh_frame in asm For the vDSO, user code wants runtime unwind info. Make sure that, if we use .cfi directives, we generate it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/16e29ad8855e6508197000d8c41f56adb00d7580.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-09 09:41:05 +02:00
Andy Lutomirski	7b956f035a	x86/asm: Re-add parts of the manual CFI infrastructure Commit: `131484c8da` ("x86/debug: Remove perpetually broken, unmaintainable dwarf annotations") removed all the manual DWARF annotations outside the vDSO. It also removed the macros we used for the manual annotations. Re-add these macros so that we can clean up the vDSO annotations. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/4c70bb98a8b773c8ccfaabf6745e569ff43e7f65.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-09 09:41:05 +02:00
Andy Lutomirski	0a6d1fa0d2	x86/vdso: Remove runtime 32-bit vDSO selection 32-bit userspace will now always see the same vDSO, which is exactly what used to be the int80 vDSO. Subsequent patches will clean it up and make it support SYSENTER and SYSCALL using alternatives. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/e7e6b3526fa442502e6125fe69486aab50813c32.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-07 11:34:08 +02:00
Andy Lutomirski	7e0f51cb44	x86/uaccess: Add unlikely() to __chk_range_not_ok() failure paths This should improve code quality a bit. It also shrinks the kernel text: Before: text data bss dec filename 21828379 5194760 1277952 28301091 vmlinux After: text data bss dec filename 21827997 5194760 1277952 28300709 vmlinux ... by 382 bytes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/f427b8002d932e5deab9055e0074bb4e7e80ee39.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-07 11:34:06 +02:00
Andy Lutomirski	a76cf66e94	x86/uaccess: Tell the compiler that uaccess is unlikely to fault GCC doesn't realize that get_user(), put_user(), and their __ variants are unlikely to fail. Tell it. I noticed this while playing with the C entry code. Before: text data bss dec filename 21828763 5194760 1277952 28301475 vmlinux.baseline After: text data bss dec filename 21828379 5194760 1277952 28301091 vmlinux.new The generated code shrunk by 384 bytes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/dc37bed7024319c3004d950d57151fca6aeacf97.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-07 11:34:06 +02:00
Ingo Molnar	25a9a924c0	Merge branch 'linus' into x86/asm, to pick up fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-07 11:24:24 +02:00
Ingo Molnar	82fc167c39	Linux 4.3-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWEUxnAAoJEHm+PkMAQRiGYCYH/3gtGkFdvSLi+E1PfI8Qk3ZA XuYA4Mj09JBVSmaICeueMTDVrdiq0OE0zPib26GWlF/za13kNU8KgMR3+6XCuLSX DiCmh6mwDItoNoSIIUERLqrFHABXz8rZ3gb3uu2+kNN74Cl0piNm1YpFclEEWjMr 9Wk5fkq+ontnDVUQOvWUxPiUXOJTvdLXBWTRDw1yTdE3RMNwRI2d/hme6Hq++WYV tRalZZKQaoB33js9WRVAoLVunvtna+i+/y7VGLj8QyS0+d6ec81Hey2r1/fR/oG4 bs4ul6vtqeb3IR/PjUqxF59pSrCLEO+qrp9KrTlJNYgr1m1QyjRxWUdy/XhyaWo= =gIhN -----END PGP SIGNATURE----- Merge tag 'v4.3-rc4' into locking/core, to pick up fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-06 17:10:28 +02:00
Peter Zijlstra	d87b7a3379	sched/core, sched/x86: Kill thread_info::saved_preempt_count With the introduction of the context switch preempt_count invariant, and the demise of PREEMPT_ACTIVE, its pointless to save/restore the per-cpu preemption count, it must always be 2. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-06 17:08:18 +02:00
Peter Zijlstra	609ca06638	sched/core: Create preempt_count invariant Assuming units of PREEMPT_DISABLE_OFFSET for preempt_count() numbers. Now that TASK_DEAD no longer results in preempt_count() == 3 during scheduling, we will always call context_switch() with preempt_count() == 2. However, we don't always end up with preempt_count() == 2 in finish_task_switch() because new tasks get created with preempt_count() == 1. Create FORK_PREEMPT_COUNT and set it to 2 and use that in the right places. Note that we cannot use INIT_PREEMPT_COUNT as that serves another purpose (boot). After this, preempt_count() is invariant across the context switch, with exception of PREEMPT_ACTIVE. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-06 17:08:14 +02:00
Linus Torvalds	f6702681a0	xen: bug fixes for 4.3-rc4 - Fix VM save performance regression with x86 PV guests. - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI). - Other minor fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWE8wQAAoJEFxbo/MsZsTRVTMH/0eqSg2M78wv4sBl234Y3FE9 AN8KFUdlkK7VN9v0uuLMDSKIWNUuFJIvo/2rElWGRiX2Q+/pfnQg3ZSFhub9S8uL T4LCvmG9viRFb2oUz792ewqncSw3X98Jpto4smA820gJRjndBSWm5HUKUtPAkv1M l5DFMEgOeHbu+wCbKD/ZPEt5K9GsIaNviSNoWtYHirZwrd00oLmNbWp+g8lIGQiT 3vLW0SaZzjL6akKxihb/p3WZ9eNmyz8yk0V7dItUEVUB9qoaDDLJ5qIRSHHWTWQD Jza/GE32VallZLuEXGG5/D86MsnyVYHC+lZtwo2IptOGm8v7WuZRv094wI1ev5c= =aiDw -----END PGP SIGNATURE----- Merge tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - Fix VM save performance regression with x86 PV guests - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI) - Other minor fixes. * tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen/p2m: hint at the last populated P2M entry x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map x86/xen: Support kexec/kdump in HVM guests by doing a soft reset xen/x86: Don't try to write syscall-related MSRs for PV guests xen: use correct type for HYPERVISOR_memory_op()	2015-10-06 15:05:02 +01:00
Stephen Smalley	e1a58320a3	x86/mm: Warn on W^X mappings Warn on any residual W+X mappings after setting NX if DEBUG_WX is enabled. Introduce a separate X86_PTDUMP_CORE config that enables the code for dumping the page tables without enabling the debugfs interface, so that DEBUG_WX can be enabled without exposing the debugfs interface. Switch EFI_PGT_DUMP to using X86_PTDUMP_CORE so that it also does not require enabling the debugfs interface. On success it prints this to the kernel log: x86/mm: Checked W+X mappings: passed, no W+X pages found. On failure it prints a warning and a count of the failed pages: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0() x86/mm: Found insecure W+X mapping at address ffffffff81755000/__stop___ex_table+0xfa8/0xabfa8 [...] Call Trace: [<ffffffff81380a5f>] dump_stack+0x44/0x55 [<ffffffff8109d3f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff8109d48c>] warn_slowpath_fmt+0x5c/0x80 [<ffffffff8106cfc9>] ? note_page+0x5c9/0x7b0 [<ffffffff8106d010>] note_page+0x610/0x7b0 [<ffffffff8106d409>] ptdump_walk_pgd_level_core+0x259/0x3c0 [<ffffffff8106d5a7>] ptdump_walk_pgd_level_checkwx+0x17/0x20 [<ffffffff81063905>] mark_rodata_ro+0xf5/0x100 [<ffffffff817415a0>] ? rest_init+0x80/0x80 [<ffffffff817415bd>] kernel_init+0x1d/0xe0 [<ffffffff8174cd1f>] ret_from_fork+0x3f/0x70 [<ffffffff817415a0>] ? rest_init+0x80/0x80 ---[ end trace a1f23a1e42a2ac76 ]--- x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1444064120-11450-1-git-send-email-sds@tycho.nsa.gov [ Improved the Kconfig help text and made the new option default-y if CONFIG_DEBUG_RODATA=y, because it already found buggy mappings, so we really want people to have this on by default. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-06 11:11:48 +02:00
Ingo Molnar	38a413cbc2	Linux 4.3-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWB9f6AAoJEHm+PkMAQRiGiFMIAJYFLIkF/dXFYMNPGsRGRGYO SsQkfYzjy4i/yloyVlGB33e6dqxWdVgCeqYC77TO+1CBq34o6dqM4PACTrhjtS+3 qQvaP/qn6cSoaGIkdD3v43CCiwMpZZ5+Uj7F7Uz8N4twrpykOZFMM5T7f1lrsG2F wJGafmvok9NU2F2wYwaJ8JrzsF6iO6ibFeB8BosRF5Ba4nKqiXVI0xNa0R8PFDm3 tbh/IkkqokemEqnHyWyszhGFsCQupi+QgsjY/LhWUcCaL7HLEgJmkBX0tXNlgMmK TFCq7L8Bigu4nlgZ/iVUB9kh4GTBNVcbdRVN3loJFlczFJlIAa171OVlfRu3lvU= =m29x -----END PGP SIGNATURE----- Merge tag 'v4.3-rc3' into x86/mm, to pick up fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-06 10:56:54 +02:00
Linus Torvalds	2cf30826bb	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two build failure fixes for corner case configs, x32 header fix and a speling fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds x86/mm: Set NX on gap between __ex_table and rodata x86/kexec: Fix kexec crash in syscall kexec_file_load() x86/process: Unify 32bit and 64bit implementations of get_wchan() x86/process: Add proper bound checks in 64bit get_wchan() x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag	2015-10-03 10:53:05 -04:00
Ben Hutchings	f4b4aae182	x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds On x32, gcc predefines __x86_64__ but long is only 32-bit. Use __ILP32__ to distinguish x32. Fixes this compiler error in perf: tools/include/asm-generic/bitops/__ffs.h: In function '__ffs': tools/include/asm-generic/bitops/__ffs.h:19:8: error: right shift count >= width of type [-Werror=shift-count-overflow] word >>= 32; ^ This isn't sufficient to build perf for x32, though. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443660043.2730.15.camel@decadent.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-02 09:43:21 +02:00
Linus Torvalds	bde17b90dd	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "12 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: dmapool: fix overflow condition in pool_find_page() thermal: avoid division by zero in power allocator memcg: remove pcp_counter_lock kprobes: use _do_fork() in samples to make them work again drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE memcg: make mem_cgroup_read_stat() unsigned memcg: fix dirty page migration dax: fix NULL pointer in __dax_pmd_fault() mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1) userfaultfd: remove kernel header include from uapi header arch/x86/include/asm/efi.h: fix build failure	2015-10-01 22:20:11 -04:00
Andrey Ryabinin	a523841ee4	arch/x86/include/asm/efi.h: fix build failure With KMEMCHECK=y, KASAN=n: arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration] arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration] arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration] Don't #undef memcpy if KASAN=n. Fixes: `769a8089c1` ("x86, efi, kasan: #undef memset/memcpy/memmove per arch") Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-10-01 21:42:35 -04:00
Linus Torvalds	ccf70ddcbe	(Relatively) a lot of reverts, mostly. Bugs have trickled in for a new feature in 4.2 (MTRR support in guests) so I'm reverting it all; let's not make this -rc period busier for KVM than it's been so far. This covers the four reverts from me. The fifth patch is being reverted because Radim found a bug in the implementation of stable scheduler clock, but also managed to implement the feature entirely without hypervisor support. So instead of fixing the hypervisor side we can remove it completely; 4.4 will get the new implementation. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJWDXc/AAoJEL/70l94x66D8GoH/0WXeSYHn8+Ql5oZ5vI0QcCG 6MiKVixhHTOpkug2QE4DGClYoFSUPuDEB/w6D7YciNn0quDHFZbI3XEMXYtLobHN 0J9cMv9Vpy5pBVMG/LJOw9pFAJRdhSx/cHU2DW9vUiRG9dO9zuxFzBtUciWLOPAX tSQfDumeUV30BsTP5ldi9kaIUJBM9oBD4JhES0JHx6ePBvy+9vCRmHotugzrrGx6 N+AbCmwUwxnK29PF9i7KMfex6T8l1uQG3fwWVazHoswsqbFEQyF6NpaSTYoZkjM9 6gaXEE1FQ7tRhuio4bBDos0lLu6iGesveP71p/HpULleq2sbH2ER8TpzR5iSnQA= =zAJS -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "(Relatively) a lot of reverts, mostly. Bugs have trickled in for a new feature in 4.2 (MTRR support in guests) so I'm reverting it all; let's not make this -rc period busier for KVM than it's been so far. This covers the four reverts from me. The fifth patch is being reverted because Radim found a bug in the implementation of stable scheduler clock, but also managed to implement the feature entirely without hypervisor support. So instead of fixing the hypervisor side we can remove it completely; 4.4 will get the new implementation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS Update KVM homepage Url Revert "KVM: SVM: use NPT page attributes" Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask" Revert "KVM: SVM: Sync g_pat with guest-written PAT value" Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages" Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"	2015-10-01 16:43:25 -04:00
Feng Wu	bf9f6ac8d7	KVM: Update Posted-Interrupts Descriptor when vCPU is blocked This patch updates the Posted-Interrupts Descriptor when vCPU is blocked. pre-block: - Add the vCPU to the blocked per-CPU list - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR post-block: - Remove the vCPU from the per-CPU list Signed-off-by: Feng Wu <feng.wu@intel.com> [Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:53 +02:00
Feng Wu	8727688006	KVM: x86: select IRQ_BYPASS_MANAGER Select IRQ_BYPASS_MANAGER for x86 when CONFIG_KVM is set Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:52 +02:00
Feng Wu	efc644048e	KVM: x86: Update IRTE for posted-interrupts This patch adds the routine to update IRTE for posted-interrupts when guest changes the interrupt configuration. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> [Squashed in automatically generated patch from the build robot "KVM: x86: vcpu_to_pi_desc() can be static" - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:51 +02:00
Feng Wu	d84f1e0755	KVM: make kvm_set_msi_irq() public Make kvm_set_msi_irq() public, we can use this function outside. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:50 +02:00
Feng Wu	8feb4a04dc	KVM: Define a new interface kvm_intr_is_single_vcpu() This patch defines a new interface kvm_intr_is_single_vcpu(), which can returns whether the interrupt is for single-CPU or not. It is used by VT-d PI, since now we only support single-CPU interrupts, For lowest-priority interrupts, if user configures it via /proc/irq or uses irqbalance to make it single-CPU, we can use PI to deliver the interrupts to it. Full functionality of lowest-priority support will be added later. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:49 +02:00
Paolo Bonzini	18cd52c4d9	irq_remapping: move structs outside #ifdef This is friendlier to clients of the code, who are going to prepare vcpu_data structs unconditionally, even if CONFIG_IRQ_REMAP is not defined. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:42 +02:00
Xiao Guangrong	8b3e34e46a	KVM: x86: add pcommit support Pass PCOMMIT CPU feature to guest to enable PCOMMIT instruction Currently we do not catch pcommit instruction for L1 guest and allow L1 to catch this instruction for L2 if, as required by the spec, L1 can enumerate the PCOMMIT instruction via CPUID: \| IA32_VMX_PROCBASED_CTLS2[53] (which enumerates support for the \| 1-setting of PCOMMIT exiting) is always the same as \| CPUID.07H:EBX.PCOMMIT[bit 22]. Thus, software can set PCOMMIT exiting \| to 1 if and only if the PCOMMIT instruction is enumerated via CPUID The spec can be found at https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:35 +02:00
Andrey Smetanin	9eec50b8bb	kvm/x86: Hyper-V HV_X64_MSR_VP_RUNTIME support HV_X64_MSR_VP_RUNTIME msr used by guest to get "the time the virtual processor consumes running guest code, and the time the associated logical processor spends running hypervisor code on behalf of that guest." Calculation of this time is performed by task_cputime_adjusted() for vcpu task. Necessary to support loading of winhv.sys in guest, which in turn is required to support Windows VMBus. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:33 +02:00
Andrey Smetanin	e516cebb4f	kvm/x86: Hyper-V HV_X64_MSR_RESET msr HV_X64_MSR_RESET msr is used by Hyper-V based Windows guest to reset guest VM by hypervisor. Necessary to support loading of winhv.sys in guest, which in turn is required to support Windows VMBus. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:32 +02:00
Steve Rutherford	1c1a9ce973	KVM: x86: Add support for local interrupt requests from userspace In order to enable userspace PIC support, the userspace PIC needs to be able to inject local interrupts even when the APICs are in the kernel. KVM_INTERRUPT now supports sending local interrupts to an APIC when APICs are in the kernel. The ready_for_interrupt_request flag is now only set when the CPU/APIC will immediately accept and inject an interrupt (i.e. APIC has not masked the PIC). When the PIC wishes to initiate an INTA cycle with, say, CPU0, it kicks CPU0 out of the guest, and renedezvous with CPU0 once it arrives in userspace. When the CPU/APIC unmasks the PIC, a KVM_EXIT_IRQ_WINDOW_OPEN is triggered, so that userspace has a chance to inject a PIC interrupt if it had been pending. Overall, this design can lead to a small number of spurious userspace renedezvous. In particular, whenever the PIC transistions from low to high while it is masked and whenever the PIC becomes unmasked while it is low. Note: this does not buffer more than one local interrupt in the kernel, so the VMM needs to enter the guest in order to complete interrupt injection before injecting an additional interrupt. Compiles for x86. Can pass the KVM Unit Tests. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:29 +02:00
Steve Rutherford	b053b2aef2	KVM: x86: Add EOI exit bitmap inference In order to support a userspace IOAPIC interacting with an in kernel APIC, the EOI exit bitmaps need to be configurable. If the IOAPIC is in userspace (i.e. the irqchip has been split), the EOI exit bitmaps will be set whenever the GSI Routes are configured. In particular, for the low MSI routes are reservable for userspace IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the destination vector of the route will be set for the destination VCPU. The intention is for the userspace IOAPICs to use the reservable MSI routes to inject interrupts into the guest. This is a slight abuse of the notion of an MSI Route, given that MSIs classically bypass the IOAPIC. It might be worthwhile to add an additional route type to improve clarity. Compile tested for Intel x86. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:28 +02:00
Steve Rutherford	7543a635aa	KVM: x86: Add KVM exit for IOAPIC EOIs Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI level-triggered IOAPIC interrupts. Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs to be informed (which is identical to the EOI_EXIT_BITMAP field used by modern x86 processors, but can also be used to elide kvm IOAPIC EOI exits on older processors). [Note: A prototype using ResampleFDs found that decoupling the EOI from the VCPU's thread made it possible for the VCPU to not see a recent EOI after reentering the guest. This does not match real hardware.] Compile tested for Intel x86. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:27 +02:00
Steve Rutherford	49df6397ed	KVM: x86: Split the APIC from the rest of IRQCHIP. First patch in a series which enables the relocation of the PIC/IOAPIC to userspace. Adds capability KVM_CAP_SPLIT_IRQCHIP; KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the rest of the irqchip. Compile tested for x86. Signed-off-by: Steve Rutherford <srutherford@google.com> Suggested-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:26 +02:00
Paolo Bonzini	d50ab6c1a2	KVM: x86: replace vm_has_apicv hook with cpu_uses_apicv This will avoid an unnecessary trip to ->kvm and from there to the VPIC. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:24 +02:00
Paolo Bonzini	3bb345f387	KVM: x86: store IOAPIC-handled vectors in each VCPU We can reuse the algorithm that computes the EOI exit bitmap to figure out which vectors are handled by the IOAPIC. The only difference between the two is for edge-triggered interrupts other than IRQ8 that have no notifiers active; however, the IOAPIC does not have to do anything special for these interrupts anyway. This again limits the interactions between the IOAPIC and the LAPIC, making it easier to move the former to userspace. Inspired by a patch from Steve Rutherford. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:23 +02:00
Paolo Bonzini	82f6c9cd90	Merge branch 'x86/for-kvm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD This merges a cleanup of asm/apic.h, which is needed by the KVM patches to support VT-d posted interrupts. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:02:45 +02:00
Thomas Gleixner	09907ca630	Merge branch 'x86/for-kvm' into x86/apic Pull in the apic change which is provided for kvm folks to pull into their tree.	2015-09-30 21:20:39 +02:00
Paolo Bonzini	e02ae38713	x86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC Some CONFIG_X86_X2APIC functions, especially x2apic_enabled(), are not declared if !CONFIG_X86_LOCAL_APIC. However, the same stubs that work for !CONFIG_X86_X2APIC are okay even if there is no local APIC support at all. Avoid the introduction of #ifdefs by moving the x2apic declarations completely outside the CONFIG_X86_LOCAL_APIC block. (Unfortunately, diff generation messes up the actual change that this patch makes). There is no semantic change because CONFIG_X86_X2APIC depends on CONFIG_X86_LOCAL_APIC. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Feng Wu <feng.wu@intel.com> Link: http://lkml.kernel.org/r/1443435991-35750-1-git-send-email-pbonzini@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-09-30 21:17:36 +02:00
Andrey Ryabinin	4ac86a6dce	x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels With KMEMCHECK=y, KASAN=n we get this build failure: arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] Don't #undef memcpy if KASAN=n. Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `769a8089c1` ("x86, efi, kasan: #undef memset/memcpy/memmove per arch") Link: http://lkml.kernel.org/r/1443544814-20122-1-git-send-email-ryabinin.a.a@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-30 09:29:53 +02:00
Ingo Molnar	ccf79c238f	Linux 4.3-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWB9f6AAoJEHm+PkMAQRiGiFMIAJYFLIkF/dXFYMNPGsRGRGYO SsQkfYzjy4i/yloyVlGB33e6dqxWdVgCeqYC77TO+1CBq34o6dqM4PACTrhjtS+3 qQvaP/qn6cSoaGIkdD3v43CCiwMpZZ5+Uj7F7Uz8N4twrpykOZFMM5T7f1lrsG2F wJGafmvok9NU2F2wYwaJ8JrzsF6iO6ibFeB8BosRF5Ba4nKqiXVI0xNa0R8PFDm3 tbh/IkkqokemEqnHyWyszhGFsCQupi+QgsjY/LhWUcCaL7HLEgJmkBX0tXNlgMmK TFCq7L8Bigu4nlgZ/iVUB9kh4GTBNVcbdRVN3loJFlczFJlIAa171OVlfRu3lvU= =m29x -----END PGP SIGNATURE----- Merge tag 'v4.3-rc3' into x86/urgent, before applying dependent fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-30 09:29:27 +02:00
Juergen Gross	24f775a660	xen: use correct type for HYPERVISOR_memory_op() HYPERVISOR_memory_op() is defined to return an "int" value. This is wrong, as the Xen hypervisor will return "long". The sub-function XENMEM_maximum_reservation returns the maximum number of pages for the current domain. An int will overflow for a domain configured with 8TB of memory or more. Correct this by using the correct type. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-09-28 14:48:52 +01:00
Radim Krčmář	9bac175d8e	Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR" Shifting pvclock_vcpu_time_info.system_time on write to KVM system time MSR is a change of ABI. Probably only 2.6.16 based SLES 10 breaks due to its custom enhancements to kvmclock, but KVM never declared the MSR only for one-shot initialization. (Doc says that only one write is needed.) This reverts commit `b7e60c5aed`. And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-28 13:06:37 +02:00
Linus Torvalds	e3be4266d3	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Another pile of fixes for perf: - Plug overflows and races in the core code - Sanitize the flow of the perf syscall so we error out before handling the more complex and hard to undo setups - Improve and fix Broadwell and Skylake hardware support - Revert a fix which broke what it tried to fix in perf tools - A couple of smaller fixes in various places of perf tools" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fix copying of /proc/kcore perf intel-pt: Remove no_force_psb from documentation perf probe: Use existing routine to look for a kernel module by dso->short_name perf/x86: Change test_aperfmperf() and test_intel() to static tools lib traceevent: Fix string handling in heterogeneous arch environments perf record: Avoid infinite loop at buildid processing with no samples perf: Fix races in computing the header sizes perf: Fix u16 overflows perf: Restructure perf syscall point of no return perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific perf tools: Bool functions shouldn't return -1 tools build: Add test for presence of __get_cpuid() gcc builtin tools build: Add test for presence of numa_num_possible_cpus() in libnuma Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum" perf stat: Fix per-pkg event reporting bug	2015-09-27 12:51:39 -04:00
Linus Torvalds	b6d980f493	AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC bug fixes too. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJWBSn7AAoJEL/70l94x66Dxd4H/RT6kWWj9x4grEYUkcJUDyK2 AXm7XcKQm04auwAic8Otr+ts/Qix/50kWmBe/TU0QLgqb8rj5Dj3yGFK6Z1y6mAz KvaxqMJd4tZGTqN0DDvC2ItEdzjfAdeJZo/FHXqPHVspG0G14T7STLna02LTBBEJ tNzY9qor8nFhg2fT2szqKaudUNgTqkCTpo57o2BrHE96SHG+m0WdpQCV1F5hPVpg Te0Pb7qX9xng5n3sQ7IV/t3QYbrza1ACwNQS9XJa0Yu6iEz7JdmVmzHQASK9ynn6 hUHhsNYGx4IsPjPtfJk2GroNaRDZL+VMzw07tfcOvPx8xkS9hS63pwzmSBqfLrM= =Ywqn -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC bug fixes too" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: disable halt_poll_ns as default for s390x KVM: x86: fix off-by-one in reserved bits check KVM: x86: use correct page table format to check nested page table reserved bits KVM: svm: do not call kvm_set_cr0 from init_vmcb KVM: x86: trap AMD MSRs for the TSeg base and mask KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store() KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs kvm: svm: reset mmu on VCPU reset	2015-09-25 10:51:40 -07:00
David Hildenbrand	920552b213	KVM: disable halt_poll_ns as default for s390x We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-25 10:31:30 +02:00
Denys Vlasenko	0b101e62af	x86/asm: Force inlining of cpu_relax() On x86, cpu_relax() simply calls rep_nop(), which generates one instruction, PAUSE (aka REP NOP). With this config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os gcc-4.7.2 does not always inline rep_nop(): it generates several copies of this: <rep_nop> (16 copies, 194 calls): 55 push %rbp 48 89 e5 mov %rsp,%rbp f3 90 pause 5d pop %rbp c3 retq See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 This patch fixes this via s/inline/__always_inline/ on rep_nop() and cpu_relax(). ( Forcing inlining only on rep_nop() causes GCC to deinline cpu_relax(), with almost no change in generated code). text data bss dec hex filename 88118971 19905208 36421632 144445811 89c1173 vmlinux.before 88118139 19905208 36421632 144444979 89c0e33 vmlinux Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443096149-27291-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-25 09:44:34 +02:00
Andy Lutomirski	3f2c5085ed	x86/sched/64: Don't save flags on context switch (reinstated) This reinstates the following commit: `2c7577a758` ("sched/x86_64: Don't save flags on context switch") which was reverted in: `512255a2ad` ("Revert 'sched/x86_64: Don't save flags on context switch'") Historically, Linux has always saved and restored EFLAGS across context switches. As far as I know, the only reason to do this is because of the NT flag. In particular, if something calls switch_to() with the NT flag set, then we don't want to leak the NT flag into a different task that might try to IRET and fail because NT is set. Before this commit: `8c7aa698ba` ("x86_64, entry: Filter RFLAGS.NT on entry from userspace") we could run system call bodies with NT set. This would be a DoS or possibly privilege escalation hole if scheduling in such a system call would leak NT into a different task. Importantly, we don't need to worry about NT being set while preemptible or across page faults. The only way we can schedule due to preemption or a page fault is in an interrupt entry that nests inside the SYSENTER prologue. The CPU will clear NT when entering through an interrupt gate, so we won't schedule with NT set. The only other interesting flags are IOPL and AC. Allowing switch_to() to change IOPL has no effect, as the value loaded during kernel execution doesn't matter at all except between a SYSENTER entry and the subsequent PUSHF, and anythign that interrupts in that window will restore IOPL on return. If we call __switch_to() with AC set, we have bigger problems. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/d4440fdc2a89247bffb7c003d2a9a2952bd46827.1441146105.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-25 09:29:17 +02:00
Kristen Carlson Accardi	a7adb91b13	x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag Because noitification just isn't right. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1442944296-11737-1-git-send-email-kristen@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-23 09:57:24 +02:00
Peter Zijlstra	62e8a3258b	atomic, arch: Audit atomic_{read,set}() This patch makes sure that atomic_{read,set}() are at least {READ,WRITE}_ONCE(). We already had the 'requirement' that atomic_read() should use ACCESS_ONCE(), and most archs had this, but a few were lacking. All are now converted to use READ_ONCE(). And, by a symmetry and general paranoia argument, upgrade atomic_set() to use WRITE_ONCE(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: james.hogan@imgtec.com Cc: linux-kernel@vger.kernel.org Cc: oleg@redhat.com Cc: will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-23 09:54:28 +02:00
Andrey Ryabinin	769a8089c1	x86, efi, kasan: #undef memset/memcpy/memmove per arch In not-instrumented code KASAN replaces instrumented memset/memcpy/memmove with not-instrumented analogues __memset/__memcpy/__memove. However, on x86 the EFI stub is not linked with the kernel. It uses not-instrumented mem() functions from arch/x86/boot/compressed/string.c So we don't replace them with __mem() variants in EFI stub. On ARM64 the EFI stub is linked with the kernel, so we should replace mem() functions with __mem(), because the EFI stub runs before KASAN sets up early shadow. So let's move these #undef mem* into arch's asm/efi.h which is also included by the EFI stub. Also, this will fix the warning in 32-bit build reported by kbuild test robot: efi-stub-helper.c:599:2: warning: implicit declaration of function 'memcpy' [akpm@linux-foundation.org: use 80 cols in comment] Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-22 15:09:53 -07:00
Daniel J Blueman	ce2e572cfe	x86/numachip: Introduce Numachip2 timer mechanisms Add 1GHz 64-bit Numachip2 clocksource timer support for accurate system-wide timekeeping, as core TSCs are unsynchronised. Additionally, add a per-core clockevent mechanism that interrupts via the platform IPI vector after a programmed period. [ tglx: Taking it through x86 due to dependencies ] Signed-off-by: Daniel J Blueman <daniel@numascale.com> Acked-by: Steffen Persvold <sp@numascale.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: http://lkml.kernel.org/r/1442829745-29311-1-git-send-email-daniel@numascale.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-09-22 22:25:33 +02:00
Daniel J Blueman	ad03a9c25d	x86/numachip: Add Numachip IPI optimisations When sending IPIs, first check if the non-local part of the source and destination APIC IDs match; if so, send via the local APIC for efficiency. Secondly, since the AMD BIOS-kernel developer guide states IPI delivery will occur invarient of prior deliver status, avoid polling the delivery status bit for efficiency. Signed-off-by: Daniel J Blueman <daniel@numascale.com> Acked-by: Steffen Persvold <sp@numascale.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: http://lkml.kernel.org/r/1442768522-19217-3-git-send-email-daniel@numascale.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-09-22 22:25:33 +02:00
Daniel J Blueman	d9d4dee6ce	x86/numachip: Add Numachip2 APIC support Introduce support for Numachip2 remote interrupts via detecting the right ACPI SRAT signature. Access is performed via a fixed mapping in the x86 physical address space. Signed-off-by: Daniel J Blueman <daniel@numascale.com> Acked-by: Steffen Persvold <sp@numascale.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: http://lkml.kernel.org/r/1442768522-19217-2-git-send-email-daniel@numascale.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-09-22 22:25:33 +02:00
Daniel J Blueman	db1003a719	x86/numachip: Cleanup Numachip support Drop unused code and includes in Numachip header files and APIC driver. Additionally, use the 'numachip1' prefix on Numachip1-specific functions; this prepares for adding Numachip2 support in later patches. Signed-off-by: Daniel J Blueman <daniel@numascale.com> Acked-by: Steffen Persvold <sp@numascale.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: http://lkml.kernel.org/r/1442768522-19217-1-git-send-email-daniel@numascale.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-09-22 22:25:32 +02:00
Toshi Kani	bbac8c6dea	x86/asm: Add pud_pgprot() and pmd_pgprot() pte_pgprot() returns a pgprot_t value by calling pte_flags(). Now that pud_flags() and pmd_flags() work specifically for the pud/pmd levels, define pud_pgprot() and pmd_pgprot() for PUD/PMD. Also update pte_pgprot() to remove the unnecessary mask with PTE_FLAGS_MASK as pte_flags() takes care of it. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Robert Elliot <elliott@hpe.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1442514264-12475-6-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-09-22 21:27:32 +02:00
Toshi Kani	f70abb0fc3	x86/asm: Fix pud/pmd interfaces to handle large PAT bit Now that we have pud/pmd mask interfaces, which handle pfn & flags mask properly for the large PAT bit. Fix pud/pmd pfn & flags interfaces by replacing PTE_PFN_MASK and PTE_FLAGS_MASK with the pud/pmd mask interfaces. Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Robert Elliot <elliott@hpe.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1442514264-12475-5-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-09-22 21:27:32 +02:00
Toshi Kani	4be4c1fb9a	x86/asm: Add pud/pmd mask interfaces to handle large PAT bit The PAT bit gets relocated to bit 12 when PUD and PMD mappings are used. This bit 12, however, is not covered by PTE_FLAGS_MASK, which is used for masking pfn and flags for all levels. Add pud/pmd mask interfaces to handle pfn and flags properly by using P?D_PAGE_MASK when PUD/PMD mappings are used, i.e. PSE bit is set. Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Robert Elliot <elliott@hpe.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1442514264-12475-4-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-09-22 21:27:32 +02:00
Toshi Kani	8321026718	x86/asm: Move PUD_PAGE macros to page_types.h PUD_SHIFT is defined according to a given kernel configuration, which allows it be commonly used by any x86 kernels. However, PUD_PAGE_SIZE and PUD_PAGE_MASK, which are set from PUD_SHIFT, are defined in page_64_types.h, which can be used by 64-bit kernel only. Move PUD_PAGE_SIZE and PUD_PAGE_MASK to page_types.h so that they can be used by any x86 kernels as well. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Robert Elliot <elliott@hpe.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1442514264-12475-3-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-09-22 21:27:32 +02:00
Paolo Bonzini	3afb112180	KVM: x86: trap AMD MSRs for the TSeg base and mask These have roughly the same purpose as the SMRR, which we do not need to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at boot, which causes problems when running a Xen dom0 under KVM. Just return 0, meaning that processor protection of SMRAM is not in effect. Reported-by: M A Young <m.a.young@durham.ac.uk> Cc: stable@vger.kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-21 07:41:22 +02:00
Linus Torvalds	3ae839454e	Mostly stable material, a lot of ARM fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJV+/ucAAoJEL/70l94x66DV8YH/1KDym/1GJ+/Br/YkHZnM53l 3Q0PwSLu9cNcIL9lUuDLwGTaVj+y8ud1Hjr/uzvKwivktmUYVZhkdtnZmnanvGOM qKB9K3nFXCPx8uqy8Dn7fOwEKcg9FmDOTTkWy13HDnXO+V4crSVVt+rPw+6FUMld NV5tYdw9Lu7y3XrveDebPWaPtyDL7OAagzmeK47eMffxG7X9Hf1H2aT7HueRi7x/ SkLIe3gmiOWmHVJDPE9TOmFYIj19gywDFysKes1gdVJLVUIXiELMT7SrvAYnToVB zISIEj7Zx4SINPxpf2dUn8REm7NsmJY+PffLIl/Nv+ozGggFQGFH0SMZ08p0bxw= =tfmn -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "Mostly stable material, a lot of ARM fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) sched: access local runqueue directly in single_task_running arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' arm64: KVM: Remove all traces of the ThumbEE registers arm: KVM: Disable virtual timer even if the guest is not using it arm64: KVM: Disable virtual timer even if the guest is not using it arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources KVM: s390: Replace incorrect atomic_or with atomic_andnot arm: KVM: Fix incorrect device to IPA mapping arm64: KVM: Fix user access for debug registers KVM: vmx: fix VPID is 0000H in non-root operation KVM: add halt_attempted_poll to VCPU stats kvm: fix zero length mmio searching kvm: fix double free for fast mmio eventfd kvm: factor out core eventfd assign/deassign logic kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd KVM: make the declaration of functions within 80 characters KVM: arm64: add workaround for Cortex-A57 erratum #852523 KVM: fix polling for guest halt continued even if disable it arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores arm64: KVM: set {v,}TCR_EL2 RES1 bits ...	2015-09-18 09:23:08 -07:00
Andi Kleen	d0dc8494cd	perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake Skylake has a new FRONTEND_LATENCY PEBS event to accurately profile frontend problems (like ITLB or decoding issues). The new event is configured through a separate MSR, which selects a range of sub events. Define the extra MSR as a extra reg and export support for it through sysfs. To avoid duplicating the existing tables use a new function to add new entries to existing tables. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1435707205-6676-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-18 09:20:22 +02:00
Linus Torvalds	42dc2a3048	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - misc fixes all around the map - block non-root vm86(old) if mmap_min_addr != 0 - two small debuggability improvements - removal of obsolete paravirt op * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform: Fix Geode LX timekeeping in the generic x86 build x86/apic: Serialize LVTT and TSC_DEADLINE writes x86/ioapic: Force affinity setting in setup_ioapic_dest() x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method x86/ldt: Fix small LDT allocation for Xen x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text x86/cpu: Print family/model/stepping in hex x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0 x86/alternatives: Make optimize_nops() interrupt safe and synced x86/mm/srat: Print non-volatile flag in SRAT x86/cpufeatures: Enable cpuid for Intel SHA extensions	2015-09-17 11:01:34 -07:00
Linus Torvalds	9786cff38a	Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Spinlock performance regression fix, plus documentation fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/static_keys: Fix up the static keys documentation locking/qspinlock/x86: Only emit the test-and-set fallback when building guest support locking/qspinlock/x86: Fix performance regression under unaccelerated VMs locking/static_keys: Fix a silly typo	2015-09-17 08:45:23 -07:00
Paolo Bonzini	62bea5bff4	KVM: add halt_attempted_poll to VCPU stats This new statistic can help diagnosing VCPUs that, for any reason, trigger bad behavior of halt_poll_ns autotuning. For example, say halt_poll_ns = 480000, and wakeups are spaced exactly like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes 10+20+40+80+160+320+480 = 1110 microseconds out of every 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then is consuming about 30% more CPU than it would use without polling. This would show as an abnormally high number of attempted polling compared to the successful polls. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com< Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-16 12:17:00 +02:00
Juergen Gross	cda34fc774	x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method It's not used anywhere. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: chrisw@sous-sol.org Cc: jeremy@goop.org Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1442227343-403-1-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 14:15:22 +02:00
Dave Hansen	060dd0f712	x86/fpu: Add C structures for AVX-512 state components AVX-512 has 3 separate state components: 1. opmask registers 2. zmm upper half of registers 0-15 3. new zmm registers (16-31) This patch adds C structures for the three components along with a few comments mostly lifted from the SDM to explain what they do. This will allow us to check our structures against what the hardware tells us about the sizes of the components. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233129.F2433B98@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 12:22:01 +02:00
Dave Hansen	83aa3c4530	x86/fpu: Rework YMM definition We are about to rework all of the "extended state" definitions. This makes the 'ymm' naming consistent with the AVX-512 types we will introduce later. We also add a convenience type: "reg_128_bit" so that we do not have to spell out our arithmetic. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233129.B4EB045F@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 12:22:00 +02:00
Dave Hansen	1126cb4535	x86/fpu/mpx: Rework MPX 'xstate' types MPX includes two separate "extended state components". There is no real need to have an 'mpx_struct' because we never really manage the states together. We also separate out the actual data in 'mpx_bndcsr_state' from the padding. We will shortly be checking the state sizes against our structures and need them to match. For consistency, we also ensure to prefix these types with 'mpx_'. Lastly, we add some comments to mirror some of the descriptions in the Intel documents (SDM) of the various state components. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233129.384B73EB@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 12:22:00 +02:00
Dave Hansen	8a93c9e0dc	x86/fpu: Rework XSTATE_* macros to remove magic '2' The 'xstate.c' code has a bunch of references to '2'. This is because we have a lot more work to do for the "extended" xstates than the "legacy" ones and state component 2 is the first "extended" state. This patch replaces all of the instances of '2' with FIRST_EXTENDED_XFEATURE, which clearly explains what is going on. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233128.A8C0BF51@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 12:21:58 +02:00
Dave Hansen	dad8c4fe85	x86/fpu: Rename XFEATURES_NR_MAX This is a logcal followon to the last patch. It makes the XFEATURE_MAX naming consistent with the other enum values. This is what Ingo suggested. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233127.A541448F@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 12:21:57 +02:00
Dave Hansen	d91cab7813	x86/fpu: Rename XSAVE macros There are two concepts that have some confusing naming: 1. Extended State Component numbers (currently called XFEATURE_BIT_) 2. Extended State Component masks (currently called XSTATE_) The numbers are (currently) from 0-9. State component 3 is the bounds registers for MPX, for instance. But when we want to enable "state component 3", we go set a bit in XCR0. The bit we set is 1<<3. We can check to see if a state component feature is enabled by looking at its bit. The current 'xfeature_bit's are at best xfeature bit _numbers_. Calling them bits is at best inconsistent with ending the enum list with 'XFEATURES_NR_MAX'. This patch renames the enum to be 'xfeature'. These also happen to be what the Intel documentation calls a "state component". We also want to differentiate these from the "XSTATE_" macros. The "XSTATE_" macros are a mask, and we rename them to match. These macros are reasonably widely used so this patch is a wee bit big, but this really is just a rename. The only non-mechanical part of this is the s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/ We need a better name for it, but that's another patch. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com [ Ported to v4.3-rc1. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 12:21:46 +02:00
Dave Hansen	75933433d6	x86/fpu: Remove partial LWP support definitions LightWeight Profiling was evidently an AMD profiling feature that we never got around to implementing. Remove the references to it. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233125.7E602284@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 12:07:56 +02:00
Dave Hansen	4109ca066b	x86/fpu: Remove XSTATE_RESERVE The original purpose of XSTATE_RESERVE was to carve out space to store all of the possible extended state components that get saved with the XSAVE instruction(s). However, we are now almost entirely dynamically allocating the buffers we use for XSAVE by placing them at the end of the task_struct and them sizing them at boot. The one exception for that is the init_task. The maximum extended state component size that we have today is on systems with space for AVX-512 and Memory Protection Keys: 2696 bytes. We have reserved a PAGE_SIZE buffer in the init_task via fpregs_state->__padding. This check ensures that even if the component sizes or layout were changed (which we do not expect), that we will still not overflow the init_task's buffer. In the case that we detect we might overflow the buffer, we completely disable XSAVE support in the kernel and try to boot as if we had 'legacy x87 FPU' support in place. This is a crippled state without any of the XSAVE-enabled features (MPX, AVX, etc...). But, it at least let us boot safely. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233125.D948D475@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 12:07:56 +02:00
Dave Hansen	0a26537502	x86/fpu: Move XSAVE-disabling code to a helper When we want to _completely_ disable XSAVE support as far as the kernel is concerned, we have a big set of feature flags to clear. We currently only do this in cases where the user asks for it to be disabled, but we are about to expand the places where we do it to handle errors too. Move the code in to xstate.c, and put it in the xstate.h header. We will use it in the next patch too. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233124.EA9A70E5@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 12:07:56 +02:00
Peter Zijlstra	0e2815de55	x86/headers: Clean up too long lines Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: brgerst@gmail.com Cc: dvlasenk@redhat.com Cc: luto@amacapital.net Cc: mikko.rapeli@iki.fi Cc: oleg@redhat.com Link: http://lkml.kernel.org/r/20150909071244.GM3644@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 11:09:33 +02:00
George Beshers	7c52198b9c	x86/platform/uv: Insert per_cpu accessor function on uv_hub_nmi UV: NMI: insert this_cpu_read accessor function on uv_hub_nmi. On SGI UV systems a 'power nmi' command from the CMC causes all processors to drop into uv_handle_nmi(). With the 4.0 kernel this results in BUG: unable to handle kernel paging request The bug is caused by the current code trying to use the PER_CPU variable uv_cpu_nmi.hub without an appropriate accessor function. That oversight occurred in commit `e16321709c` ("uv: Replace __get_cpu_var") Author: Christoph Lameter <cl@linux.com> Date: Sun Aug 17 12:30:41 2014 -0500 This patch inserts this_cpu_read() in the uv_hub_nmi macro restoring the intended functionality. Signed-off-by: George Beshers <gbeshers@sgi.com> Acked-by: Mike Travis <travis@sgi.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Hedi Berriche <hedi@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <rja@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-13 09:27:53 +02:00
Peter Zijlstra	a6b277857f	locking/qspinlock/x86: Only emit the test-and-set fallback when building guest support Only emit the test-and-set fallback for Hypervisors lacking PARAVIRT_SPINLOCKS support when building for guests. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 4.2 Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-11 07:50:12 +02:00
Peter Zijlstra	43b3f02899	locking/qspinlock/x86: Fix performance regression under unaccelerated VMs Dave ran into horrible performance on a VM without PARAVIRT_SPINLOCKS set and Linus noted that the test-and-set implementation was retarded. One should spin on the variable with a load, not a RMW. While there, remove 'queued' from the name, as the lock isn't queued at all, but a simple test-and-set. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Dave Chinner <david@fromorbit.com> Tested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Cc: stable@vger.kernel.org # v4.2+ Link: http://lkml.kernel.org/r/20150904152523.GR18673@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-11 07:49:42 +02:00
Linus Torvalds	33e247c7e5	Merge branch 'akpm' (patches from Andrew) Merge third patch-bomb from Andrew Morton: - even more of the rest of MM - lib/ updates - checkpatch updates - small changes to a few scruffy filesystems - kmod fixes/cleanups - kexec updates - a dma-mapping cleanup series from hch * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits) dma-mapping: consolidate dma_set_mask dma-mapping: consolidate dma_supported dma-mapping: cosolidate dma_mapping_error dma-mapping: consolidate dma_{alloc,free}_noncoherent dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd() mm: make sure all file VMAs have ->vm_ops set mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() mm: mark most vm_operations_struct const namei: fix warning while make xmldocs caused by namei.c ipc: convert invalid scenarios to use WARN_ON zlib_deflate/deftree: remove bi_reverse() lib/decompress_unlzma: Do a NULL check for pointer lib/decompressors: use real out buf size for gunzip with kernel fs/affs: make root lookup from blkdev logical size sysctl: fix int -> unsigned long assignments in INT_MIN case kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo kexec: align crash_notes allocation to make it be inside one physical page kexec: remove unnecessary test in kimage_alloc_crash_control_pages() kexec: split kexec_load syscall from kexec core code ...	2015-09-10 18:19:42 -07:00
Linus Torvalds	06ab838c20	xen: MFN/GFN/BFN terminology changes for 4.3-rc0 - Use the correct GFN/BFN terms more consistently. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJV8VRMAAoJEFxbo/MsZsTRiGQH/i/jrAJUJfrFC2PINaA2gDwe O0dlrkCiSgAYChGmxxxXZQSPM5Po5+EbT/dLjZ/uvSooeorM9RYY/mFo7ut/qLep 4pyQUuwGtebWGBZTrj9sygUVXVhgJnyoZxskNUbhj9zvP7hb9++IiI78mzne6cpj lCh/7Z2dgpfRcKlNRu+qpzP79Uc7OqIfDK+IZLrQKlXa7IQDJTQYoRjbKpfCtmMV BEG3kN9ESx5tLzYiAfxvaxVXl9WQFEoktqe9V8IgOQlVRLgJ2DQWS6vmraGrokWM 3HDOCHtRCXlPhu1Vnrp0R9OgqWbz8FJnmVAndXT8r3Nsjjmd0aLwhJx7YAReO/4= =JDia -----END PGP SIGNATURE----- Merge tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen terminology fixes from David Vrabel: "Use the correct GFN/BFN terms more consistently" * tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn xen/privcmd: Further s/MFN/GFN/ clean-up hvc/xen: Further s/MFN/GFN clean-up video/xen-fbfront: Further s/MFN/GFN clean-up xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn xen: Use correctly the Xen memory terminologies arm/xen: implement correctly pfn_to_mfn xen: Make clear that swiotlb and biomerge are dealing with DMA address	2015-09-10 16:21:11 -07:00
Christoph Hellwig	452e06af1f	dma-mapping: consolidate dma_set_mask Almost everyone implements dma_set_mask the same way, although some time that's hidden in ->set_dma_mask methods. This patch consolidates those into a common implementation that either calls ->set_dma_mask if present or otherwise uses the default implementation. Some architectures used to only call ->set_dma_mask after the initial checks, and those instance have been fixed to do the full work. h8300 implemented dma_set_mask bogusly as a no-ops and has been fixed. Unfortunately some architectures overload unrelated semantics like changing the dma_ops into it so we still need to allow for an architecture override for now. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-10 13:29:01 -07:00
Christoph Hellwig	ee196371d5	dma-mapping: consolidate dma_supported Most architectures just call into ->dma_supported, but some also return 1 if the method is not present, or 0 if no dma ops are present (although that should never happeb). Consolidate this more broad version into common code. Also fix h8300 which inorrectly always returned 0, which would have been a problem if it's dma_set_mask implementation wasn't a similarly buggy noop. As a few architectures have much more elaborate implementations, we still allow for arch overrides. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-10 13:29:01 -07:00
Christoph Hellwig	efa21e432c	dma-mapping: cosolidate dma_mapping_error Currently there are three valid implementations of dma_mapping_error: (1) call ->mapping_error (2) check for a hardcoded error code (3) always return 0 This patch provides a common implementation that calls ->mapping_error if present, then checks for DMA_ERROR_CODE if defined or otherwise returns 0. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-10 13:29:01 -07:00
Christoph Hellwig	1e8937526e	dma-mapping: consolidate dma_{alloc,free}_noncoherent Most architectures do not support non-coherent allocations and either define dma_{alloc,free}_noncoherent to their coherent versions or stub them out. Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips implements them directly. This patch moves the Openrisc version to common code, and handles the DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance. Note that actual non-coherent allocations require a dma_cache_sync implementation, so if non-coherent allocations didn't work on an architecture before this patch they still won't work after it. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-10 13:29:01 -07:00
Christoph Hellwig	6894258eda	dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} Since 2009 we have a nice asm-generic header implementing lots of DMA API functions for architectures using struct dma_map_ops, but unfortunately it's still missing a lot of APIs that all architectures still have to duplicate. This series consolidates the remaining functions, although we still need arch opt outs for two of them as a few architectures have very non-standard implementations. This patch (of 5): The coherent DMA allocator works the same over all architectures supporting dma_map operations. This patch consolidates them and converges the minor differences: - the debug_dma helpers are now called from all architectures, including those that were previously missing them - dma_alloc_from_coherent and dma_release_from_coherent are now always called from the generic alloc/free routines instead of the ops dma-mapping-common.h always includes dma-coherent.h to get the defintions for them, or the stubs if the architecture doesn't support this feature - checks for ->alloc / ->free presence are removed. There is only one magic instead of dma_map_ops without them (mic_dma_ops) and that one is x86 only anyway. Besides that only x86 needs special treatment to replace a default devices if none is passed and tweak the gfp_flags. An optional arch hook is provided for that. [linux@roeck-us.net: fix build] [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-10 13:29:01 -07:00
Dave Young	2965faa5e0	kexec: split kexec_load syscall from kexec core code There are two kexec load syscalls, kexec_load another and kexec_file_load. kexec_file_load has been splited as kernel/kexec_file.c. In this patch I split kexec_load syscall code to kernel/kexec.c. And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and use kexec_file_load only, or vice verse. The original requirement is from Ted Ts'o, he want kexec kernel signature being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use kexec_load syscall can bypass the checking. Vivek Goyal proposed to create a common kconfig option so user can compile in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects KEXEC_CORE so that old config files still work. Because there's general code need CONFIG_KEXEC_CORE, so I updated all the architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects KEXEC_CORE in arch Kconfig. Also updated general kernel code with to kexec_load syscall. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-10 13:29:01 -07:00
Linus Torvalds	12f03ee606	libnvdimm for 4.3: 1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic mechanism for adding device-driver-discovered memory regions to the kernel's direct map. This facility is used by the pmem driver to enable pfn_to_page() operations on the page frames returned by DAX ('direct_access' in 'struct block_device_operations'). For now, the 'memmap' allocation for these "device" pages comes from "System RAM". Support for allocating the memmap from device memory will arrive in a later kernel. 2/ Introduce memremap() to replace usages of ioremap_cache() and ioremap_wt(). memremap() drops the __iomem annotation for these mappings to memory that do not have i/o side effects. The replacement of ioremap_cache() with memremap() is limited to the pmem driver to ease merging the api change in v4.3. Completion of the conversion is targeted for v4.4. 3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem driver, update the VFS DAX implementation and PMEM api to provide persistence guarantees for kernel operations on a DAX mapping. 4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as cacheable to improve performance. 5/ Miscellaneous updates and fixes to libnvdimm including support for issuing "address range scrub" commands, clarifying the optimal 'sector size' of pmem devices, a clarification of the usage of the ACPI '_STA' (status) property for DIMM devices, and other minor fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5 JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR 1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1 o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn slsw6DkrWT60CRE42nbK =o57/ -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This update has successfully completed a 0day-kbuild run and has appeared in a linux-next release. The changes outside of the typical drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and the introduction of ZONE_DEVICE + devm_memremap_pages(). Summary: - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic mechanism for adding device-driver-discovered memory regions to the kernel's direct map. This facility is used by the pmem driver to enable pfn_to_page() operations on the page frames returned by DAX ('direct_access' in 'struct block_device_operations'). For now, the 'memmap' allocation for these "device" pages comes from "System RAM". Support for allocating the memmap from device memory will arrive in a later kernel. - Introduce memremap() to replace usages of ioremap_cache() and ioremap_wt(). memremap() drops the __iomem annotation for these mappings to memory that do not have i/o side effects. The replacement of ioremap_cache() with memremap() is limited to the pmem driver to ease merging the api change in v4.3. Completion of the conversion is targeted for v4.4. - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem driver, update the VFS DAX implementation and PMEM api to provide persistence guarantees for kernel operations on a DAX mapping. - Convert the ACPI NFIT 'BLK' driver to map the block apertures as cacheable to improve performance. - Miscellaneous updates and fixes to libnvdimm including support for issuing "address range scrub" commands, clarifying the optimal 'sector size' of pmem devices, a clarification of the usage of the ACPI '_STA' (status) property for DIMM devices, and other minor fixes" * tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits) libnvdimm, pmem: direct map legacy pmem by default libnvdimm, pmem: 'struct page' for pmem libnvdimm, pfn: 'struct page' provider infrastructure x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB add devm_memremap_pages mm: ZONE_DEVICE for "device memory" mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h dax: drop size parameter to ->direct_access() nd_blk: change aperture mapping from WC to WB nvdimm: change to use generic kvfree() pmem, dax: have direct_access use __pmem annotation dax: update I/O path to do proper PMEM flushing pmem: add copy_from_iter_pmem() and clear_pmem() pmem, x86: clean up conditional pmem includes pmem: remove layer when calling arch_has_wmb_pmem() pmem, x86: move x86 PMEM API to new pmem.h header libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option pmem: switch to devm_ allocations devres: add devm_memremap libnvdimm, btt: write and validate parent_uuid ...	2015-09-08 14:35:59 -07:00
Linus Torvalds	59a47fff02	Mostly this is just clean ups and micro optimizations. The changes with more meat are: o Allowing the trace event filters to filter on CPU number and process ids o Two new markers for trace output latency were added (10 and 100 msec latencies) o Have tracing_thresh filter function profiling time I also worked on modifying the ring buffer code for some future work, and moved the adding of the timestamp around. One of my changes caused a regression, and since other changes were built on top of it and already tested, I had to operate a revert of that change. Instead of rebasing, this change set has the code that caused a regression as well as the code to revert that change without touching the other changes that were made on top of it. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJV6aZEAAoJEEjnJuOKh9ldrR4H/A1RcQf1prLLoUibPP4w3lat dmQcdpS1NY+cqyiKuKPAOkFDGQL7qWzRqZ8whcPSJIsHq57ufqNSLf+0bbQYPzg9 g3CgGL7OApmGi5ulj0sNxhadvc9TFm/SAN0nVJlNuUWdm8e1UWHLsrJZaMfopu2r RDEtkOhg619mhDL4rktNdS6rk0B92Fhu2o2PwLZPVlUl1NNEt4WJU+ejitXUVO1A Nb70/rTGGJKtyHbW+74on4LnEN5Uu0Viu6rMwGfYyIgRmC2otdBDvE4xfKMiTUKr SzBjzrhIoMIRn4Vl0vElfulkpYaw7pcC2BdpZ4d9VpIOiLSlZs0x/TgCtpFEv5M= =baZ3 -----END PGP SIGNATURE----- Merge tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing update from Steven Rostedt: "Mostly this is just clean ups and micro optimizations. The changes with more meat are: - Allowing the trace event filters to filter on CPU number and process ids - Two new markers for trace output latency were added (10 and 100 msec latencies) - Have tracing_thresh filter function profiling time I also worked on modifying the ring buffer code for some future work, and moved the adding of the timestamp around. One of my changes caused a regression, and since other changes were built on top of it and already tested, I had to operate a revert of that change. Instead of rebasing, this change set has the code that caused a regression as well as the code to revert that change without touching the other changes that were made on top of it" * tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated" tracing: Don't make assumptions about length of string on task rename tracing: Allow triggers to filter for CPU ids and process names ftrace: Format MCOUNT_ADDR address as type unsigned long tracing: Introduce two additional marks for delay ftrace: Fix function_graph duration spacing with 7-digits ftrace: add tracing_thresh to function profile tracing: Clean up stack tracing and fix fentry updates ring-buffer: Reorganize function locations ring-buffer: Make sure event has enough room for extend and padding ring-buffer: Get timestamp after event is allocated ring-buffer: Move the adding of the extended timestamp out of line ring-buffer: Add event descriptor to simplify passing data ftrace: correct the counter increment for trace_buffer data tracing: Fix for non-continuous cpu ids tracing: Prefer kcalloc over kzalloc with multiply	2015-09-08 14:04:14 -07:00
Linus Torvalds	752240e74d	xen: features and fixes for 4.3-rc0 - Convert xen-blkfront to the multiqueue API - [arm] Support binding event channels to different VCPUs. - [x86] Support > 512 GiB in a PV guests (off by default as such a guest cannot be migrated with the current toolstack). - [x86] PMU support for PV dom0 (limited support for using perf with Xen and other guests). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJV7wIdAAoJEFxbo/MsZsTR0hEH/04HTKLKGnSJpZ5WbMPxqZxE UqGlvhvVWNAmFocZmbPcEi9T1qtcFrX5pM55JQr6UmAp3ovYsT2q1Q1kKaOaawks pSfc/YEH3oQW5VUQ9Lm9Ru5Z8Btox0WrzRREO92OF36UOgUOBOLkGsUfOwDinNIM lSk2djbYwDYAsoeC3PHB32wwMI//Lz6B/9ZVXcyL6ULynt1ULdspETjGnptRPZa7 JTB5L4/soioKOn18HDwwOhKmvaFUPQv9Odnv7dc85XwZreajhM/KMu3qFbMDaF/d WVB1NMeCBdQYgjOrUjrmpyr5uTMySiQEG54cplrEKinfeZgKlEyjKvjcAfJfiac= =Ktjl -----END PGP SIGNATURE----- Merge tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: "Xen features and fixes for 4.3: - Convert xen-blkfront to the multiqueue API - [arm] Support binding event channels to different VCPUs. - [x86] Support > 512 GiB in a PV guests (off by default as such a guest cannot be migrated with the current toolstack). - [x86] PMU support for PV dom0 (limited support for using perf with Xen and other guests)" * tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (33 commits) xen: switch extra memory accounting to use pfns xen: limit memory to architectural maximum xen: avoid another early crash of memory limited dom0 xen: avoid early crash of memory limited dom0 arm/xen: Remove helpers which are PV specific xen/x86: Don't try to set PCE bit in CR4 xen/PMU: PMU emulation code xen/PMU: Intercept PMU-related MSR and APIC accesses xen/PMU: Describe vendor-specific PMU registers xen/PMU: Initialization code for Xen PMU xen/PMU: Sysfs interface for setting Xen PMU mode xen: xensyms support xen: remove no longer needed p2m.h xen: allow more than 512 GB of RAM for 64 bit pv-domains xen: move p2m list if conflicting with e820 map xen: add explicit memblock_reserve() calls for special pages mm: provide early_memremap_ro to establish read-only mapping xen: check for initrd conflicting with e820 map xen: check pre-allocated page tables for conflict with memory map xen: check for kernel memory conflicting with memory layout ...	2015-09-08 11:46:48 -07:00
Julien Grall	0df4f266b3	xen: Use correctly the Xen memory terminologies Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN is meant, I suspect this is because the first support for Xen was for PV. This resulted in some misimplementation of helpers on ARM and confused developers about the expected behavior. For instance, with pfn_to_mfn, we expect to get an MFN based on the name. Although, if we look at the implementation on x86, it's returning a GFN. For clarity and avoid new confusion, replace any reference to mfn with gfn in any helpers used by PV drivers. The x86 code will still keep some reference of pfn_to_mfn which may be used by all kind of guests No changes as been made in the hypercall field, even though they may be invalid, in order to keep the same as the defintion in xen repo. Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a name to close to the KVM function gfn_to_page. Take also the opportunity to simplify simple construction such as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up will come in follow-up patches. [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-09-08 18:03:49 +01:00
Julien Grall	32e09870ee	xen: Make clear that swiotlb and biomerge are dealing with DMA address The swiotlb is required when programming a DMA address on ARM when a device is not protected by an IOMMU. In this case, the DMA address should always be equal to the machine address. For DOM0 memory, Xen ensure it by have an identity mapping between the guest address and host address. However, when mapping a foreign grant reference, the 1:1 model doesn't work. For ARM guest, most of the callers of pfn_to_mfn expects to get a GFN (Guest Frame Number), i.e a PFN (Page Frame Number) from the Linux point of view given that all ARM guest are auto-translated. Even though the name pfn_to_mfn is misleading, we need to ensure that those caller get a GFN and not by mistake a MFN. In pratical, I haven't seen error related to this but we should fix it for the sake of correctness. In order to fix the implementation of pfn_to_mfn on ARM in a follow-up patch, we have to introduce new helpers to return the DMA from a PFN and the invert. On x86, the new helpers will be an alias of pfn_to_mfn and mfn_to_pfn. The helpers will be used in swiotlb and xen_biovec_phys_mergeable. This is necessary in the latter because we have to ensure that the biovec code will not try to merge a biovec using foreign page and another using Linux memory. Lastly, the helper mfn_to_local_pfn has been renamed to bfn_to_local_pfn given that the only usage was in swiotlb. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-09-08 17:10:52 +01:00
Ingo Molnar	decb4c4115	x86/headers: Remove <asm/sigcontext.h> references on the kernel side Now that all type definitions are in the UAPI header, include it directly, instead of through <asm/sigcontext.h>. [ We still keep asm/sigcontext.h, so that uapi/asm/sigcontext32.h can include <asm/sigcontext.h>. ] Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-16-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:06:05 +02:00
Ingo Molnar	711531f4f3	x86/headers: Remove direct sigcontext32.h uses Now that all sigcontext types are defined in asm/sigcontext.h, remove the various sigcontext32.h uses in the kernel. We still keep the header itself, which includes sigcontext.h, in case user-space relies on it. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-15-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:59 +02:00
Ingo Molnar	8fcb346b91	x86/headers: Convert sigcontext_ia32 uses to sigcontext_32 Use the new name in kernel code, and move the old name to the user-space-only legacy section of the UAPI header. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-14-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:59 +02:00
Ingo Molnar	db1e031401	x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32' The two structures are identical - merge them and keep the legacy name as a define. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-13-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:58 +02:00
Ingo Molnar	530e5c8271	x86/headers: Make sigcontext pointers bit independent Before we can eliminate the duplication between 'struct sigcontext_32' and 'struct sigcontext_ia32', make the 'fpstate' pointer field in 'struct sigcontext_32' bit independent. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-12-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:58 +02:00
Ingo Molnar	f2c609bca0	x86/headers: Move the 'struct sigcontext' definitions into the UAPI header Our goal is to eliminate the duplicate struct sigcontext_ia32 definition, so move the kernel's primary sigcontext type into the UAPI header, defining these two variants: struct sigcontext_32 struct sigcontext_64 ... and map them to 'struct sigcontext'. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-11-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:58 +02:00
Ingo Molnar	2d057c69e7	x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean Use the __u16/32/64 types we standardized on in ABI definitions and which other sigcontext related types are already using. This will help unify struct sigcontext types between native 32-bit, compat and 64-bit kernels. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-10-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:57 +02:00
Ingo Molnar	86e9fc3a0e	x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32 Remove uses of _fpstate_ia32 from the kernel, and move the legacy _fpstate_ia32 definition to the user-space only portion of the header. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-9-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:57 +02:00
Ingo Molnar	7bb0dc2222	x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate 'struct _fpstate_ia32' and 'struct _fpstate' on i386 are identical in all fields, except 'padding1' being named 'padding'. We unify the two structures and add a union that is both named 'padding1' and 'padding', in the (unlikely) case there's user-space code that relies on the padding field name. We rename the two main types to be: struct _fpstate_32 struct _fpstate_64 for the 32-bit and 64-bit frame, and map them to the main and compat structure names (_fpstate) depending on whether we are on 32-bit or on 64-bit kernels. We also keep the old _fpstate_ia32 name as a legacy name. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-8-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:57 +02:00
Ingo Molnar	337a167d1a	x86/headers: Unify register type definitions between 32-bit compat and i386 The following sigcontext related types were duplicated across native 32-bit and compat 32-bit headers: struct _fpreg; struct _fpxreg; struct _xmmreg; X86_FXSR_MAGIC Unify them. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-7-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:56 +02:00
Ingo Molnar	3f623a5b27	x86/headers: Use ABI types consistently in sigcontext*.h Use the __u16/32/64 types we standardized on in ABI definitions - and which most of this header was already using. This will allow us to more obviously unify the compat header into the main header. No change in functionality. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-6-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:56 +02:00
Ingo Molnar	128f8257a1	x86/headers: Separate out legacy user-space structure definitions Better separate the user-space struct sigcontext definitions from the kernel definitions, so that we can unify the kernel definitions with sigcontext32.h. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-5-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:56 +02:00
Ingo Molnar	cbf5f4fbf4	x86/headers: Clean up and better document uapi/asm/sigcontext.h Clean up sigcontext.h: - the explanations were full of typos and were hard to read in general - use consistent and readable vertical spacing - fix, harmonize and extend comments No field name has been changed, user-space might be relying on them. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-4-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:55 +02:00
Ingo Molnar	c3f4986fb0	x86/headers: Clean up uapi/asm/sigcontext32.h Clean up sigcontext32.h a bit: - use consistent and readable vertical spacing - fix, harmonize and extend comments No field name has been changed, user-space might be relying on them. Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-3-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:55 +02:00
Ingo Molnar	b76cb6c869	x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h Mikko Rapeli reported that the following standalone user-space header does not compile: #include <asm/sigcontext32.h> Due to undefined 'struct __fpx_sw_bytes' which is defined in asm/sigcontext.h. The following header order works: #include <asm/sigcontext.h> #include <asm/sigcontext32.h> and that's probably how everyone's been using these headers for the past decade or so, but it's a legit header file dependency bug, so include asm/sigcontext.h in sigcontext32.h to allow it to be built standlone. Reported-by: Mikko Rapeli <mikko.rapeli@iki.fi> Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441438363-9999-2-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-08 10:03:55 +02:00
Ingo Molnar	95cd2ea7d5	Merge branch 'linus' into x86/urgent, to be able to merge a dependent fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-05 09:00:47 +02:00
Mel Gorman	72b252aed5	mm: send one IPI per CPU to TLB flush all entries after unmapping pages An IPI is sent to flush remote TLBs when a page is unmapped that was potentially accesssed by other CPUs. There are many circumstances where this happens but the obvious one is kswapd reclaiming pages belonging to a running process as kswapd and the task are likely running on separate CPUs. On small machines, this is not a significant problem but as machine gets larger with more cores and more memory, the cost of these IPIs can be high. This patch uses a simple structure that tracks CPUs that potentially have TLB entries for pages being unmapped. When the unmapping is complete, the full TLB is flushed on the assumption that a refill cost is lower than flushing individual entries. Architectures wishing to do this must give the following guarantee. If a clean page is unmapped and not immediately flushed, the architecture must guarantee that a write to that linear address from a CPU with a cached TLB entry will trap a page fault. This is essentially what the kernel already depends on but the window is much larger with this patch applied and is worth highlighting. The architecture should consider whether the cost of the full TLB flush is higher than sending an IPI to flush each individual entry. An additional architecture helper called flush_tlb_local is required. It's a trivial wrapper with some accounting in the x86 case. The impact of this patch depends on the workload as measuring any benefit requires both mapped pages co-located on the LRU and memory pressure. The case with the biggest impact is multiple processes reading mapped pages taken from the vm-scalability test suite. The test case uses NR_CPU readers of mapped files that consume 10*RAM. Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%) Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%) Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 581.00 611.43 System 5804.93 4111.76 Elapsed 161.03 122.12 This is showing that the readers completed 24.40% faster with 29% less system CPU time. From vmstats, it is known that the vanilla kernel was interrupted roughly 900K times per second during the steady phase of the test and the patched kernel was interrupts 180K times per second. The impact is lower on a single socket machine. 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%) Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%) Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 58.09 57.64 System 111.82 76.56 Elapsed 27.29 22.55 It's still a noticeable improvement with vmstat showing interrupts went from roughly 500K per second to 45K per second. The patch will have no impact on workloads with no memory pressure or have relatively few mapped pages. It will have an unpredictable impact on the workload running on the CPU being flushed as it'll depend on how many TLB entries need to be refilled and how long that takes. Worst case, the TLB will be completely cleared of active entries when the target PFNs were not resident at all. [sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Linus Torvalds	ca520cab25	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking and atomic updates from Ingo Molnar: "Main changes in this cycle are: - Extend atomic primitives with coherent logic op primitives (atomic_{or,and,xor}()) and deprecate the old partial APIs (atomic_{set,clear}_mask()) The old ops were incoherent with incompatible signatures across architectures and with incomplete support. Now every architecture supports the primitives consistently (by Peter Zijlstra) - Generic support for 'relaxed atomics': - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return() - atomic_read_acquire() - atomic_set_release() This came out of porting qwrlock code to arm64 (by Will Deacon) - Clean up the fragile static_key APIs that were causing repeat bugs, by introducing a new one: DEFINE_STATIC_KEY_TRUE(name); DEFINE_STATIC_KEY_FALSE(name); which define a key of different types with an initial true/false value. Then allow: static_branch_likely() static_branch_unlikely() to take a key of either type and emit the right instruction for the case. To be able to know the 'type' of the static key we encode it in the jump entry (by Peter Zijlstra) - Static key self-tests (by Jason Baron) - qrwlock optimizations (by Waiman Long) - small futex enhancements (by Davidlohr Bueso) - ... and misc other changes" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits) jump_label/x86: Work around asm build bug on older/backported GCCs locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h locking/qrwlock: Make use of _{acquire\|release\|relaxed}() atomics locking/qrwlock: Implement queue_write_unlock() using smp_store_release() locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition locking, asm-generic: Add _{relaxed\|acquire\|release}() variants for 'atomic_long_t' locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication locking/atomics: Add _{acquire\|release\|relaxed}() variants of some atomic operations locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic locking/static_keys: Make verify_keys() static jump label, locking/static_keys: Update docs locking/static_keys: Provide a selftest jump_label: Provide a self-test s390/uaccess, locking/static_keys: employ static_branch_likely() x86, tsc, locking/static_keys: Employ static_branch_likely() locking/static_keys: Add selftest locking/static_keys: Add a new static_key interface locking/static_keys: Rework update logic locking/static_keys: Add static_key_{en,dis}able() helpers ...	2015-09-03 15:46:07 -07:00
Linus Torvalds	ae98207309	Power management and ACPI material for v4.3-rc1 - ACPICA update to upstream revision 20150818 including method tracing extensions to allow more in-depth AML debugging in the kernel and a number of assorted fixes and cleanups (Bob Moore, Lv Zheng, Markus Elfring). - ACPI sysfs code updates and a documentation update related to AML method tracing (Lv Zheng). - ACPI EC driver fix related to serialized evaluations of _Qxx methods and ACPI tools updates allowing the EC userspace tool to be built from the kernel source (Lv Zheng). - ACPI processor driver updates preparing it for future introduction of CPPC support and ACPI PCC mailbox driver updates (Ashwin Chaugule). - ACPI interrupts enumeration fix for a regression related to the handling of IRQ attribute conflicts between MADT and the ACPI namespace (Jiang Liu). - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi Kasagar). - ACPI device registration code reorganization to separate the sysfs-related code and bus type operations from the rest (Rafael J Wysocki). - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause, Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss). - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan Xinhui, Rafael J Wysocki). - cpufreq core cleanups on top of the previous changes allowing it to preseve its sysfs directories over system suspend/resume (Viresh Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior). - cpufreq fixes and cleanups related to governors (Viresh Kumar). - cpufreq updates (core and the cpufreq-dt driver) related to the turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz). - New DT bindings for Operating Performance Points (OPP), support for them in the OPP framework and in the cpufreq-dt driver plus related OPP framework fixes and cleanups (Viresh Kumar). - cpufreq powernv driver updates (Shilpasri G Bhat). - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen). - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean). - intel_pstate driver updates including Skylake-S support, support for enabling HW P-states per CPU and an additional vendor bypass list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao). - cpuidle core fixes related to the handling of coupled idle states (Xunlei Pang). - intel_idle driver updates including Skylake Client support and support for freeze-mode-specific idle states (Len Brown). - Driver core updates related to power management (Andy Shevchenko, Rafael J Wysocki). - Generic power domains framework fixes and cleanups (Jon Hunter, Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson). - Device PM QoS framework update to allow the latency tolerance setting to be exposed to user space via sysfs (Mika Westerberg). - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas). - System sleep support updates (Alan Stern, Len Brown, SungEun Kim). - rockchip-io AVS support updates (Heiko Stuebner). - PM core clocks support fixup (Colin Ian King). - Power capping RAPL driver update including support for Skylake H/S and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi). - Generic device properties framework fixes related to the handling of static (driver-provided) property sets (Andy Shevchenko). - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat, Shreyas B Prabhu). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJV5hhGAAoJEILEb/54YlRxs+EQAK51iFk48+IbpHYaZZ50Yo4m ZZc2zBcbwRcBlU9vKERrhG+jieSl8J/JJNxT8vBjKqyvNw038mCjewQh02ol0HuC R7nlDiVJkmZ50sLO4xwE/1UBZr/XqbddwCUnYzvFMkMTA0ePzFtf8BrJ1FXpT8S/ fkwSXQty6hvJDwxkfrbMSaA730wMju9lahx8D6MlmUAedWYZOJDMQKB4WKa/St5X 9uckBPHUBB2KiKlXxdbFPwKLNxHvLROq5SpDLc6cM/7XZB+QfNFy85CUjCUtYo1O 1W8k0qnztvZ6UEv27qz5dejGyAGOarMWGGNsmL9evoeGeHRpQL+dom7HcTnbAfUZ walyhYSm/zKkdy7Vl3xWUUQkMG48+PviMI6K0YhHXb3Rm5wlR/yBNZTwNIty9SX/ fKCHEa8QynWwLxgm53c3xRkiitJxMsHNK03moLD9zQMjshTyTNvpNbZoahyKQzk6 H+9M1DBRHhkkREDWSwGutukxfEMtWe2vcZcyERrFiY7l5k1j58DwDBMPqjPhRv6q P/1NlCzr0XYf83Y86J18LbDuPGDhTjjIEn6CqbtI2mmWqTg3+rF7zvS2ux+FzMnA gisv8l6GT9JiWhxKFqqL/rrVpwtyHebWLYE/RpNUW6fEzLziRNj1qyYO9dqI/GGi I3rfxlXoc/5xJWCgNB8f =fTgI -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "From the number of commits perspective, the biggest items are ACPICA and cpufreq changes with the latter taking the lead (over 50 commits). On the cpufreq front, there are many cleanups and minor fixes in the core and governors, driver updates etc. We also have a new cpufreq driver for Mediatek MT8173 chips. ACPICA mostly updates its debug infrastructure and adds a number of fixes and cleanups for a good measure. The Operating Performance Points (OPP) framework is updated with new DT bindings and support for them among other things. We have a few updates of the generic power domains framework and a reorganization of the ACPI device enumeration code and bus type operations. And a lot of fixes and cleanups all over. Included is one branch from the MFD tree as it contains some PM-related driver core and ACPI PM changes a few other commits are based on. Specifics: - ACPICA update to upstream revision 20150818 including method tracing extensions to allow more in-depth AML debugging in the kernel and a number of assorted fixes and cleanups (Bob Moore, Lv Zheng, Markus Elfring). - ACPI sysfs code updates and a documentation update related to AML method tracing (Lv Zheng). - ACPI EC driver fix related to serialized evaluations of _Qxx methods and ACPI tools updates allowing the EC userspace tool to be built from the kernel source (Lv Zheng). - ACPI processor driver updates preparing it for future introduction of CPPC support and ACPI PCC mailbox driver updates (Ashwin Chaugule). - ACPI interrupts enumeration fix for a regression related to the handling of IRQ attribute conflicts between MADT and the ACPI namespace (Jiang Liu). - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi Kasagar). - ACPI device registration code reorganization to separate the sysfs-related code and bus type operations from the rest (Rafael J Wysocki). - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause, Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss). - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan Xinhui, Rafael J Wysocki). - cpufreq core cleanups on top of the previous changes allowing it to preseve its sysfs directories over system suspend/resume (Viresh Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior). - cpufreq fixes and cleanups related to governors (Viresh Kumar). - cpufreq updates (core and the cpufreq-dt driver) related to the turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz). - New DT bindings for Operating Performance Points (OPP), support for them in the OPP framework and in the cpufreq-dt driver plus related OPP framework fixes and cleanups (Viresh Kumar). - cpufreq powernv driver updates (Shilpasri G Bhat). - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen). - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean). - intel_pstate driver updates including Skylake-S support, support for enabling HW P-states per CPU and an additional vendor bypass list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao). - cpuidle core fixes related to the handling of coupled idle states (Xunlei Pang). - intel_idle driver updates including Skylake Client support and support for freeze-mode-specific idle states (Len Brown). - Driver core updates related to power management (Andy Shevchenko, Rafael J Wysocki). - Generic power domains framework fixes and cleanups (Jon Hunter, Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson). - Device PM QoS framework update to allow the latency tolerance setting to be exposed to user space via sysfs (Mika Westerberg). - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas). - System sleep support updates (Alan Stern, Len Brown, SungEun Kim). - rockchip-io AVS support updates (Heiko Stuebner). - PM core clocks support fixup (Colin Ian King). - Power capping RAPL driver update including support for Skylake H/S and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi). - Generic device properties framework fixes related to the handling of static (driver-provided) property sets (Andy Shevchenko). - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat, Shreyas B Prabhu)" * tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (180 commits) cpufreq: speedstep-lib: Use monotonic clock cpufreq: powernv: Increase the verbosity of OCC console messages cpufreq: sfi: use kmemdup rather than duplicating its implementation cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor() cpufreq: rename cpufreq_real_policy as cpufreq_user_policy cpufreq: remove redundant 'policy' field from user_policy cpufreq: remove redundant 'governor' field from user_policy cpufreq: update user_policy.* on success cpufreq: use memcpy() to copy policy cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event cpufreq: mediatek: Add MT8173 cpufreq driver dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings PM / Domains: Fix typo in description of genpd_dev_pm_detach() PM / Domains: Remove unusable governor dummies PM / Domains: Make pm_genpd_init() available to modules PM / domains: Align column headers and data in pm_genpd_summary output powercap / RAPL: disable the 2nd power limit properly tools: cpupower: Fix error when running cpupower monitor PM / OPP: Drop unlikely before IS_ERR(_OR_NULL) PM / OPP: Fix static checker warning (broken 64bit big endian systems) ...	2015-09-01 19:45:46 -07:00
Linus Torvalds	43af9872f5	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Thomas Gleixner: "This udpate contains: - rework the irq vector array to store a pointer to the irq descriptor instead of the irq number to avoid a lookup of the irq descriptor in the irq entry path - lguest interrupt handling cleanups - conversion of the local apic timer to the new clockevent callbacks - preparatory changes for the irq argument removal of interrupt flow handlers" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Do not dereference irq descriptor before checking it tools/lguest: Clean up include dir tools/lguest: Fix redefinition of struct virtio_pci_cfg_cap x86/irq: Store irq descriptor in vector array genirq: Provide irq_desc_has_action x86/irq: Get rid of an indentation level x86/irq: Rename VECTOR_UNDEFINED to VECTOR_UNUSED x86/irq: Replace numeric constant x86/irq: Protect smp_cleanup_move x86/lguest: Do not setup unused irq vectors x86/lguest: Clean up lguest_setup_irq x86/apic: Drop local_irq_save/restore in timer callbacks x86/apic: Migrate apic timer to new set_state interface x86/irq: Use access helper irq_data_get_affinity_mask() x86/irq: Use accessor irq_data_get_irq_handler_data() x86/irq: Use accessor irq_data_get_node()	2015-09-01 15:20:51 -07:00
Linus Torvalds	361f7d1757	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core platform updates from Ingo Molnar: "The main changes are: - Intel Atom platform updates. (Andy Shevchenko) - modularity fixlets. (Paul Gortmaker) - x86 platform clockevents driver updates for lguest, uv and Xen. (Viresh Kumar) - Microsoft Hyper-V TSC fixlet. (Vitaly Kuznetsov)" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform: Make atom/pmc_atom.c explicitly non-modular x86/hyperv: Mark the Hyper-V TSC as unstable x86/xen/time: Migrate to new set-state interface x86/uv/time: Migrate to new set-state interface x86/lguest/timer: Migrate to new set-state interface x86/pci/intel_mid_pci: Use proper constants for irq polarity x86/pci/intel_mid_pci: Make intel_mid_pci_ops static x86/pci/intel_mid_pci: Propagate actual return code x86/pci/intel_mid_pci: Work around for IRQ0 assignment x86/platform/iosf_mbi: Add Intel Tangier PCI id x86/platform/iosf_mbi: Source cleanup x86/platform/iosf_mbi: Remove NULL pointer checks for pci_dev_put() x86/platform/iosf_mbi: Check return value of debugfs_create properly x86/platform/iosf_mbi: Move to dedicated folder x86/platform/intel/pmc_atom: Move the PMC-Atom code to arch/x86/platform/atom x86/platform/intel/pmc_atom: Add Cherrytrail PMC interface x86/platform/intel/pmc_atom: Supply register mappings via PMC object x86/platform/intel/pmc_atom: Print index of device in loop x86/platform/intel/pmc_atom: Export accessors to PMC registers	2015-09-01 10:33:31 -07:00
Linus Torvalds	25525bea46	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The dominant change in this cycle was the continued work to isolate kernel drivers from MTRR legacies: this tree gets rid of all kernel internal driver interfaces to MTRRs (mostly by rewriting it to proper PAT interfaces), the only access left is the /proc/mtrr ABI. This work was done by Luis R Rodriguez. There's also some related PCI interface additions for which I've Cc:-ed Bjorn" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del() s390/io: Add pci_iomap_wc() and pci_iomap_wc_range() drivers/dma/iop-adma: Use dma_alloc_writecombine() kernel-style drivers/video/fbdev/vt8623fb: Use arch_phys_wc_add() and pci_iomap_wc() drivers/video/fbdev/s3fb: Use arch_phys_wc_add() and pci_iomap_wc() drivers/video/fbdev/arkfb.c: Use arch_phys_wc_add() and pci_iomap_wc() PCI: Add pci_iomap_wc() variants drivers/video/fbdev/gxt4500: Use pci_ioremap_wc_bar() to map framebuffer drivers/video/fbdev/kyrofb: Use arch_phys_wc_add() and pci_ioremap_wc_bar() drivers/video/fbdev/i740fb: Use arch_phys_wc_add() and pci_ioremap_wc_bar() PCI: Add pci_ioremap_wc_bar() x86/mm: Make kernel/check.c explicitly non-modular x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular x86/mm/pat: Add comments to cachemode translation tables arch/*/io.h: Add ioremap_uc() to all architectures drivers/video/fbdev/atyfb: Use arch_phys_wc_add() and ioremap_wc() drivers/video/fbdev/atyfb: Replace MTRR UC hole with strong UC drivers/video/fbdev/atyfb: Clarify ioremap() base and length used drivers/video/fbdev/atyfb: Carve out framebuffer length fudging into a helper x86/mm, asm-generic: Add IOMMU ioremap_uc() variant default ...	2015-09-01 10:07:40 -07:00
Linus Torvalds	6b2282aa37	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: "Two changes: a suspend/resume quirk and a new CPUID bit definition" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpufeature: Add feature bit for Intel's Silicon Debug CPUID bit x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume	2015-09-01 09:41:03 -07:00
Linus Torvalds	11e612ddb4	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The main x86 bootup related changes in this cycle were: - more boot time optimizations. (Len Brown) - implement hex output to allow the debugging of early bootup parameters. (Kees Cook) - remove obsolete MCA leftovers. (Paolo Pisati)" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted x86/smpboot: Remove SIPI delays from cpu_up() x86/smpboot: Remove udelay(100) when polling cpu_callin_map x86/smpboot: Remove udelay(100) when polling cpu_initialized_map x86/boot: Obsolete the MCA sys_desc_table x86/boot: Add hex output for debugging	2015-09-01 09:04:31 -07:00
Linus Torvalds	5778077d03	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "The biggest changes in this cycle were: - Revamp, simplify (and in some cases fix) Time Stamp Counter (TSC) primitives. (Andy Lutomirski) - Add new, comprehensible entry and exit handlers written in C. (Andy Lutomirski) - vm86 mode cleanups and fixes. (Brian Gerst) - 32-bit compat code cleanups. (Brian Gerst) The amount of simplification in low level assembly code is already palpable: arch/x86/entry/entry_32.S \| 130 +---- arch/x86/entry/entry_64.S \| 197 ++----- but more simplifications are planned. There's also the usual laudry mix of low level changes - see the changelog for details" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (83 commits) x86/asm: Drop repeated macro of X86_EFLAGS_AC definition x86/asm/msr: Make wrmsrl() a function x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer x86/asm: Add MONITORX/MWAITX instruction support x86/traps: Weaken context tracking entry assertions x86/asm/tsc: Add rdtscll() merge helper selftests/x86: Add syscall_nt selftest selftests/x86: Disable sigreturn_64 x86/vdso: Emit a GNU hash x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks x86/entry/32: Migrate to C exit path x86/entry/32: Remove 32-bit syscall audit optimizations x86/vm86: Rename vm86->v86flags and v86mask x86/vm86: Rename vm86->vm86_info to user_vm86 x86/vm86: Clean up vm86.h includes x86/vm86: Move the vm86 IRQ definitions to vm86.h x86/vm86: Use the normal pt_regs area for vm86 x86/vm86: Eliminate 'struct kernel_vm86_struct' x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86' x86/vm86: Move vm86 fields out of 'thread_struct' ...	2015-09-01 08:40:25 -07:00
Rafael J. Wysocki	e625ccec1f	Merge branches 'pm-tools' and 'powercap' * pm-tools: tools: cpupower: Fix error when running cpupower monitor tools/power turbostat: fix typo on DRAM column in Joules-mode cpupower: Do not change the frequency of offline cpu tools/power turbostat: fix parameter passing for forked command tools/power turbostat: dump CONFIG_TDP tools/power turbostat: cpu0 is no longer hard-coded, so update output tools/power turbostat: update turbostat(8) * powercap: powercap / RAPL: disable the 2nd power limit properly powercap / RAPL: Add support for Broadwell-H powercap / RAPL: Add support for Skylake H/S	2015-09-01 15:54:30 +02:00
Linus Torvalds	a1d8561172	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest change in this cycle is the rewrite of the main SMP load balancing metric: the CPU load/utilization. The main goal was to make the metric more precise and more representative - see the changelog of this commit for the gory details: `9d89c257df` ("sched/fair: Rewrite runnable load and utilization average tracking") It is done in a way that significantly reduces complexity of the code: 5 files changed, 249 insertions(+), 494 deletions(-) and the performance testing results are encouraging. Nevertheless we need to keep an eye on potential regressions, since this potentially affects every SMP workload in existence. This work comes from Yuyang Du. Other changes: - SCHED_DL updates. (Andrea Parri) - Simplify architecture callbacks by removing finish_arch_switch(). (Peter Zijlstra et al) - cputime accounting: guarantee stime + utime == rtime. (Peter Zijlstra) - optimize idle CPU wakeups some more - inspired by Facebook server loads. (Mike Galbraith) - stop_machine fixes and updates. (Oleg Nesterov) - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra) - sched/numa tweaks. (Srikar Dronamraju) - misc fixes and small cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) sched/deadline: Fix comment in enqueue_task_dl() sched/deadline: Fix comment in push_dl_tasks() sched: Change the sched_class::set_cpus_allowed() calling context sched: Make sched_class::set_cpus_allowed() unconditional sched: Fix a race between __kthread_bind() and sched_setaffinity() sched: Ensure a task has a non-normalized vruntime when returning back to CFS sched/numa: Fix NUMA_DIRECT topology identification tile: Reorganize _switch_to() sched, sparc32: Update scheduler comments in copy_thread() sched: Remove finish_arch_switch() sched, tile: Remove finish_arch_switch sched, sh: Fold finish_arch_switch() into switch_to() sched, score: Remove finish_arch_switch() sched, avr32: Remove finish_arch_switch() sched, MIPS: Get rid of finish_arch_switch() sched, arm: Remove finish_arch_switch() sched/fair: Clean up load average references sched/fair: Provide runnable_load_avg back to cfs_rq sched/fair: Remove task and group entity load when they are dead sched/fair: Init cfs_rq's sched_entity load average ...	2015-08-31 20:26:22 -07:00
Linus Torvalds	3959df1dfb	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "MCE handling updates, but also some generic drivers/edac/ changes to better organize the Kconfig space" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ras: Move AMD MCE injector to arch/x86/ras/ x86/mce: Add a wrapper around mce_log() for injection x86/mce: Rename rcu_dereference_check_mce() to mce_log_get_idx_check() RAS: Add a menuconfig option with descriptive text x86/mce: Reenable CMCI banks when swiching back to interrupt mode x86/mce: Clear Local MCE opt-in before kexec x86/mce: Remove unused function declarations x86/mce: Kill drain_mcelog_buffer() x86/mce: Avoid potential deadlock due to printk() in MCE context x86/mce: Remove the MCE ring for Action Optional errors x86/mce: Don't use percpu workqueues x86/mce: Provide a lockless memory pool to save error records x86/mce: Reuse one of the u16 padding fields in 'struct mce'	2015-08-31 20:20:30 -07:00
Linus Torvalds	41d859a83c	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Main perf kernel side changes: - uprobes updates/fixes. (Oleg Nesterov) - Add PERF_RECORD_SWITCH to indicate context switches and use it in tooling. (Adrian Hunter) - Support BPF programs attached to uprobes and first steps for BPF tooling support. (Wang Nan) - x86 generic x86 MSR-to-perf PMU driver. (Andy Lutomirski) - x86 Intel PT, LBR and BTS updates. (Alexander Shishkin) - x86 Intel Skylake support. (Andi Kleen) - x86 Intel Knights Landing (KNL) RAPL support. (Dasaratharaman Chandramouli) - x86 Intel Broadwell-DE uncore support. (Kan Liang) - x86 hw breakpoints robustization (Andy Lutomirski) Main perf tooling side changes: - Support Intel PT in several tools, enabling the use of the processor trace feature introduced in Intel Broadwell processors: (Adrian Hunter) # dmesg \| grep Performance # [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver. # perf record -e intel_pt//u -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.216 MB perf.data ] # perf script # then navigate in the tool output to some area, like this one: 184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so) 185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so) 186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so) 187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so) 188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so) 189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so) 190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so) 191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so) 192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so) 193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) 194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so) 195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so) 196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so) 197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so) 198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so) 199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so) 200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so) 201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so) - Add support for using several Intel PT features (CYC, MTC packets), the relevant documentation was updated in: tools/perf/Documentation/intel-pt.txt briefly describing those packets, its purposes, how to configure them in the event config terms and relevant external documentation for further reading. (Adrian Hunter) - Introduce support for probing at an absolute address, for user and kernel 'perf probe's, useful when one have the symbol maps on a developer machine but not on an embedded system. (Wang Nan) - Add Intel BTS support, with a call-graph script to show it and PT in use in a GUI using 'perf script' python scripting with postgresql and Qt. (Adrian Hunter) - Allow selecting the type of callchains per event, including disabling callchains in all but one entry in an event list, to save space, and also to ask for the callchains collected in one event to be used in other events. (Kan Liang) - Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho de Melo) * A bunch more translate file/pathnames from pointers to strings. * Convert numbers to strings for the 'keyctl' syscall 'option' arg. * Add missing 'clockid' entries. - Introduce 'srcfile' sort key: (Andi Kleen) # perf record -F 10000 usleep 1 # perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile <SNIP> # Overhead Source File 26.49% copy_page_64.S 5.49% signal.c 0.51% msr.h # It can be combined with other fields, for instance, experiment with '-s srcfile,symbol'. There are some oddities in some distros and with some specific DSOs, being investigated, so your mileage may vary. - Support per-event 'freq' term: (Namhyung Kim) $ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1 $ perf evlist -F cpu/instructions,freq=1234/: sample_freq=1234 cycles: sample_period=1000 $ - Deref sys_enter pointer args with contents from probe:vfs_getname, showing pathnames instead of pointers in many syscalls in 'perf trace'. (Arnaldo Carvalho de Melo) - Stop collecting /proc/kallsyms in perf.data files, saving about 4.5MB on a typical x86-64 system, use the the symbol resolution routines used in all the other tools (report, top, etc) now that we can ask libtraceevent to use perf's symbol resolution code. (Arnaldo Carvalho de Melo) - Allow filtering out of perf's PID via 'perf record --exclude-perf'. (Wang Nan) - 'perf trace' now supports syscall groups, like strace, i.e: $ trace -e file touch file Will expand 'file' into multiple, file related, syscalls. More work needed to add extra groups for other syscall groups, and also to complement what was added for the 'file' group, included as a proof of concept. (Arnaldo Carvalho de Melo) - Add lock_pi stresser to 'perf bench futex', to test the kernel code related to FUTEX_(UN)LOCK_PI. (Davidlohr Bueso) - Let user have timestamps with per-thread recording in 'perf record' (Adrian Hunter) - ... and tons of other changes, see the shortlog and the Git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits) perf evlist: Add backpointer for perf_env to evlist perf tools: Rename perf_session_env to perf_env perf tools: Do not change lib/api/fs/debugfs directly perf tools: Add tracing_path and remove unneeded functions perf buildid: Introduce sysfs/filename__sprintf_build_id perf evsel: Add a backpointer to the evlist a evsel is in perf trace: Add header with copyright and background info perf scripts python: Add new compaction-times script perf stat: Get correct cpu id for print_aggr tools lib traceeveent: Allow for negative numbers in print format perf script: Add --[no-]-demangle/--[no-]-demangle-kernel tracing/uprobes: Do not print '0x (null)' when offset is 0 perf probe: Support probing at absolute address perf probe: Fix error reported when offset without function perf probe: Fix list result when address is zero perf probe: Fix list result when symbol can't be found tools build: Allow duplicate objects in the object list perf tools: Remove export.h from MANIFEST perf probe: Prevent segfault when reading probe point with absolute address perf tools: Update Intel PT documentation ...	2015-08-31 19:49:05 -07:00
Linus Torvalds	4658000955	Merge branch 'mm-kasan-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/kasan changes from Ingo Molnar: "These are two KASAN changes that factor out (and generalize) x86 specific KASAN code from x86 to mm" * 'mm-kasan-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kasan, mm: Introduce generic kasan_populate_zero_shadow() x86/kasan: Define KASAN_SHADOW_OFFSET per architecture	2015-08-31 19:16:36 -07:00
Linus Torvalds	5757bd6157	Merge branch 'core-types-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull inlining tuning from Ingo Molnar: "A handful of inlining optimizations inspired by x86 work but applicable in general" * 'core-types-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: jiffies: Force inlining of {m,u}msecs_to_jiffies() x86/hweight: Force inlining of __arch_hweight{32,64}() linux/bitmap: Force inlining of bitmap weight functions	2015-08-31 18:37:33 -07:00
Linus Torvalds	26f8b7edc9	PCI changes for the v4.3 merge window: Enumeration Allocate ATS struct during enumeration (Bjorn Helgaas) Embed ATS info directly into struct pci_dev (Bjorn Helgaas) Reduce size of ATS structure elements (Bjorn Helgaas) Stop caching ATS Invalidate Queue Depth (Bjorn Helgaas) iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth (Bjorn Helgaas) Move MPS configuration check to pci_configure_device() (Bjorn Helgaas) Set MPS to match upstream bridge (Keith Busch) ARM/PCI: Set MPS before pci_bus_add_devices() (Murali Karicheri) Add pci_scan_root_bus_msi() (Lorenzo Pieralisi) ARM/PCI, designware, xilinx: Use pci_scan_root_bus_msi() (Lorenzo Pieralisi) Resource management Call pci_read_bridge_bases() from core instead of arch code (Lorenzo Pieralisi) PCI device hotplug pciehp: Remove unused interrupt events (Bjorn Helgaas) pciehp: Remove ignored MRL sensor interrupt events (Bjorn Helgaas) pciehp: Handle invalid data when reading from non-existent devices (Jarod Wilson) pciehp: Simplify pcie_poll_cmd() (Yijing Wang) Use "slot" and "pci_slot" for struct hotplug_slot and struct pci_slot (Yijing Wang) Protect pci_bus->slots with pci_slot_mutex, not pci_bus_sem (Yijing Wang) Hold pci_slot_mutex while searching bus->slots list (Yijing Wang) Power management Disable async suspend/resume for JMicron multi-function SATA/AHCI (Zhang Rui) Virtualization Add ACS quirks for Intel I219-LM/V (Alex Williamson) Restore ACS configuration as part of pci_restore_state() (Alexander Duyck) MSI Add pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu) x86: Implement pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu) Add helpers to manage pci_dev->irq and pci_dev->irq_managed (Jiang Liu) Free legacy IRQ when enabling MSI/MSI-X (Jiang Liu) ARM/PCI: Remove msi_controller from struct pci_sys_data (Lorenzo Pieralisi) Remove unused pcibios_msi_controller() hook (Lorenzo Pieralisi) Generic host bridge driver Remove dependency on ARM-specific struct hw_pci (Jayachandran C) Build setup-irq.o for arm64 (Jayachandran C) Add arm64 support (Jayachandran C) APM X-Gene host bridge driver Add APM X-Gene PCIe 64-bit prefetchable window (Duc Dang) Add support for a 64-bit prefetchable memory window (Duc Dang) Drop owner assignment from platform_driver (Krzysztof Kozlowski) Broadcom iProc host bridge driver Allow BCMA bus driver to be built as module (Hauke Mehrtens) Delete unnecessary checks before phy calls (Markus Elfring) Add arm64 support (Ray Jui) Synopsys DesignWare host bridge driver Don't complain missing config reg space if va_cfg0 is set (Murali Karicheri) TI DRA7xx host bridge driver Disable pm_runtime on get_sync failure (Kishon Vijay Abraham I) Add PM support (Kishon Vijay Abraham I) Clear MSE bit during suspend so clocks will idle (Kishon Vijay Abraham I) Add support to make GPIO drive PERST# line (Kishon Vijay Abraham I) Xilinx AXI host bridge driver Check for MSI interrupt flag before handling as INTx (Russell Joyce) Miscellaneous Fix Intersil/Techwell TW686[4589] AV capture class code (Krzysztof Hałasa) Use PCI_CLASS_SERIAL_USB instead of bare number (Bjorn Helgaas) Fix generic NCR 53c810 class code quirk (Bjorn Helgaas) Fix TI816X class code quirk (Bjorn Helgaas) Remove unused "pci_probe" flags (Bjorn Helgaas) Host bridge driver code simplifications (Fabio Estevam) Add dev_flags bit to access VPD through function 0 (Mark Rustad) Add VPD function 0 quirk for Intel Ethernet devices (Mark Rustad) Kill off set_irq_flags() usage (Rob Herring) Remove Intel Cherrytrail D3 delays (Srinidhi Kasagar) Clean up pci_find_capability() (Wei Yang) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV5FE/AAoJEFmIoMA60/r8I2QP/R9b9MrvH2i9tN98/lTDl7g3 czE58ZM1d4kMYtW3Pm/DrYI6y6RprAaB4ZEp5rHxlFLqBPZEQwWodA19NkjECcb6 g5qKWOdIWA4T6Jaab6a/yCmAFa0jni7iAmmTYqca9o3Xj7tFovxDxqPSYkh+rer0 v+1sAr/4HXSiN339KR6teEF3VZqLFp6ewMydQlVS+R7kAOHHYQDqoo9WF6JnIoL5 PO3Kbmr1WN3fZY3s98yLq1x6XmLrLlmGdJI+2r+KewO4r/05CL6wTVP/oTMi+Eti dueseeISlOTcTAUhk87Vap23uJPeB/rJbYoFdCr7+0AkZGe/U/E2dpZm2wyMcCvq OrATuFymgzIuJm5uUPsdH4lzsX97U9BcDccracfC38rYnP5u3bqHCjw8HJzANR7p VYbFBzc5ZCCUYtQAjyrKt2820AvTFo+Bu+z75IsJO8LQQgv/zGtQQ8grIQeAjH+l sAe3xOTwzZnq6Obl4qb/GElHmIGUbQ1X4Dx1mliiijKMKkhYHOA0iFnB/OBILmEZ wHzKU8chWcI9lip0aaX8q9i/qovdVUt2+rdo/N40l7YY66x4jkNgQQXZX+FSKk6H stTvEBQgK28EKCHDxMsgzTGIqllSyk4DnRMA7ij1hRWqdUbGk7wOPTvm9QSwNDWe SokuWzAQD9YeMRGdsYjZ =DX1r -----END PGP SIGNATURE----- Merge tag 'pci-v4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v4.3 merge window: Enumeration: - Allocate ATS struct during enumeration (Bjorn Helgaas) - Embed ATS info directly into struct pci_dev (Bjorn Helgaas) - Reduce size of ATS structure elements (Bjorn Helgaas) - Stop caching ATS Invalidate Queue Depth (Bjorn Helgaas) - iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth (Bjorn Helgaas) - Move MPS configuration check to pci_configure_device() (Bjorn Helgaas) - Set MPS to match upstream bridge (Keith Busch) - ARM/PCI: Set MPS before pci_bus_add_devices() (Murali Karicheri) - Add pci_scan_root_bus_msi() (Lorenzo Pieralisi) - ARM/PCI, designware, xilinx: Use pci_scan_root_bus_msi() (Lorenzo Pieralisi) Resource management: - Call pci_read_bridge_bases() from core instead of arch code (Lorenzo Pieralisi) PCI device hotplug: - pciehp: Remove unused interrupt events (Bjorn Helgaas) - pciehp: Remove ignored MRL sensor interrupt events (Bjorn Helgaas) - pciehp: Handle invalid data when reading from non-existent devices (Jarod Wilson) - pciehp: Simplify pcie_poll_cmd() (Yijing Wang) - Use "slot" and "pci_slot" for struct hotplug_slot and struct pci_slot (Yijing Wang) - Protect pci_bus->slots with pci_slot_mutex, not pci_bus_sem (Yijing Wang) - Hold pci_slot_mutex while searching bus->slots list (Yijing Wang) Power management: - Disable async suspend/resume for JMicron multi-function SATA/AHCI (Zhang Rui) Virtualization: - Add ACS quirks for Intel I219-LM/V (Alex Williamson) - Restore ACS configuration as part of pci_restore_state() (Alexander Duyck) MSI: - Add pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu) - x86: Implement pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu) - Add helpers to manage pci_dev->irq and pci_dev->irq_managed (Jiang Liu) - Free legacy IRQ when enabling MSI/MSI-X (Jiang Liu) - ARM/PCI: Remove msi_controller from struct pci_sys_data (Lorenzo Pieralisi) - Remove unused pcibios_msi_controller() hook (Lorenzo Pieralisi) Generic host bridge driver: - Remove dependency on ARM-specific struct hw_pci (Jayachandran C) - Build setup-irq.o for arm64 (Jayachandran C) - Add arm64 support (Jayachandran C) APM X-Gene host bridge driver: - Add APM X-Gene PCIe 64-bit prefetchable window (Duc Dang) - Add support for a 64-bit prefetchable memory window (Duc Dang) - Drop owner assignment from platform_driver (Krzysztof Kozlowski) Broadcom iProc host bridge driver: - Allow BCMA bus driver to be built as module (Hauke Mehrtens) - Delete unnecessary checks before phy calls (Markus Elfring) - Add arm64 support (Ray Jui) Synopsys DesignWare host bridge driver: - Don't complain missing config reg space if va_cfg0 is set (Murali Karicheri) TI DRA7xx host bridge driver: - Disable pm_runtime on get_sync failure (Kishon Vijay Abraham I) - Add PM support (Kishon Vijay Abraham I) - Clear MSE bit during suspend so clocks will idle (Kishon Vijay Abraham I) - Add support to make GPIO drive PERST# line (Kishon Vijay Abraham I) Xilinx AXI host bridge driver: - Check for MSI interrupt flag before handling as INTx (Russell Joyce) Miscellaneous: - Fix Intersil/Techwell TW686[4589] AV capture class code (Krzysztof Hałasa) - Use PCI_CLASS_SERIAL_USB instead of bare number (Bjorn Helgaas) - Fix generic NCR 53c810 class code quirk (Bjorn Helgaas) - Fix TI816X class code quirk (Bjorn Helgaas) - Remove unused "pci_probe" flags (Bjorn Helgaas) - Host bridge driver code simplifications (Fabio Estevam) - Add dev_flags bit to access VPD through function 0 (Mark Rustad) - Add VPD function 0 quirk for Intel Ethernet devices (Mark Rustad) - Kill off set_irq_flags() usage (Rob Herring) - Remove Intel Cherrytrail D3 delays (Srinidhi Kasagar) - Clean up pci_find_capability() (Wei Yang)" * tag 'pci-v4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (72 commits) PCI: Disable async suspend/resume for JMicron multi-function SATA/AHCI PCI: Set MPS to match upstream bridge PCI: Move MPS configuration check to pci_configure_device() PCI: Drop references acquired by of_parse_phandle() PCI/MSI: Remove unused pcibios_msi_controller() hook ARM/PCI: Remove msi_controller from struct pci_sys_data ARM/PCI, designware, xilinx: Use pci_scan_root_bus_msi() PCI: Add pci_scan_root_bus_msi() ARM/PCI: Replace panic with WARN messages on failures PCI: generic: Add arm64 support PCI: Build setup-irq.o for arm64 PCI: generic: Remove dependency on ARM-specific struct hw_pci PCI: imx6: Simplify a trivial if-return sequence PCI: spear: Use BUG_ON() instead of condition followed by BUG() PCI: dra7xx: Remove unneeded use of IS_ERR_VALUE() PCI: Remove pci_ats_enabled() PCI: Stop caching ATS Invalidate Queue Depth PCI: Move ATS declarations to linux/pci.h so they're all together PCI: Clean up ATS error handling PCI: Use pci_physfn() rather than looking up physfn by hand ...	2015-08-31 17:14:39 -07:00
Ingo Molnar	0050ae57cd	Merge branch 'x86/cpufeature' into x86/urgent, because it's ready Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-31 19:47:03 +02:00
Linus Torvalds	1c00038c76	Char/Misc driver patches for 4.3-rc1 Here's the "big" char/misc driver update for 4.3-rc1. Not much really interesting here, just a number of little changes all over the place, and some nice consolidation of the nvmem drivers to a common framework. As usual, the mei drivers stand out as the largest "churn" to handle new devices and features in their hardware. All have been in linux-next for a while with no issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlXV844ACgkQMUfUDdst+ymYfQCgmDKjq3fsVHCxNZPxnukFYzvb xZkAnRb8fuub5gVQFP29A+rhyiuWD13v =Bq9K -----END PGP SIGNATURE----- Merge tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver patches from Greg KH: "Here's the "big" char/misc driver update for 4.3-rc1. Not much really interesting here, just a number of little changes all over the place, and some nice consolidation of the nvmem drivers to a common framework. As usual, the mei drivers stand out as the largest "churn" to handle new devices and features in their hardware. All have been in linux-next for a while with no issues" * tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits) auxdisplay: ks0108: initialize local parport variable extcon: palmas: Fix build break due to devm_gpiod_get_optional API change extcon: palmas: Support GPIO based USB ID detection extcon: Fix signedness bugs about break error handling extcon: Drop owner assignment from i2c_driver extcon: arizona: Simplify pdata symantics for micd_dbtime extcon: arizona: Declare 3-pole jack if we detect open circuit on mic extcon: Add exception handling to prevent the NULL pointer access extcon: arizona: Ensure variables are set for headphone detection extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio extcon: arizona: Add basic microphone detection DT/ACPI bindings extcon: arizona: Update to use the new device properties API extcon: palmas: Remove the mutually_exclusive array extcon: Remove optional print_state() function pointer of struct extcon_dev extcon: Remove duplicate header file in extcon.h extcon: max77843: Clear IRQ bits state before request IRQ toshiba laptop: replace ioremap_cache with ioremap misc: eeprom: max6875: clean up max6875_read() misc: eeprom: clean up eeprom_read() misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write ...	2015-08-31 08:34:13 -07:00
Linus Torvalds	44e98edcd1	A very small release for x86 and s390 KVM. s390: timekeeping changes, cleanups and fixes x86: support for Hyper-V MSRs to report crashes, and a bunch of cleanups. One interesting feature that was planned for 4.3 (emulating the local APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to be delayed because Intel complained about my reading of the manual. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJVznW4AAoJEL/70l94x66Dt+gH/3vydhh6kv+mKhnR+kADaGfM gaunw0CUpJLU6gkOkYOm5M32WGhsT9Hd3WtRTJO6PhSo7cQ88hMx24u4XAffoewo Os5tDwAaHeV2enVSTri6xX8e2F2mgPDghGcYJPUBwnmMjRzZ8tj2VHUcbxqVT6Pb pX3V8ZxOZ81+ACZU2tdNRzLUd2H1v4d74gtVS7ove1Vb0CvPOBdHf1KQuUCUa2Pi 73fvnaEuSaFYtSWZIP1PYxLnsQHpApH3Kco/5kHeqUPpYaGa/g2bnfncHRw20Svr gb3opwbfyiq91xfGbRVR3+E63Cw4G6aTl5MDNv9UFJ+xFKuj8WJ72xXXTSwzUi4= =HgT+ -----END PGP SIGNATURE----- Merge tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "A very small release for x86 and s390 KVM. - s390: timekeeping changes, cleanups and fixes - x86: support for Hyper-V MSRs to report crashes, and a bunch of cleanups. One interesting feature that was planned for 4.3 (emulating the local APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to be delayed because Intel complained about my reading of the manual" * tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) x86/kvm: Rename VMX's segment access rights defines KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn kvm: x86: Fix error handling in the function kvm_lapic_sync_from_vapic KVM: s390: Fix assumption that kvm_set_irq_routing is always run successfully KVM: VMX: drop ept misconfig check KVM: MMU: fully check zero bits for sptes KVM: MMU: introduce is_shadow_zero_bits_set() KVM: MMU: introduce the framework to check zero bits on sptes KVM: MMU: split reset_rsvds_bits_mask_ept KVM: MMU: split reset_rsvds_bits_mask KVM: MMU: introduce rsvd_bits_validate KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c KVM: MMU: fix validation of mmio page fault KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON KVM: s390: host STP toleration for VMs KVM: x86: clean/fix memory barriers in irqchip_in_kernel KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus KVM: x86: remove unnecessary memory barriers for shared MSRs KVM: move code related to KVM_SET_BOOT_CPU_ID to x86 KVM: s390: log capability enablement and vm attribute changes ...	2015-08-31 08:27:44 -07:00
Ingo Molnar	02b643b643	Merge branch 'perf/urgent' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-31 10:25:26 +02:00
Huang Rui	7e01ebffff	x86/asm: Drop repeated macro of X86_EFLAGS_AC definition We just need one macro of X86_EFLAGS_AC_BIT and X86_EFLAGS_AC. Signed-off-by: Huang Rui <ray.huang@amd.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Li <tony.li@amd.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1440669844-21535-1-git-send-email-ray.huang@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-28 07:33:34 +02:00
Dan Williams	96601adb74	x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB Given that a write-back (WB) mapping plus non-temporal stores is expected to be the most efficient way to access PMEM, update the definition of ARCH_HAS_PMEM_API to imply arch support for WB-mapped-PMEM. This is needed as a pre-requisite for adding PMEM to the direct map and mapping it with struct page. The above clarification for X86_64 means that memcpy_to_pmem() is permitted to use the non-temporal arch_memcpy_to_pmem() rather than needlessly fall back to default_memcpy_to_pmem() when the pcommit instruction is not available. When arch_memcpy_to_pmem() is not guaranteed to flush writes out of cache, i.e. on older X86_32 implementations where non-temporal stores may just dirty cache, ARCH_HAS_PMEM_API is simply disabled. The default fall back for persistent memory handling remains. Namely, map it with the WT (write-through) cache-type and hope for the best. arch_has_pmem_api() is updated to only indicate whether the arch provides the proper helpers to meet the minimum "writes are visible outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()". Code that cares whether wmb_pmem() actually flushes writes to pmem must now call arch_has_wmb_pmem() directly. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> [hch: set ARCH_HAS_PMEM_API=n on x86_32] Reviewed-by: Christoph Hellwig <hch@lst.de> [toshi: x86_32 compile fixes] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-08-27 19:40:59 -04:00
Dan Williams	4a9bf88a5c	Merge branch 'pmem-api' into libnvdimm-for-next	2015-08-27 19:40:26 -04:00
Ross Zwisler	67a3e8fe90	nd_blk: change aperture mapping from WC to WB This should result in a pretty sizeable performance gain for reads. For rough comparison I did some simple read testing using PMEM to compare reads of write combining (WC) mappings vs write-back (WB). This was done on a random lab machine. PMEM reads from a write combining mapping: # dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000 100000+0 records in 100000+0 records out 409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s PMEM reads from a write-back mapping: # dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000 1000000+0 records in 1000000+0 records out 4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s To be able to safely support a write-back aperture I needed to add support for the "read flush" _DSM flag, as outlined in the DSM spec: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf This flag tells the ND BLK driver that it needs to flush the cache lines associated with the aperture after the aperture is moved but before any new data is read. This ensures that any stale cache lines from the previous contents of the aperture will be discarded from the processor cache, and the new data will be read properly from the DIMM. We know that the cache lines are clean and will be discarded without any writeback because either a) the previous aperture operation was a read, and we never modified the contents of the aperture, or b) the previous aperture operation was a write and we must have written back the dirtied contents of the aperture to the DIMM before the I/O was completed. In order to add support for the "read flush" flag I needed to add a generic routine to invalidate cache lines, mmio_flush_range(). This is protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently only supported on x86. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-08-27 19:38:28 -04:00
Ingo Molnar	8d58b66ed2	Linux 4.2-rc8 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJV2pUkAAoJEHm+PkMAQRiGCIoH/Rb29ZjdCoZJp9OtnjAG+qRc bG3YuomIdib86x7xHRKKaLWBa7din7IYjuwT/X4S4duO5a1R5Lp1sRG3IlGfhT0W nBNbjFl4q4bOyiTPtTRTYyh4g5UQv4IuyCnCmZyCTJyVi/O6HVM9TWKUzm68P2dJ 30LwLUcQJ+mHueGJwFBAXe2BaojEpvYCdSX6tvbrQ/8X3FrVExZXuJl4uMYNFYNK ZwG/v5t7tYOiAe76JGbrEuVFPZWLPEW7amHOWR0T4Ye4nWTlBgx7fENiNRlfgcvI CM16l/xkyrZQ3Q5jZy1qYDfdHYF++dyEDysX4w1ae/X0aaLZn7l+u5VQD6WpkQQ= =IF6I -----END PGP SIGNATURE----- Merge tag 'v4.2-rc8' into x86/mm, before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-25 09:59:19 +02:00
Rafael J. Wysocki	82bb70c599	Merge branch 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools Pull turbostat changes for v4.3 from Len Brown. * 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: fix typo on DRAM column in Joules-mode tools/power turbostat: fix parameter passing for forked command tools/power turbostat: dump CONFIG_TDP tools/power turbostat: cpu0 is no longer hard-coded, so update output tools/power turbostat: update turbostat(8)	2015-08-24 23:10:02 +02:00
Andy Lutomirski	47edb65178	x86/asm/msr: Make wrmsrl() a function As of `cf991de2f6` ("x86/asm/msr: Make wrmsrl_safe() a function"), wrmsrl_safe is a function, but wrmsrl is still a macro. The wrmsrl macro performs invalid shifts if the value argument is 32 bits. This makes it unnecessarily awkward to write code that puts an unsigned long into an MSR. To make this work, syscall_init needs tweaking to stop passing a function pointer to wrmsrl. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Willy Tarreau <w@1wt.eu> Link: http://lkml.kernel.org/r/690f0c629a1085d054e2d1ef3da073cfb3f7db92.1437678821.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-23 13:25:38 +02:00
Andrey Ryabinin	920e277e17	x86/kasan: Define KASAN_SHADOW_OFFSET per architecture Current definition of KASAN_SHADOW_OFFSET in include/linux/kasan.h will not work for upcomming arm64, so move it to the arch header. Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexey Klimov <klimov.linux@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Keitel <dkeitel@codeaurora.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Yury <yury.norov@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1439444244-26057-2-git-send-email-ryabinin.a.a@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-22 14:54:55 +02:00
Huang Rui	b466bdb614	x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer MWAITX can enable a timer and a corresponding timer value specified in SW P0 clocks. The SW P0 frequency is the same as TSC. The timer provides an upper bound on how long the instruction waits before exiting. This way, a delay function in the kernel can leverage that MWAITX timer of MWAITX. When a CPU core executes MWAITX, it will be quiesced in a waiting phase, diminishing its power consumption. This way, we can save power in comparison to our default TSC-based delays. A simple test shows that: $ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc $ sleep 10000s $ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc Results: * TSC-based default delay: 485115 uWatts average power * MWAITX-based delay: 252738 uWatts average power Thus, that's about 240 milliWatts less power consumption. The test method relies on the support of AMD CPU accumulated power algorithm in fam15h_power for which patches are forthcoming. Suggested-by: Andy Lutomirski <luto@amacapital.net> Suggested-by: Borislav Petkov <bp@suse.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Huang Rui <ray.huang@amd.com> [ Fix delay truncation. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Andreas Herrmann <herrmann.der.user@gmail.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Li <tony.li@amd.com> Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-22 14:52:16 +02:00
Huang Rui	f96756746c	x86/asm: Add MONITORX/MWAITX instruction support AMD Carrizo processors (Family 15h, Models 60h-6fh) added a new feature called MWAITX (MWAIT with extensions) as an extension to MONITOR/MWAIT. This new instruction controls a configurable timer which causes the core to exit wait state on timer expiration, in addition to "normal" MWAIT condition of reading from a monitored VA. Compared to MONITOR/MWAIT, there are minor differences in opcode and input parameters: MWAITX ECX[1]: enable timer if set MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks == TSC. The software P0 frequency is the same as the TSC frequency. MWAIT MWAITX opcode 0f 01 c9 \| 0f 01 fb ECX[0] value of RFLAGS.IF seen by instruction ECX[1] unused/#GP if set \| enable timer if set ECX[31:2] unused/#GP if set EAX unused (reserve for hint) EBX[31:0] unused \| max wait time (SW P0 == TSC) MONITOR MONITORX opcode 0f 01 c8 \| 0f 01 fa EAX (logical) address to monitor ECX #GP if not zero Max timeout = EBX/(TSC frequency) Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dirk Brandewie <dirk.j.brandewie@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <bitbucket@online.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Li <tony.li@amd.com> Link: http://lkml.kernel.org/r/1439201994-28067-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-22 14:52:16 +02:00
Tim Chen	488ca7d72d	x86/cpufeatures: Enable cpuid for Intel SHA extensions Add Intel CPUID for Intel Secure Hash Algorithm Extensions. This feature provides new instructions for accelerated computation of SHA-1 and SHA-256. This allows the feature to be shown in the /proc/cpuinfo for cpus that support it. Refer to SHA extension programming guide in chapter 8.2 of the Intel Architecture Instruction Set Extensions Programming reference for definition of this feature's cpuid: CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29] = 1 https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Link: http://lkml.kernel.org/r/1440194206.3940.6.camel@schen9-mobl2 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-08-22 11:17:31 +02:00
Ingo Molnar	99770737ca	x86/asm/tsc: Add rdtscll() merge helper Some in-flight code makes use of the old rdtscll() (now removed), provide a wrapper for a kernel cycle to smooth the transition to rdtsc(). ( We use the safest variant, rdtsc_ordered(), which has barriers - this adds another incentive to remove the wrapper in the future. ) Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/dddbf98a2af53312e9aa73a5a2b1622fe5d6f52b.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-21 08:35:42 +02:00
Ross Zwisler	5de490daec	pmem: add copy_from_iter_pmem() and clear_pmem() Add support for two new PMEM APIs, copy_from_iter_pmem() and clear_pmem(). copy_from_iter_pmem() is used to copy data from an iterator into a PMEM buffer. clear_pmem() zeros a PMEM memory range. Both of these new APIs must be explicitly ordered using a wmb_pmem() function call and are implemented in such a way that the wmb_pmem() will make the stores to PMEM durable. Because both APIs are unordered they can be called as needed without introducing any unwanted memory barriers. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-08-20 14:07:23 -04:00
Ross Zwisler	4a370df553	pmem, x86: clean up conditional pmem includes Prior to this change x86_64 used the pmem defines in arch/x86/include/asm/pmem.h, and UM used the default ones at the top of include/linux/pmem.h. The inclusion or exclusion in linux/pmem.h was controlled by CONFIG_ARCH_HAS_PMEM_API, but the ones in asm/pmem.h were controlled by ARCH_HAS_NOCACHE_UACCESS. Instead, control them both with CONFIG_ARCH_HAS_PMEM_API so that it's clear that they are related and we don't run into the possibility where they are both included or excluded. Also remove a bunch of stale function prototypes meant for UM in asm/pmem.h - these just conflicted with the inline defaults in linux/pmem.h and gave compile errors. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-08-20 14:07:23 -04:00
Ross Zwisler	18279b467a	pmem: remove layer when calling arch_has_wmb_pmem() Prior to this change arch_has_wmb_pmem() was only called by arch_has_pmem_api(). Both arch_has_wmb_pmem() and arch_has_pmem_api() checked to make sure that CONFIG_ARCH_HAS_PMEM_API was enabled. Instead, remove the old arch_has_wmb_pmem() wrapper to be rid of one extra layer of indirection and the redundant CONFIG_ARCH_HAS_PMEM_API check. Rename __arch_has_wmb_pmem() to arch_has_wmb_pmem() since we no longer have a wrapper, and just have arch_has_pmem_api() call the architecture specific arch_has_wmb_pmem() directly. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-08-20 14:07:23 -04:00
Ross Zwisler	4060352656	pmem, x86: move x86 PMEM API to new pmem.h header Move the x86 PMEM API implementation out of asm/cacheflush.h and into its own header asm/pmem.h. This will allow members of the PMEM API to be more easily identified on this and other architectures. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-08-20 14:07:23 -04:00
Boris Ostrovsky	65d0cf0be7	xen/PMU: Initialization code for Xen PMU Map shared data structure that will hold CPU registers, VPMU context, V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills this information in its handler and passes it to the guest for further processing. Set up PMU VIRQ. Now that perf infrastructure will assume that PMU is available on a PV guest we need to be careful and make sure that accesses via RDPMC instruction don't cause fatal traps by the hypervisor. Provide a nop RDPMC handler. For the same reason avoid issuing a warning on a write to APIC's LVTPC. Both of these will be made functional in later patches. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-08-20 12:25:20 +01:00
Boris Ostrovsky	5f14154882	xen/PMU: Sysfs interface for setting Xen PMU mode Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-08-20 12:24:26 +01:00
Juergen Gross	cb3eb85013	xen: remove no longer needed p2m.h Cleanup by removing arch/x86/xen/p2m.h as it isn't needed any more. Most definitions in this file are used in p2m.c only. Move those into p2m.c. set_phys_range_identity() is already declared in arch/x86/include/asm/xen/page.h, add __init annotation there. MAX_REMAP_RANGES isn't used at all, just delete it. The only define left is P2M_PER_PAGE which is moved to page.h as well. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-08-20 12:24:25 +01:00
Juergen Gross	c70727a5bc	xen: allow more than 512 GB of RAM for 64 bit pv-domains 64 bit pv-domains under Xen are limited to 512 GB of RAM today. The main reason has been the 3 level p2m tree, which was replaced by the virtual mapped linear p2m list. Parallel to the p2m list which is being used by the kernel itself there is a 3 level mfn tree for usage by the Xen tools and eventually for crash dump analysis. For this tree the linear p2m list can serve as a replacement, too. As the kernel can't know whether the tools are capable of dealing with the p2m list instead of the mfn tree, the limit of 512 GB can't be dropped in all cases. This patch replaces the hard limit by a kernel parameter which tells the kernel to obey the 512 GB limit or not. The default is selected by a configuration parameter which specifies whether the 512 GB limit should be active per default for domUs (domain save/restore/migration and crash dump analysis are affected). Memory above the domain limit is returned to the hypervisor instead of being identity mapped, which was wrong anyway. The kernel configuration parameter to specify the maximum size of a domain can be deleted, as it is not relevant any more. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-08-20 12:24:24 +01:00
Juergen Gross	17fb46b119	xen: sync with xen headers Use the newest headers from the xen tree to get some new structure layouts. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-08-20 12:24:16 +01:00
Julien Grall	4a5b69464e	xen/events: Support event channel rebind on ARM Currently, the event channel rebind code is gated with the presence of the vector callback. The virtual interrupt controller on ARM has the concept of per-CPU interrupt (PPI) which allow us to support per-VCPU event channel. Therefore there is no need of vector callback for ARM. Xen is already using a free PPI to notify the guest VCPU of an event. Furthermore, the xen code initialization in Linux (see arch/arm/xen/enlighten.c) is requesting correctly a per-CPU IRQ. Introduce new helper xen_support_evtchn_rebind to allow architecture decide whether rebind an event is support or not. It will always return true on ARM and keep the same behavior on x86. This is also allow us to drop the usage of xen_have_vector_callback entirely in the ARM code. Signed-off-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-08-20 12:24:15 +01:00
Ingo Molnar	40a2ea1bd9	Merge branch 'perf/urgent' into perf/core, to pick up fixes before adding more changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-20 11:48:56 +02:00
Dan Williams	7a67832c7e	libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option We currently register a platform device for e820 type-12 memory and register a nvdimm bus beneath it. Registering the platform device triggers the device-core machinery to probe for a driver, but that search currently comes up empty. Building the nvdimm-bus registration into the e820_pmem platform device registration in this way forces libnvdimm to be built-in. Instead, convert the built-in portion of CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the rest of the logic to the driver for e820_pmem, for the following reasons: 1/ Letting e820_pmem support be a module allows building and testing libnvdimm.ko changes without rebooting 2/ All the normal policy around modules can be applied to e820_pmem (unbind to disable and/or blacklisting the module from loading by default) 3/ Moving the driver to a generic location and converting it to scan "iomem_resource" rather than "e820.map" means any other architecture can take advantage of this simple nvdimm resource discovery mechanism by registering a resource named "Persistent Memory (legacy)" Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-08-19 00:34:34 -04:00
Ingo Molnar	a5dd192496	Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes Conflicts: arch/x86/entry/entry_64_compat.S arch/x86/math-emu/get_address.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-18 09:39:47 +02:00
Andy Lutomirski	512255a2ad	Revert "sched/x86_64: Don't save flags on context switch" This reverts commit: `2c7577a758` ("sched/x86_64: Don't save flags on context switch") It was a nice speedup. It's also not quite correct: SYSENTER enables interrupts too early. We can re-add this optimization once the SYSENTER code is beaten into shape, which should happen in 4.3 or 4.4. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # v3.19 Link: http://lkml.kernel.org/r/85f56651f59f76624e80785a8fd3bdfdd089a818.1439838962.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-18 09:39:26 +02:00
Len Brown	656bba3068	x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted Both the per-APIC flag ".wait_for_init_deassert", and the global atomic_t "init_deasserted" are dead code -- remove them. For all APIC types, "wait_for_master()" prevents an AP from proceeding until the BSP has set cpu_callout_mask, making "init_deasserted" {unnecessary}: BSP: <de-assert INIT> ... BSP: {set init_deasserted} AP: wait_for_master() set cpu_initialized_mask wait for cpu_callout_mask BSP: test cpu_initialized_mask BSP: set cpu_callout_mask AP: test cpu_callout_mask AP: {wait for init_deasserted} ... AP: <touch APIC> Deleting the {dead code} above is necessary to enable some parallelism in a future patch. Signed-off-by: Len Brown <len.brown@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/de4b3a9bab894735e285870b5296da25ee6a8a5a.1439739165.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-17 10:42:28 +02:00
Ingo Molnar	5461bd81bf	Linux 4.2-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJV0R4AAAoJEHm+PkMAQRiG8xIH/AmiRd+JDrs0qqEy46p6X8Gn 0lB5/KsGycvIGIBTiy2nZzcT0Ly6LeFUKUjzPytlOhIZPMrxMVMShDaQKCXXIMUr 1mN6hkvpkLNnUhvL2fR6mm0zkjbz3zZEazFY+Jic8wQrtSkHgfH0DXqSAo8le0f8 kNrd5BPPhIwvpHGaNGFdTpbgpPcalXyQk/fHyvDGidbyXzY/d7l05QfYJ6XCD4Zm IAy48iK5BFts2+z3aOYrOeuuCcm1qFX8YArqzE1rfPp+U/LQpfUfij4cmOqDLn/F qnv9E7bRRVovvrgKe4I3Trta8kT53VLJvqpdw2Usqo8zvhs4VyrYpHC+gEE6YUY= =9Rd4 -----END PGP SIGNATURE----- Merge tag 'v4.2-rc7' into x86/boot, to refresh the branch before merging new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-17 10:41:59 +02:00
Andy Lutomirski	4d283ec908	x86/kvm: Rename VMX's segment access rights defines VMX encodes access rights differently from LAR, and the latter is most likely what x86 people think of when they think of "access rights". Rename them to avoid confusion. Cc: kvm@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-08-15 00:47:13 +02:00
Dan Williams	e836a256e8	pmem: convert to generic memremap Kill arch_memremap_pmem() and just let the architecture specify the flags to be passed to memremap(). Default to writethrough by default. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-08-14 13:23:28 -04:00
Linus Torvalds	ed596cde94	Revert x86 sigcontext cleanups This reverts commits `9a036b93a3` ("x86/signal/64: Remove 'fs' and 'gs' from sigcontext") and `c6f2062935` ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs"). They were cleanups, but they break dosemu by changing the signal return behavior (and removing 'fs' and 'gs' from the sigcontext struct - while not actually changing any behavior - causes build problems). Reported-and-tested-by: Stas Sergeev <stsp@list.ru> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-08-13 12:42:22 -07:00
Ashok Raj	8838eb6c0b	x86/mce: Clear Local MCE opt-in before kexec kexec could boot a kernel that could be legacy with no knowledge of LMCE. Hence we should make sure we clear LMCE optin before kexec reboot. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1439396985-12812-9-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-13 10:12:52 +02:00
Ashok Raj	4d1d5cdc34	x86/mce: Remove unused function declarations Remove unused function declarations. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1439396985-12812-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-13 10:12:52 +02:00
Borislav Petkov	eef4dfa0cb	x86/mce: Kill drain_mcelog_buffer() This used to flush out MCEs logged during early boot and which were in the MCA registers from a previous system run. No need for that now, since we've moved to a genpool. Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439396985-12812-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-13 10:12:52 +02:00
Chen, Gong	fd4cf79fcc	x86/mce: Remove the MCE ring for Action Optional errors Use unified genpool to save Action Optional error events and put Action Optional error handling in the same notification chain as MCE error decoding. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> [ Fold in subsequent patch from Boris for early boot logging. ] Signed-off-by: Tony Luck <tony.luck@intel.com> [ Correct a lot. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439396985-12812-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-13 10:12:51 +02:00
Borislav Petkov	20d51a426f	x86/mce: Reuse one of the u16 padding fields in 'struct mce' ... to save the error severity of the MCE and whether the reported address of the error is usable. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1439396985-12812-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-13 10:12:50 +02:00
Peter Zijlstra	d420acd816	jump_label/x86: Work around asm build bug on older/backported GCCs Boris reported that gcc version 4.4.4 20100503 (Red Hat 4.4.4-2) fails to build linux-next kernels that have this fresh commit via the locking tree: `11276d5306` ("locking/static_keys: Add a new static_key interface") The problem appears to be that even though @key and @branch are compile time constants, it doesn't see the following expression as an immediate value: &((char *)key)[branch] More recent GCCs don't appear to have this problem. In particular, Red Hat backported the 'asm goto' feature into 4.4, 'normal' 4.4 compilers will not have this feature and thus not run into this asm. The workaround is to supply both values to the asm as immediates and do the addition in asm. Suggested-by: H. Peter Anvin <hpa@zytor.com> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-13 08:44:43 +02:00
Will Deacon	2b2a85a4d3	locking/qrwlock: Implement queue_write_unlock() using smp_store_release() Since the following commit: `536fa40222` ("compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release()") smp_store_release() supports byte accesses, so use that in writer unlock and remove the conditional macro override. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <Waiman.Long@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1438880084-18856-6-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-12 11:59:05 +02:00
Ingo Molnar	f52609fdab	Merge branch 'locking/arch-atomic' into locking/core, because it's ready for upstream Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-12 11:44:30 +02:00
Greg Kroah-Hartman	5d44f4b348	Merge 4.2-rc6 into char-misc-next We want the fixes in Linus's tree in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-09 16:28:09 -07:00
Jonathan (Zhixiong) Zhang	b40227fbfb	acpi, x86: Implement arch_apei_get_mem_attributes() ... to allow an arch specific implementation of getting page protection type associated with a physical address. On x86, we currently have no way to look up the EFI memory map attributes for a region in a consistent way, because the memmap is discarded after efi_free_boot_services(). So if you call efi_mem_attributes() during boot and at runtime, you could theoretically see different attributes. Since we are yet to see any x86 platforms that require anything other than PAGE_KERNEL (some arm64 platforms require the equivalent of PAGE_KERNEL_NOCACHE), return that until we know differently. Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438936621-5215-5-git-send-email-matt@codeblueprint.co.uk [ Small fixes to spelling. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-08 10:37:39 +02:00
Thomas Gleixner	a782a7e46b	x86/irq: Store irq descriptor in vector array We can spare the irq_desc lookup in the interrupt entry code if we store the descriptor pointer in the vector array instead the interrupt number. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/20150802203609.717724106@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-08-06 00:14:59 +02:00
Thomas Gleixner	7276c6a2cb	x86/irq: Rename VECTOR_UNDEFINED to VECTOR_UNUSED VECTOR_UNDEFINED is a misnomer. The vector is defined, but unused. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/20150802203609.477282494@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-08-06 00:14:58 +02:00
K. Y. Srinivasan	ca9357bd26	Drivers: hv: vmbus: Implement a clocksource based on the TSC page The current Hyper-V clock source is based on the per-partition reference counter and this counter is being accessed via s synthetic MSR - HV_X64_MSR_TIME_REF_COUNT. Hyper-V has a more efficient way of computing the per-partition reference counter value that does not involve reading a synthetic MSR. We implement a time source based on this mechanism. Tested-by: Vivek Yadav <vyadav@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-05 11:44:29 -07:00
Xiao Guangrong	c258b62b26	KVM: MMU: introduce the framework to check zero bits on sptes We have abstracted the data struct and functions which are used to check reserved bit on guest page tables, now we extend the logic to check zero bits on shadow page tables The zero bits on sptes include not only reserved bits on hardware but also the bits that SPTEs willnever use. For example, shadow pages will never use GB pages unless the guest uses them too. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-08-05 12:47:24 +02:00
Xiao Guangrong	a0a64f50aa	KVM: MMU: introduce rsvd_bits_validate These two fields, rsvd_bits_mask and bad_mt_xwr, in "struct kvm_mmu" are used to check if reserved bits set on guest ptes, move them to a data struct so that the approach can be applied to check host shadow page table entries as well Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-08-05 12:47:23 +02:00
Andy Lutomirski	88cd622f92	x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks They are no longer used. Good riddance! Deleting the TIF_ macros is really nice. It was never clear why there were so many variants. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/22c61682f446628573dde0f1d573ab821677e06da.1438378274.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-05 10:54:35 +02:00
Denys Vlasenko	d14edb1648	x86/hweight: Force inlining of __arch_hweight{32,64}() With this config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os gcc-4.7.2 generates many copies of these tiny functions: __arch_hweight32 (35 copies): 55 push %rbp e8 66 9b 4a 00 callq __sw_hweight32 48 89 e5 mov %rsp,%rbp 5d pop %rbp c3 retq __arch_hweight64 (8 copies): 55 push %rbp e8 5e c2 8a 00 callq __sw_hweight64 48 89 e5 mov %rsp,%rbp 5d pop %rbp c3 retq See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 This patch fixes this via s/inline/__always_inline/ To avoid touching 32-bit case where such change was not tested to be a win, reformat __arch_hweight64() to have completely disjoint 64-bit and 32-bit implementations. IOW: made #ifdef / 32 bits and 64 bits instead of having #ifdef / #else / #endif inside a single function body. Only 64-bit __arch_hweight64() is __always_inline'd. text data bss dec filename 86971120 17195912 36659200 140826232 vmlinux.before 86970954 17195912 36659200 140826066 vmlinux Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Graf <tgraf@suug.ch> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1438697716-28121-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-05 09:38:09 +02:00
Denis V. Lunev	cc2dd4027a	mshyperv: fix recognition of Hyper-V guest crash MSR's Hypervisor Top Level Functional Specification v3.1/4.0 notes that cpuid (0x40000003) EDX's 10th bit should be used to check that Hyper-V guest crash MSR's functionality available. This patch should fix this recognition. Currently the code checks EAX register instead of EDX. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-04 22:30:44 -07:00
Vitaly Kuznetsov	b4370df2b1	Drivers: hv: vmbus: add special crash handler Full kernel hang is observed when kdump kernel starts after a crash. This hang happens in vmbus_negotiate_version() function on wait_for_completion() as Hyper-V host (Win2012R2 in my testing) never responds to CHANNELMSG_INITIATE_CONTACT as it thinks the connection is already established. We need to perform some mandatory minimalistic cleanup before we start new kernel. Reported-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-04 22:28:38 -07:00
Vitaly Kuznetsov	2517281d63	Drivers: hv: vmbus: add special kexec handler When general-purpose kexec (not kdump) is being performed in Hyper-V guest the newly booted kernel fails with an MCE error coming from the host. It is the same error which was fixed in the "Drivers: hv: vmbus: Implement the protocol for tearing down vmbus state" commit - monitor pages remain special and when they're being written to (as the new kernel doesn't know these pages are special) bad things happen. We need to perform some minimalistic cleanup before booting a new kernel on kexec. To do so we need to register a special machine_ops.shutdown handler to be executed before the native_machine_shutdown(). Registering a shutdown notification handler via the register_reboot_notifier() call is not sufficient as it happens to early for our purposes. machine_ops is not being exported to modules (and I don't think we want to export it) so let's do this in mshyperv.c The minimalistic cleanup consists of cleaning up clockevents, synic MSRs, guest os id MSR, and hypercall MSR. Kdump doesn't require all this stuff as it lives in a separate memory space. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-04 22:25:29 -07:00
Andi Kleen	b83ff1c861	x86: Add new MSRs and MSR bits used for Intel Skylake PMU support Add new MSRs (LBR_INFO) and some new MSR bits used by the Intel Skylake PMU driver. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1431285767-27027-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-04 10:16:56 +02:00
Andi Kleen	a94cab2376	perf/x86: Add a native_perf_sched_clock_from_tsc() PEBSv3 has a raw TSC time stamp in its memory buffer that later needs to to be converted to perf_clock. Add a native_sched_clock_from_tsc() that works the same as native_sched_clock(), but starts with an already given TSC value. Paravirt is ignored, it will just get the native clock. But there isn't a para virtualized PEBS anyway. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1431285767-27027-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-04 10:16:55 +02:00
Alexander Shishkin	b1bf72d669	perf/x86/intel/pt: Add new timing packet enables Intel PT chapter in the new Intel Architecture SDM adds several packets corresponding enable bits and registers that control packet generation. Also, additional bits in the Intel PT CPUID leaf were added to enumerate presence and parameters of these new packets and features. The packets and enables are: * CYC: cycle accurate mode, provides the number of cycles elapsed since previous CYC packet; its presence and available threshold values are enumerated via CPUID; * MTC: mini time counter packets, used for tracking TSC time between full TSC packets; its presence and available resolution options are enumerated via CPUID; * PSB packet period is now configurable, available period values are enumerated via CPUID. This patch adds corresponding bit and register definitions, pmu driver capabilities based on CPUID enumeration, new attribute format bits for the new featurens and extends event configuration validation function to take these into account. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1438262131-12725-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-04 10:16:55 +02:00
Konstantin Khlebnikov	fe32d3cd5e	sched/preempt: Fix cond_resched_lock() and cond_resched_softirq() These functions check should_resched() before unlocking spinlock/bh-enable: preempt_count always non-zero => should_resched() always returns false. cond_resched_lock() worked iff spin_needbreak is set. This patch adds argument "preempt_offset" to should_resched(). preempt_count offset constants for that: PREEMPT_DISABLE_OFFSET - offset after preempt_disable() PREEMPT_LOCK_OFFSET - offset after spin_lock() SOFTIRQ_DISABLE_OFFSET - offset after local_bh_distable() SOFTIRQ_LOCK_OFFSET - offset after spin_lock_bh() Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Graf <agraf@suse.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `bdb4380658` ("sched: Extract the basic add/sub preempt_count modifiers") Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzz Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-03 12:21:24 +02:00
Peter Zijlstra	11276d5306	locking/static_keys: Add a new static_key interface There are various problems and short-comings with the current static_key interface: - static_key_{true,false}() read like a branch depending on the key value, instead of the actual likely/unlikely branch depending on init value. - static_key_{true,false}() are, as stated above, tied to the static_key init values STATIC_KEY_INIT_{TRUE,FALSE}. - we're limited to the 2 (out of 4) possible options that compile to a default NOP because that's what our arch_static_branch() assembly emits. So provide a new static_key interface: DEFINE_STATIC_KEY_TRUE(name); DEFINE_STATIC_KEY_FALSE(name); Which define a key of different types with an initial true/false value. Then allow: static_branch_likely() static_branch_unlikely() to take a key of either type and emit the right instruction for the case. This means adding a second arch_static_branch_jump() assembly helper which emits a JMP per default. In order to determine the right instruction for the right state, encode the branch type in the LSB of jump_entry::key. This is the final step in removing the naming confusion that has led to a stream of avoidable bugs such as: `a833581e37` ("x86, perf: Fix static_key bug in load_mm_cr4()") ... but it also allows new static key combinations that will give us performance enhancements in the subsequent patches. Tested-by: Rabin Vincent <rabin@rab.in> # arm Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-03 11:34:15 +02:00
Ingo Molnar	f320ead76a	Merge branch 'x86/asm' into locking/core Upcoming changes to static keys is interacting/conflicting with the following pending TSC commits in tip:x86/asm: `4ea1636b04` x86/asm/tsc: Rename native_read_tsc() to rdtsc() ... So merge it into the locking tree to have a smoother resolution. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-03 11:04:00 +02:00
Andrey Konovalov	76695af20c	locking, arch: use WRITE_ONCE()/READ_ONCE() in smp_store_release()/smp_load_acquire() Replace ACCESS_ONCE() macro in smp_store_release() and smp_load_acquire() with WRITE_ONCE() and READ_ONCE() on x86, arm, arm64, ia64, metag, mips, powerpc, s390, sparc and asm-generic since ACCESS_ONCE() does not work reliably on non-scalar types. WRITE_ONCE() and READ_ONCE() were introduced in the following commits: `230fa253df` ("kernel: Provide READ_ONCE and ASSIGN_ONCE") `43239cbe79` ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Alexander Duyck <alexander.h.duyck@redhat.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@suse.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1438528264-714-1-git-send-email-andreyknvl@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-03 10:59:30 +02:00
Ingo Molnar	3a7651e683	Linux 4.2-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVvsVPAAoJEHm+PkMAQRiGkMAH/AqQUjzltvIwDq39vlbrwpc1 DIgpm15zaIHKVThQRA69cBIDOprckk6pChFhA/aZVhRBsVva/Z3k8vIjaAzW7eDs OK3zE1VsQ0QSK9FYo/8DJoy8844DF5beVwZVE4/xc8TFbabA6BgWawAgVxdpgzVQ LQb6jMHQPGGpAQrdPJJcfkeQRi9GBpyXLX6x7nO4jKQAPQGVUqT1QLFN/XYMNp7n xmdWogyNfis+c/Vx2OIQUmS/kFO5oyGaSWB1pK2MKeTG5XJ7AITzeHOGfRPmVinn x9ozeMLPjTMNFlzPYYrTL+xnqdCPHzKW7KP2LBvNb9PRl7j1vtvNKNNzcD8cAbI= =Zmjn -----END PGP SIGNATURE----- Merge branch 'locking/urgent', tag 'v4.2-rc5' into locking/core, to pick up fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-03 10:52:25 +02:00
Brian Gerst	decd275e62	x86/vm86: Rename vm86->v86flags and v86mask Rename v86flags to veflags, and v86mask to veflags_mask. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-9-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 13:31:11 +02:00
Brian Gerst	1342635638	x86/vm86: Rename vm86->vm86_info to user_vm86 Make it clearer that this is the pointer to the userspace vm86 state area. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-8-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 13:31:11 +02:00
Brian Gerst	ba3e127ec1	x86/vm86: Clean up vm86.h includes vm86.h was being implicitly included in alot of places via processor.h, which in turn got it from math_emu.h. Break that chain and explicitly include vm86.h in all files that need it. Also remove unused vm86 field from math_emu_info. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-7-git-send-email-brgerst@gmail.com [ Fixed build failure. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 13:31:10 +02:00
Ingo Molnar	af3e565a85	x86/vm86: Move the vm86 IRQ definitions to vm86.h Move vm86 specific definitions from irq_vectors.h to vm86.h. Based on patch from Brian Gerst. Originally-from: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-6-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 13:31:10 +02:00
Brian Gerst	5ed92a8ab7	x86/vm86: Use the normal pt_regs area for vm86 Change to use the normal pt_regs area to enter and exit vm86 mode. This is done by increasing the padding at the top of the stack to make room for the extra vm86 segment slots in the IRET frame. It then saves the 32-bit regs in the off-stack vm86 data, and copies in the vm86 regs. Exiting back to 32-bit mode does the reverse. This allows removing the hacks to jump directly into the exit asm code due to having to change the stack pointer. Returning normally from the vm86 syscall and the exception handlers allows things like ptrace and auditing to work properly. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-5-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 13:31:09 +02:00
Brian Gerst	90c6085a24	x86/vm86: Eliminate 'struct kernel_vm86_struct' Now there is no vm86-specific data left on the kernel stack while in userspace, except for the 32-bit regs. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 13:31:08 +02:00
Brian Gerst	d4ce0f26c7	x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86' Move the non-regs fields to the off-stack data. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-3-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 13:31:08 +02:00
Brian Gerst	9fda6a0681	x86/vm86: Move vm86 fields out of 'thread_struct' Allocate a separate structure for the vm86 fields. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-2-git-send-email-brgerst@gmail.com [ Build fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 13:31:07 +02:00
Andy Lutomirski	a5b9e5a2f1	x86/ldt: Make modify_ldt() optional The modify_ldt syscall exposes a large attack surface and is unnecessary for modern userspace. Make it optional. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/a605166a771c343fd64802dece77a903507333bd.1438291540.git.luto@kernel.org [ Made MATH_EMULATION dependent on MODIFY_LDT_SYSCALL. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 13:30:45 +02:00
Mathias Krause	b1c599b8ff	x86/cpufeature: Add feature bit for Intel's Silicon Debug CPUID bit Add a CPUID feature bit for the SDBG (Silicon Debug) CPU feature found on recent Intel systems starting with Haswell. Using the IA32_DEBUG_INTERFACE MSR (index C80H) one can at least detect if SDBG has been enabled by the firmware and if it has been used or not. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dirk Brandewie <dirk.j.brandewie@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437330403-12102-1-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 10:34:07 +02:00
Ingo Molnar	5b929bd11d	Merge branch 'x86/urgent' into x86/asm, before applying dependent patches Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 10:23:35 +02:00
Andy Lutomirski	37868fe113	x86/ldt: Make modify_ldt synchronous modify_ldt() has questionable locking and does not synchronize threads. Improve it: redesign the locking and synchronize all threads' LDTs using an IPI on all modifications. This will dramatically slow down modify_ldt in multithreaded programs, but there shouldn't be any multithreaded programs that care about modify_ldt's performance in the first place. This fixes some fallout from the CVE-2015-5157 fixes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: <stable@vger.kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 10:23:23 +02:00
Jiang Liu	991de2e590	PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq() To support IOAPIC hotplug, we need to allocate PCI IRQ resources on demand and free them when not used anymore. Implement pcibios_alloc_irq() and pcibios_free_irq() to dynamically allocate and free PCI IRQs. Remove mp_should_keep_irq(), which is no longer used. [bhelgaas: changelog] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2015-07-30 14:05:57 -05:00
Paolo Bonzini	d71ba78834	KVM: move code related to KVM_SET_BOOT_CPU_ID to x86 This is another remnant of ia64 support. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-29 14:27:21 +02:00
Peter Zijlstra	de9e432cb5	atomic: Collapse all atomic_{set,clear}_mask definitions Move the now generic definitions of atomic_{set,clear}_mask() into linux/atomic.h to avoid endless and pointless repetition. Also, provide an atomic_andnot() wrapper for those few archs that can implement that. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-07-27 14:06:24 +02:00
Peter Zijlstra	e6942b7de2	atomic: Provide atomic_{or,xor,and} Implement atomic logic ops -- atomic_{or,xor,and}. These will replace the atomic_{set,clear}_mask functions that are available on some archs. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-07-27 14:06:24 +02:00
Peter Zijlstra	7fc1845dd4	x86: Provide atomic_{or,xor,and} Implement atomic logic ops -- atomic_{or,xor,and}. These will replace the atomic_{set,clear}_mask functions that are available on some archs. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-07-27 14:06:23 +02:00
Mihai Donțu	5f3d45e7f2	kvm/x86: add support for MONITOR_TRAP_FLAG Allow a nested hypervisor to single step its guests. Signed-off-by: Mihai Donțu <mihai.dontu@gmail.com> [Fix overlong line. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-23 08:27:07 +02:00
Andrey Smetanin	e7d9513b60	kvm/x86: added hyper-v crash msrs into kvm hyperv context Added kvm Hyper-V context hv crash variables as storage of Hyper-V crash msrs. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-23 08:27:06 +02:00
Andrey Smetanin	e83d58874b	kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file This patch introduce Hyper-V related source code file - hyperv.c and per vm and per vcpu hyperv context structures. All Hyper-V MSR's and hypercall code moved into hyperv.c. All Hyper-V kvm/vcpu fields moved into appropriate hyperv context structures. Copyrights and authors information copied from x86.c to hyperv.c. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-23 08:27:06 +02:00
Paolo Bonzini	0da029ed7e	KVM: x86: rename quirk constants to KVM_X86_QUIRK_* Make them clearly architecture-dependent; the capability is valid for all architectures, but the argument is not. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-23 08:24:42 +02:00
Paolo Pisati	949163015c	x86/boot: Obsolete the MCA sys_desc_table The kernel does not support the MCA bus anymroe, so mark sys_desc_table as obsolete: remove any reference from the code together with the remaining of MCA logic. bloat-o-meter output: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-55 (-55) function old new delta i386_start_kernel 128 119 -9 setup_arch 1421 1375 -46 Signed-off-by: Paolo Pisati <p.pisati@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437409430-8491-1-git-send-email-p.pisati@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-21 10:55:11 +02:00
Luis R. Rodriguez	8c7ea50c01	x86/mm, asm-generic: Add IOMMU ioremap_uc() variant default We currently have no safe way of currently defining architecture agnostic IOMMU ioremap_() variants. The trend is for folks to assume* that ioremap_nocache() should be the default everywhere and then add this mapping on each architectures -- this is not correct today for a variety of reasons. We have two options: 1) Sit and wait for every architecture in Linux to get a an ioremap_() variant defined before including it upstream. 2) Gather consensus on a safe architecture agnostic ioremap_() default. Approach 1) introduces development latencies, and since 2) will take time and work on clarifying semantics the only remaining sensible thing to do to avoid issues is returning NULL on ioremap_() variants. In order for this to work we must have all architectures declare their own ioremap_() variants as defined. This will take some work, do this for ioremp_uc() to set the example as its only currently implemented on x86. Document all this. We only provide implementation support for ioremap_uc() as the other ioremap_*() variants are well defined all over the kernel for other architectures already. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: benh@kernel.crashing.org Cc: bp@suse.de Cc: dan.j.williams@intel.com Cc: geert@linux-m68k.org Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: linux-mm@kvack.org Cc: luto@amacapital.net Cc: mpe@ellerman.id.au Cc: mst@redhat.com Cc: ralf@linux-mips.org Cc: ross.zwisler@linux.intel.com Cc: stefan.bader@canonical.com Cc: tj@kernel.org Cc: tomi.valkeinen@ti.com Cc: toshi.kani@hp.com Link: http://lkml.kernel.org/r/1436488096-3165-1-git-send-email-mcgrof@do-not-panic.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-21 10:47:03 +02:00
Brian Gerst	ed0b2edb61	x86/entry/vm86: Move userspace accesses to do_sys_vm86() Move the userspace accesses down into the common function in preparation for the next set of patches. Also change to copying the fields explicitly instead of assuming a fixed order in pt_regs and the kernel data structures. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437354550-25858-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-21 09:12:24 +02:00
Brian Gerst	0233606ce5	x86/entry/vm86: Clean up saved_fs/gs There is no need to save FS and non-lazy GS outside the 32-bit regs. Lazy GS still needs to be saved because it wasn't saved on syscall entry. Save it in the gs slot of regs32, which is present but unused. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437354550-25858-2-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-21 09:12:23 +02:00
Minfei Huang	c93bf928fe	ftrace: Format MCOUNT_ADDR address as type unsigned long Always we use type unsigned long to format the ip address, since the value of ip address is never the negative. This patch uses type unsigned long, instead of long, to format the ip address. The code is more clearly to be viewed by using type unsigned long, although it is correct by using either unsigned long or long. Link: http://lkml.kernel.org/r/1436694744-16747-1-git-send-email-mhuang@redhat.com Cc: Minfei Huang <mnfhuang@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-07-20 22:30:53 -04:00
Linus Torvalds	0e1dbccd8f	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two families of fixes: - Fix an FPU context related boot crash on newer x86 hardware with larger context sizes than what most people test. To fix this without ugly kludges or extensive reverts we had to touch core task allocator, to allow x86 to determine the task size dynamically, at boot time. I've tested it on a number of x86 platforms, and I cross-built it to a handful of architectures: (warns) (warns) testing x86-64: -git: pass ( 0), -tip: pass ( 0) testing x86-32: -git: pass ( 0), -tip: pass ( 0) testing arm: -git: pass ( 1359), -tip: pass ( 1359) testing cris: -git: pass ( 1031), -tip: pass ( 1031) testing m32r: -git: pass ( 1135), -tip: pass ( 1135) testing m68k: -git: pass ( 1471), -tip: pass ( 1471) testing mips: -git: pass ( 1162), -tip: pass ( 1162) testing mn10300: -git: pass ( 1058), -tip: pass ( 1058) testing parisc: -git: pass ( 1846), -tip: pass ( 1846) testing sparc: -git: pass ( 1185), -tip: pass ( 1185) ... so I hope the cross-arch impact 'none', as intended. (by Dave Hansen) - Fix various NMI handling related bugs unearthed by the big asm code rewrite and generally make the NMI code more robust and more maintainable while at it. These changes are a bit late in the cycle, I hope they are still acceptable. (by Andy Lutomirski)" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86 x86/fpu, sched: Dynamically allocate 'struct fpu' x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code x86/nmi/64: Make the "NMI executing" variable more consistent x86/nmi/64: Minor asm simplification x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection x86/nmi/64: Reorder nested NMI checks x86/nmi/64: Improve nested NMI comments x86/nmi/64: Switch stacks on userspace NMI entry x86/nmi/64: Remove asm code that saves CR2 x86/nmi: Enable nested do_nmi() handling for 64-bit kernels	2015-07-18 10:49:57 -07:00
Linus Torvalds	f79a17bf26	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, plus a static key fix fixing /sys/devices/cpu/rdpmc" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Really allow to specify custom CC, AR or LD perf auxtrace: Fix misplaced check for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT perf hists browser: Take the --comm, --dsos, etc filters into account perf symbols: Store if there is a filter in place x86, perf: Fix static_key bug in load_mm_cr4() tools: Copy lib/hweight.c from the kernel sources perf tools: Fix the detached tarball wrt rbtree copy perf thread_map: Fix the sizeof() calculation for map entries tools lib: Improve clean target perf stat: Fix shadow declaration of close perf tools: Fix lockup using 32-bit compat vdso	2015-07-18 10:44:21 -07:00
Dave Hansen	0c8c0f03e3	x86/fpu, sched: Dynamically allocate 'struct fpu' The FPU rewrite removed the dynamic allocations of 'struct fpu'. But, this potentially wastes massive amounts of memory (2k per task on systems that do not have AVX-512 for instance). Instead of having a separate slab, this patch just appends the space that we need to the 'task_struct' which we dynamically allocate already. This saves from doing an extra slab allocation at fork(). The only real downside here is that we have to stick everything and the end of the task_struct. But, I think the BUILD_BUG_ON()s I stuck in there should keep that from being too fragile. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-18 03:42:35 +02:00
Laurent Dufour	f2abeef9fd	mm: clean up per architecture MM hook header files Commit `2ae416b142` ("mm: new mm hook framework") introduced an empty header file (mm-arch-hooks.h) for every architecture, even those which doesn't need to define mm hooks. As suggested by Geert Uytterhoeven, this could be cleaned through the use of a generic header file included via each per architecture asm/include/Kbuild file. The PowerPC architecture is not impacted here since this architecture has to defined the arch_remap MM hook. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Vineet Gupta <vgupta@synopsys.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-07-17 16:39:53 -07:00
Linus Torvalds	3e87ee06d0	platform-drivers-x86 for 4.2-3 Fix SMBIOS call handling and hwswitch state coherency in the dell-laptop driver. Cleanups for intel__ipc drivers. dell-laptop: - Do not cache hwswitch state - Check return value of each SMBIOS call - Clear buffer before each SMBIOS call intel_scu_ipc: - Move local memory initialization out of a mutex intel_pmc_ipc: - Update kerneldoc formatting - Fix compiler casting warnings -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVqCrgAAoJEKbMaAwKp364aBcIALo/ZB6JFFd3oFDBbZR9bzvp senrgC2QSWboFlyJ2aHB09n98m6tR5x8HTE6BijT64bUyPSLTPgDZoeC9ezIu1H0 rXKJZM7GduxYVOvVgOPVKqt/yUopI55jDhpgvFmxpXgp9zaz4our2y+93VCCBkIm 9nJMHXIvK+Rg4Rg0MuEkaghLRFivJAYFuyFu6vgWQOGap1QXruPIylK6agZs2E9x KhJAlLNjoAAfqFFkWdk7PxMO8QIgV9pLU8RlOQmUdRSe8F+CI3AAJjdn+FdPoXFN EBirxMm8NAd9+/JlfU95fUBwPnofY+D3Q8jUyKBBxnZbDQMIA6gWtzGaA/BY/zI= =hpkC -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver fixes from Darren Hart: "Fix SMBIOS call handling and hwswitch state coherency in the dell-laptop driver. Cleanups for intel__ipc drivers. Details: dell-laptop: - Do not cache hwswitch state - Check return value of each SMBIOS call - Clear buffer before each SMBIOS call intel_scu_ipc: - Move local memory initialization out of a mutex intel_pmc_ipc: - Update kerneldoc formatting - Fix compiler casting warnings" * tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: intel_scu_ipc: move local memory initialization out of a mutex intel_pmc_ipc: Update kerneldoc formatting dell-laptop: Do not cache hwswitch state dell-laptop: Check return value of each SMBIOS call dell-laptop: Clear buffer before each SMBIOS call intel_pmc_ipc: Fix compiler casting warnings	2015-07-16 20:57:25 -07:00
Andy Shevchenko	7e1ff15b69	x86/platform/iosf_mbi: Source cleanup - Move the static variables to one place - Fix indentations in the header - Correct comments No functional change. [ tglx: Massaged changelog ] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David E . Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1436366709-17683-5-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-07-16 17:48:48 +02:00
Linus Torvalds	df14a68d63	* Fix FPU refactoring ("kvm: x86: fix load xsave feature warning") * Fix eager FPU mode (Cc stable). * AMD bits of MTRR virtualization. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJVpOA0AAoJEL/70l94x66D59QH/R68oIbcC9eOrs3LgxFwFS/g uPY2owxr1MFAwI39S3zpSXXuLdB63E9G6EzP9lQO6UnSXgcNitnpEINEXpCpH7hS 5DSw/0gPPjXlwyio7wM0EiXC+YO4kgzhLaBmeK5qPQmfvCz5bvLgelrY39T4TH/u B7gZN/uuaMfU8xNchCiU9Bx+KzaQhgpUTSOH/j8n2Obe9J/AyUqsSFeXzsN9hlVp 5YcBOyzthDjphbeyfxas1nTNinO+tfvO5fDNKbqvRgOnm+/wubhXAikDs4u60eXp j6TUVKYJRJe9yh5jFB/HOA/0h6DDyoK+t7LSYRPRjONVFoxrnogvA+OQCIX/57Y= =vDV4 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: - Fix FPU refactoring ("kvm: x86: fix load xsave feature warning") - Fix eager FPU mode (Cc stable) - AMD bits of MTRR virtualization * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: fix load xsave feature warning KVM: x86: apply guest MTRR virtualization on host reserved pages KVM: SVM: Sync g_pat with guest-written PAT value KVM: SVM: use NPT page attributes KVM: count number of assigned devices KVM: VMX: fix vmwrite to invalid VMCS KVM: x86: reintroduce kvm_is_mmio_pfn x86: hyperv: add CPUID bit for crash handlers	2015-07-15 13:30:12 -07:00
Paolo Bonzini	5544eb9b81	KVM: count number of assigned devices If there are no assigned devices, the guest PAT are not providing any useful information and can be overridden to writeback; VMX always does this because it has the "IPAT" bit in its extended page table entries, but SVM does not have anything similar. Hook into VFIO and legacy device assignment so that they provide this information to KVM. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-10 13:25:26 +02:00
Paolo Bonzini	5d75a74759	x86: hyperv: add CPUID bit for crash handlers Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-10 13:25:19 +02:00
Peter Zijlstra	a833581e37	x86, perf: Fix static_key bug in load_mm_cr4() Mikulas reported his K6-3 not booting. This is because the static_key API confusion struck and bit Andy, this wants to be static_key_false(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Vince Weaver <vince@deater.net> Cc: hillf.zj <hillf.zj@alibaba-inc.com> Fixes: `a66734297f` ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks") Link: http://lkml.kernel.org/r/20150709172338.GC19282@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-10 10:24:38 +02:00
qipeng.zha	02941007f5	intel_pmc_ipc: Update kerneldoc formatting Update kerneldoc formatting per Documentation/kernel-dec-nano-HOWTO.txt. Signed-off-by: qipeng.zha <qipeng.zha@intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>	2015-07-09 11:23:15 -07:00
Andy Lutomirski	06a7b36c7b	x86/entry: Remove SCHEDULE_USER and asm/context-tracking.h SCHEDULE_USER is no longer used, and asm/context-tracking.h contained nothing else. Remove the header entirely. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/854e9b45f69af20e26c47099eb236321563ebcee.1435952415.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-07 10:59:09 +02:00
Andy Lutomirski	8c84014f3b	x86/entry: Remove exception_enter() from most trap handlers On 64-bit kernels, we don't need it any more: we handle context tracking directly on entry from user mode and exit to user mode. On 32-bit kernels, we don't support context tracking at all, so these callbacks had no effect. Note: this doesn't change do_page_fault(). Before we do that, we need to make sure that there is no code that can page fault from kernel mode with CONTEXT_USER. The 32-bit fast system call stack argument code is the only offender I'm aware of right now. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/ae22f4dfebd799c916574089964592be218151f9.1435952415.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-07 10:59:09 +02:00
Andy Lutomirski	1f484aa690	x86/entry: Move C entry and exit code to arch/x86/entry/common.c The entry and exit C helpers were confusingly scattered between ptrace.c and signal.c, even though they aren't specific to ptrace or signal handling. Move them together in a new file. This change just moves code around. It doesn't change anything. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/324d686821266544d8572423cc281f961da445f4.1435952415.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-07 10:59:05 +02:00
Andy Shevchenko	2b8f8eddaf	x86/platform/intel/pmc_atom: Add Cherrytrail PMC interface The patch adds CHT PMC interface. This exposes all the South IP device power states and S0ix states for CHT. The bit map of FUNC_DIS and D3_STS_0 registers for SoCs are consistent. The D3_STS_1 and FUNC_DIS_2 registers, however, are not aligned. This is fixed by splitting a common mapping on per register basis. (Originally based on code from Kumar P Mahesh.) Originally-from: Kumar P Mahesh <mahesh.kumar.p@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Aubrey Li <aubrey.li@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1436192944-56496-5-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 18:39:38 +02:00
Andy Shevchenko	68872eb9b1	x86/platform/intel/pmc_atom: Export accessors to PMC registers Export the pmc_atom_read() and pmc_atom_write() accessors to the PMC registers. On early initcall stages the functions will return -ENODEV, and caller has to wait when it will be available. Additionally make absence of debugfs a non-fatal error. The patch will be useful for the upcoming fixes regarding to the LPSS block found on Intel BayTrail-T and Braswell. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Aubrey Li <aubrey.li@linux.intel.com> Cc: Kumar P Mahesh <mahesh.kumar.p@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1436192944-56496-2-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 17:42:45 +02:00
Brian Gerst	ab8b82ee6d	x86/compat: Don't build the 32-bit VDSO if not needed Build the 32-bit vdso only for native 32-bit or 32-bit compat is enabled. x32 should not force it to build. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434974121-32575-7-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:28:56 +02:00
Brian Gerst	7da770785f	x86/compat: Rename 'start_thread_ia32' to 'compat_start_thread' This function is shared between the 32-bit compat and x32 ABIs. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434974121-32575-5-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:28:56 +02:00
Brian Gerst	b829d1be20	x86/compat: Move ucontext_x32 to sigframe.h ia32.h should only contain the code for 32-bit compatability. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434974121-32575-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:28:55 +02:00
Brian Gerst	b2e02b820d	x86/compat: Make mmap_is_ia32() common compat TIF_ADDR32 is set for both ia32 and x32 tasks, so change from CONFIG_IA32_EMULATION to CONFIG_COMPAT. Use config_enabled() to make the function more readable. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434974121-32575-3-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:28:55 +02:00
George Spelvin	5a33fcb8d9	x86/asm/tsc: Save an instruction in DECLARE_ARGS users Before, the code to do RDTSC looked like: rdtsc shl $0x20, %rdx mov %eax, %eax or %rdx, %rax The "mov %eax, %eax" is required to clear the high 32 bits of RAX. By declaring low and high as 64-bit variables, the code is simplified to: rdtsc shl $0x20,%rdx or %rdx,%rax Yes, it's a 2-byte instruction that's not on a critical path, but there are principles to be upheld. Every user of EAX_EDX_RET has been checked. I tried to check users of EAX_EDX_ARGS, but there weren't any, so I deleted it to be safe. ( There's no benefit to making "high" 64 bits, but it was the simplest way to proceed. ) Signed-off-by: George Spelvin <linux@horizon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jacob.jun.pan@linux.intel.com Link: http://lkml.kernel.org/r/20150618075906.4615.qmail@ns.horizon.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:30 +02:00
Andy Lutomirski	bb8dd96032	x86/asm/tsc: Remove rdtsc_barrier() All callers have been converted to rdtsc_ordered(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/9baa4ae9a1e7c7c282f9cb2f15bb6bf5c2004032.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:30 +02:00
Andy Lutomirski	502dfeff23	x86/asm/tsc, x86/kvm: Drop open-coded barrier and use rdtsc_ordered() in kvmclock __pvclock_read_cycles() used to have two barriers, one of which was unnecessary, which got removed after an initial version of this patch was sent. But the barrier is still open-coded unnecessarily - get rid of that barrier and clean up the code by just using rdtsc_ordered(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/678981cc4761fb38a793c217c9cac42503cf3719.1434501121.git.luto@kernel.org [ Ported it to v4.2-rc1. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:30 +02:00
Andy Lutomirski	03b9730b76	x86/asm/tsc: Add rdtsc_ordered() and use it in trivial call sites rdtsc_barrier(); rdtsc() is an unnecessary mouthful and requires more thought than should be necessary. Add an rdtsc_ordered() helper and replace the trivial call sites with it. This should not change generated code. The duplication of the fence asm is temporary. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/dddbf98a2af53312e9aa73a5a2b1622fe5d6f52b.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:29 +02:00
Andy Lutomirski	4ea1636b04	x86/asm/tsc: Rename native_read_tsc() to rdtsc() Now that there is no paravirt TSC, the "native" is inappropriate. The function does RDTSC, so give it the obvious name: rdtsc(). Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org [ Ported it to v4.2-rc1. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:28 +02:00
Andy Lutomirski	fe47ae6e1a	x86/asm/tsc: Remove rdtscl() It has no more callers, and it was never a very sensible interface to begin with. Users of the TSC should either read all 64 bits or explicitly throw out the high bits. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/250105f7cee519be9d7fc4464b5784caafc8f4fe.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:28 +02:00
Andy Lutomirski	ec69de52c6	x86/asm/tsc: Remove the rdtscp() and rdtscpll() macros They have no users. Leave native_read_tscp() which seems potentially useful despite also having no callers. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/6abfa3ef80534b5d73898a48c4d25e069303cbe5.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:26 +02:00
Andy Lutomirski	87be28aaf1	x86/asm/tsc: Replace rdtscll() with native_read_tsc() Now that the ->read_tsc() paravirt hook is gone, rdtscll() is just a wrapper around native_read_tsc(). Unwrap it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/d2449ae62c1b1fb90195bcfb19ef4a35883a04dc.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:26 +02:00
Andy Lutomirski	9261e050b6	x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooks We've had ->read_tsc() and ->read_tscp() paravirt hooks since the very beginning of paravirt, i.e., `d3561b7fa0` ("[PATCH] paravirt: header and stubs for paravirtualisation"). AFAICT, the only paravirt guest implementation that ever replaced these calls was vmware, and it's gone. Arguably even vmware shouldn't have hooked RDTSC -- we fully support systems that don't have a TSC at all, so there's no point for a paravirt implementation to pretend that we have a TSC but to replace it. I also doubt that these hooks actually worked. Calls to rdtscl() and rdtscll(), which respected the hooks, were used seemingly interchangeably with native_read_tsc(), which did not. Just remove them. If anyone ever needs them again, they can try to make a case for why they need them. Before, on a paravirt config: text data bss dec hex filename 12618257 1816384 `1093632` 15528273 ecf151 vmlinux After: text data bss dec hex filename 12617207 1816384 `1093632` 15527223 eced37 vmlinux Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/d08a2600fb298af163681e5efd8e599d889a5b97.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:26 +02:00
Andy Lutomirski	881d7bf843	x86/asm/tsc, kvm: Remove vget_cycles() The only caller was KVM's read_tsc(). The only difference between vget_cycles() and native_read_tsc() was that vget_cycles() returned zero instead of crashing on TSC-less systems. KVM already checks vclock_mode() before calling that function, so the extra check is unnecessary. Also, KVM (host-side) requires the TSC to exist. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/20615df14ae2eb713ea7a5f5123c1dc4c7ca993d.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:25 +02:00
Andy Lutomirski	c6e5ca35c4	x86/asm/tsc: Inline native_read_tsc() and remove __native_read_tsc() In the following commit: `cdc7957d19` ("x86: move native_read_tsc() offline") ... native_read_tsc() was moved out of line, presumably for some now-obsolete vDSO-related reason. Undo it. The entire rdtsc, shl, or sequence is only 11 bytes, and calls via rdtscl() and similar helpers were already inlined. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/d05ffe2aaf8468ca475ebc00efad7b2fa174af19.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:25 +02:00
Zhu Guihua	1db875631f	x86/espfix: Add 'cpu' parameter to init_espfix_ap() Add a CPU index parameter to init_espfix_ap(), so that the parameter could be propagated to the function for espfix page allocation. Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Cc: <bp@alien8.de> Cc: <luto@amacapital.net> Cc: <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cde3fcf1b3211f3f03feb1a995bce3fee850f0fc.1435824469.git.zhugh.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:00:33 +02:00
Alexander Popov	5d5aa3cfca	x86/kasan: Fix KASAN shadow region page tables Currently KASAN shadow region page tables created without respect of physical offset (phys_base). This causes kernel halt when phys_base is not zero. So let's initialize KASAN shadow region page tables in kasan_early_init() using __pa_nodebug() which considers phys_base. This patch also separates x86_64_start_kernel() from KASAN low level details by moving kasan_map_early_shadow(init_level4_pgt) into kasan_early_init(). Remove the comment before clear_bss() which stopped bringing much profit to the code readability. Otherwise describing all the new order dependencies would be too verbose. Signed-off-by: Alexander Popov <alpopov@ptsecurity.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: <stable@vger.kernel.org> # 4.0+ Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 14:53:13 +02:00
Waiman Long	f7d71f2052	locking/qrwlock: Rename functions to queued_*() To sync up with the naming convention used in qspinlock, all the qrwlock functions were renamed to started with "queued" instead of "queue". Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1434729002-57724-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 14:11:27 +02:00
Linus Torvalds	a585d2b738	platform-drivers-x86 for 4.2-2 A new intel_pmc_ipc driver, a symmetrical allocation and free fix in dell-laptop, a couple minor fixes, and some updated documentation in the dell-laptop comments. intel_pmc_ipc: - Add Intel Apollo Lake PMC IPC driver tc1100-wmi: - Delete an unnecessary check before the function call "kfree" dell-laptop: - Fix allocating & freeing SMI buffer page - Show info about WiGig and UWB in debugfs - Update information about wireless control -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVmM8aAAoJEKbMaAwKp364iUkH/jihOduWkDTzzzxRP2Dv2nEh qyvE94Nc9A9dl87C2+II/Pi1s8h4CJOQpl70syYYPc4FdF70hpvP8TbHkgCWrY/d F8CoS9L9keviMtGOWlbEL9hBjfSDNwTMESTrDxrwhA04TSAwjDmXhhiUOF5FjFJm CX5+ZQ3iXEH6KsENR+Er54J9+6WKE6IuRcnnKCapnPQ8cEYeVn+WEPyzHCOy8Pg3 xzzUar3/knS2VMIb5eIVpaKFvD9P9qBsC/gQ0pk1Y+686gwQZMVURDv8lw8hfXpx TJDOXk21P8WbSH1r+jwax5wLjLge7vJtYG2Deye6MUgvSgg+O2tSVCv9SMQR088= =WUgr -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull late x86 platform driver updates from Darren Hart: "The following came in a bit later and I wanted them to bake in next a few more days before submitting, thus the second pull. A new intel_pmc_ipc driver, a symmetrical allocation and free fix in dell-laptop, a couple minor fixes, and some updated documentation in the dell-laptop comments. intel_pmc_ipc: - Add Intel Apollo Lake PMC IPC driver tc1100-wmi: - Delete an unnecessary check before the function call "kfree" dell-laptop: - Fix allocating & freeing SMI buffer page - Show info about WiGig and UWB in debugfs - Update information about wireless control" * tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver tc1100-wmi: Delete an unnecessary check before the function call "kfree" dell-laptop: Fix allocating & freeing SMI buffer page dell-laptop: Show info about WiGig and UWB in debugfs dell-laptop: Update information about wireless control	2015-07-05 10:54:09 -07:00
Andrey Smetanin	a88464a8b0	kvm: add hyper-v crash msrs values Added Hyper-V crash msrs values - HV_X64_MSR_CRASH*. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-03 18:55:20 +02:00
Radim Krčmář	42720138b0	KVM: x86: make vapics_in_nmi_mode atomic Writes were a bit racy, but hard to turn into a bug at the same time. (Particularly because modern Linux doesn't use this feature anymore.) Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> [Actually the next patch makes it much, much easier to trigger the race so I'm including this one for stable@ as well. - Paolo] Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-03 18:55:17 +02:00
qipeng.zha	0a8b83530b	intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver This driver provides support for PMC control on Apollo Lake platforms. The PMC is an ARC processor which defines some IPC commands for communication with other entities in the CPU. Signed-off-by: qipeng.zha <qipeng.zha@intel.com> [fengguang.wu@intel.com: Fix Sparse and Cocinelle warnings] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>	2015-06-29 15:28:14 -07:00
Linus Torvalds	88793e5c77	The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 4 drivers / enabling modules: NFIT: Instantiates an "nvdimm bus" with the core and registers memory devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface table). After registering NVDIMMs the NFIT driver then registers "region" devices. A libnvdimm-region defines an access mode and the boundaries of persistent memory media. A region may span multiple NVDIMMs that are interleaved by the hardware memory controller. In turn, a libnvdimm-region can be carved into a "namespace" device and bound to the PMEM or BLK driver which will attach a Linux block device (disk) interface to the memory. PMEM: Initially merged in v4.1 this driver for contiguous spans of persistent memory address ranges is re-worked to drive PMEM-namespaces emitted by the libnvdimm-core. In this update the PMEM driver, on x86, gains the ability to assert that writes to persistent memory have been flushed all the way through the caches and buffers in the platform to persistent media. See memcpy_to_pmem() and wmb_pmem(). BLK: This new driver enables access to persistent memory media through "Block Data Windows" as defined by the NFIT. The primary difference of this driver to PMEM is that only a small window of persistent memory is mapped into system address space at any given point in time. Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access different portions of the media. BLK-mode, by definition, does not support DAX. BTT: This is a library, optionally consumed by either PMEM or BLK, that converts a byte-accessible namespace into a disk with atomic sector update semantics (prevents sector tearing on crash or power loss). The sinister aspect of sector tearing is that most applications do not know they have a atomic sector dependency. At least today's disk's rarely ever tear sectors and if they do one almost certainly gets a CRC error on access. NVDIMMs will always tear and always silently. Until an application is audited to be robust in the presence of sector-tearing the usage of BTT is recommended. Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig, Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox, Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael Wysocki, and Bob Moore. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVjZGBAAoJEB7SkWpmfYgC4fkP/j+k6HmSRNU/yRYPyo7CAWvj 3P5P1i6R6nMZZbjQrQArAXaIyLlFk4sEQDYsciR6dmslhhFZAkR2eFwVO5rBOyx3 QN0yxEpyjJbroRFUrV/BLaFK4cq2oyJAFFHs0u7/pLHBJ4MDMqfRKAMtlnBxEkTE LFcqXapSlvWitSbjMdIBWKFEvncaiJ2mdsFqT4aZqclBBTj00eWQvEG9WxleJLdv +tj7qR/vGcwOb12X5UrbQXgwtMYos7A6IzhHbqwQL8IrOcJ6YB8NopJUpLDd7ZVq KAzX6ZYMzNueN4uvv6aDfqDRLyVL7qoxM9XIjGF5R8SV9sF2LMspm1FBpfowo1GT h2QMr0ky1nHVT32yspBCpE9zW/mubRIDtXxEmZZ53DIc4N6Dy9jFaNVmhoWtTAqG b9pndFnjUzzieCjX5pCvo2M5U6N0AQwsnq76/CasiWyhSa9DNKOg8MVDRg0rbxb0 UvK0v8JwOCIRcfO3qiKcx+02nKPtjCtHSPqGkFKPySRvAdb+3g6YR26CxTb3VmnF etowLiKU7HHalLvqGFOlDoQG6viWes9Zl+ZeANBOCVa6rL2O7ZnXJtYgXf1wDQee fzgKB78BcDjXH4jHobbp/WBANQGN/GF34lse8yHa7Ym+28uEihDvSD1wyNLnefmo 7PJBbN5M5qP5tD0aO7SZ =VtWG -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm Pull libnvdimm subsystem from Dan Williams: "The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 4 drivers / enabling modules: NFIT: Instantiates an "nvdimm bus" with the core and registers memory devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface table). After registering NVDIMMs the NFIT driver then registers "region" devices. A libnvdimm-region defines an access mode and the boundaries of persistent memory media. A region may span multiple NVDIMMs that are interleaved by the hardware memory controller. In turn, a libnvdimm-region can be carved into a "namespace" device and bound to the PMEM or BLK driver which will attach a Linux block device (disk) interface to the memory. PMEM: Initially merged in v4.1 this driver for contiguous spans of persistent memory address ranges is re-worked to drive PMEM-namespaces emitted by the libnvdimm-core. In this update the PMEM driver, on x86, gains the ability to assert that writes to persistent memory have been flushed all the way through the caches and buffers in the platform to persistent media. See memcpy_to_pmem() and wmb_pmem(). BLK: This new driver enables access to persistent memory media through "Block Data Windows" as defined by the NFIT. The primary difference of this driver to PMEM is that only a small window of persistent memory is mapped into system address space at any given point in time. Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access different portions of the media. BLK-mode, by definition, does not support DAX. BTT: This is a library, optionally consumed by either PMEM or BLK, that converts a byte-accessible namespace into a disk with atomic sector update semantics (prevents sector tearing on crash or power loss). The sinister aspect of sector tearing is that most applications do not know they have a atomic sector dependency. At least today's disk's rarely ever tear sectors and if they do one almost certainly gets a CRC error on access. NVDIMMs will always tear and always silently. Until an application is audited to be robust in the presence of sector-tearing the usage of BTT is recommended. Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig, Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox, Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael Wysocki, and Bob Moore" * tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits) arch, x86: pmem api for ensuring durability of persistent memory updates libnvdimm: Add sysfs numa_node to NVDIMM devices libnvdimm: Set numa_node to NVDIMM devices acpi: Add acpi_map_pxm_to_online_node() libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only pmem: flag pmem block devices as non-rotational libnvdimm: enable iostat pmem: make_request cleanups libnvdimm, pmem: fix up max_hw_sectors libnvdimm, blk: add support for blk integrity libnvdimm, btt: add support for blk integrity fs/block_dev.c: skip rw_page if bdev has integrity libnvdimm: Non-Volatile Devices tools/testing/nvdimm: libnvdimm unit test infrastructure libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory nd_btt: atomic sector updates libnvdimm: infrastructure for btt devices libnvdimm: write blk label set libnvdimm: write pmem label set libnvdimm: blk labels and namespace instantiation ...	2015-06-29 10:34:42 -07:00
Linus Torvalds	8c7febe839	TTY/Serial driver patches for 4.2-rc1 Here's the tty and serial driver patches for 4.2-rc1. A number of individual driver updates, some code cleanups, and other minor things, full details in the shortlog. All have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlWNoSAACgkQMUfUDdst+ymxNQCguSEmkAYNDdLyYhdcOqSxJt9u U1gAoMThUDoomkx6CTDMU1wn53hxgMk9 =eCUS -----END PGP SIGNATURE----- Merge tag 'tty-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here's the tty and serial driver patches for 4.2-rc1. A number of individual driver updates, some code cleanups, and other minor things, full details in the shortlog. All have been in linux-next for a while with no reported issues" * tag 'tty-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (152 commits) Doc: serial-rs485.txt: update RS485 driver interface Doc: tty.txt: remove mention of the BKL MAINTAINERS: tty: add serial docs directory serial: sprd: check for NULL after calling devm_clk_get serial: 8250_pci: Correct uartclk for xr17v35x expansion chips serial: 8250_pci: Add support for 12 port Exar boards serial: 8250_uniphier: add bindings document for UniPhier UART serial: core: cleanup in uart_get_baud_rate() serial: stm32-usart: Add STM32 USART Driver tty/serial: kill off set_irq_flags usage tty: move linux/gsmmux.h to uapi doc: dt: add documentation for nxp,lpc1850-uart serial: 8250: add LPC18xx/43xx UART driver serial: 8250_uniphier: add UniPhier serial driver serial: 8250_dw: support ACPI platforms with integrated DMA engine serial: of_serial: check the return value of clk_prepare_enable() serial: of_serial: use devm_clk_get() instead of clk_get() serial: earlycon: Add support for big-endian MMIO accesses serial: sirf: use hrtimer for data rx serial: sirf: correct the fifo empty_bit ...	2015-06-26 15:53:22 -07:00
Linus Torvalds	47a469421d	Merge branch 'akpm' (patches from Andrew) Merge second patchbomb from Andrew Morton: - most of the rest of MM - lots of misc things - procfs updates - printk feature work - updates to get_maintainer, MAINTAINERS, checkpatch - lib/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits) exit,stats: /* obey this comment / coredump: add __printf attribute to cn_printf functions coredump: use from_kuid/kgid when formatting corename fs/reiserfs: remove unneeded cast NILFS2: support NFSv2 export fs/befs/btree.c: remove unneeded initializations fs/minix: remove unneeded cast init/do_mounts.c: add create_dev() failure log kasan: remove duplicate definition of the macro KASAN_FREE_PAGE fs/efs: femove unneeded cast checkpatch: emit "NOTE: <types>" message only once after multiple files checkpatch: emit an error when there's a diff in a changelog checkpatch: validate MODULE_LICENSE content checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr() checkpatch: fix processing of MEMSET issues checkpatch: suggest using ether_addr_equal*() checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files checkpatch: remove local from codespell path checkpatch: add --showfile to allow input via pipe to show filenames ...	2015-06-26 09:52:05 -07:00
Ross Zwisler	61031952f4	arch, x86: pmem api for ensuring durability of persistent memory updates Based on an original patch by Ross Zwisler [1]. Writes to persistent memory have the potential to be posted to cpu cache, cpu write buffers, and platform write buffers (memory controller) before being committed to persistent media. Provide apis, memcpy_to_pmem(), wmb_pmem(), and memremap_pmem(), to write data to pmem and assert that it is durable in PMEM (a persistent linear address range). A '__pmem' attribute is added so sparse can track proper usage of pointers to pmem. This continues the status quo of pmem being x86 only for 4.2, but reworks to ioremap, and wider implementation of memremap() will enable other archs in 4.3. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> [djbw: various reworks] Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-06-26 11:23:38 -04:00
Dominik Dingel	08bd4fc156	mm/hugetlb: remove arch_prepare/release_hugepage from arch headers Nobody used these hooks so they were removed from common code, and can now be removed from the architectures. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-25 17:00:35 -07:00
Linus Torvalds	ad90fb9751	Merge branch 'for-4.2/sg' of git://git.kernel.dk/linux-block Pull asm/scatterlist.h removal from Jens Axboe: "We don't have any specific arch scatterlist anymore, since parisc finally switched over. Kill the include" * 'for-4.2/sg' of git://git.kernel.dk/linux-block: remove scatterlist.h generation from arch Kbuild files remove <asm/scatterlist.h>	2015-06-25 15:22:36 -07:00
Linus Torvalds	aefbef10e3	Merge branch 'akpm' (patches from Andrew) Merge first patchbomb from Andrew Morton: - a few misc things - ocfs2 udpates - kernel/watchdog.c feature work (took ages to get right) - most of MM. A few tricky bits are held up and probably won't make 4.2. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits) mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() mm, thp: respect MPOL_PREFERRED policy with non-local node tmpfs: truncate prealloc blocks past i_size mm/memory hotplug: print the last vmemmap region at the end of hot add memory mm/mmap.c: optimization of do_mmap_pgoff function mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan mm: kmemleak: avoid deadlock on the kmemleak object insertion error path mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup() mm: kmemleak: fix delete_object_*() race when called on the same memory block mm: kmemleak: allow safe memory scanning during kmemleak disabling memcg: convert mem_cgroup->under_oom from atomic_t to int memcg: remove unused mem_cgroup->oom_wakeups frontswap: allow multiple backends x86, mirror: x86 enabling - find mirrored memory ranges mm/memblock: allocate boot time data structures from mirrored memory mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute mm: do not ignore mapping_gfp_mask in page cache allocation paths mm/cma.c: fix typos in comments mm/oom_kill.c: print points as unsigned int mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages ...	2015-06-24 20:47:21 -07:00
Linus Torvalds	45471cd98d	EDAC changes, v2: * New APM X-Gene SoC EDAC driver (Loc Ho) * AMD error injection module improvements (Aravind Gopalakrishnan) * Altera Arria 10 support (Thor Thayer) * misc fixes and cleanups all over the place -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJViuInAAoJEBLB8Bhh3lVKHT8QAKkHIMreO8obo09haxNJlfdF BaG7SNEDhvcgQ1B76RsjnjkUpsivvUt+mCYMP+BxcAqFrTA33UZCCOK5tEhGb1wr matRdR6+aezqAl2e/0/Ti25bWOkDxcOeazh2TyezuyIXtaJjOq1oZC7OaYGmxPun NlZY+/uY1eiHlewKsK04y8G8J5i4wGoKnuxBvOyELT90+a+fLfAOshAp0D4r0piB Znv0ydsHlu+Wx57slg1DktlsyswmcGS9WfWwwTlELOLulKgN8wEAVYzUB5pJzNbz ehq0J4wYz95juXADC4M4tEjErHVJNl6PbyMqwt0+XUUJ1NSgOj7Q6iqwxDoZX8km oxiLVydQBtoIzF1LojFKAVZDFnrMKHKwK3RaDaUJjTI90+tVzEU8xsBlUf6+EgD2 Ss2RH8Gfuf52RdtwHh9++T1ur5rM9YNCAm31msq06mcOf0bEtmDbhZ+fVC5mjhqB fIb3hxnk0r2BVg+ZCN/boxGS6RzUtYVcCXaBPDMeHcg9BEEds70KCFEcsX7TvJIg 5/SHI+033MylqkX2zrgDQLj7CQk3R0jaotHVbdhLupyOldcM7r5uF+VO84drNWGN GfM2lpyE/swZWnzKuotgYIGR1XvFjtJAVAyNGIvwP+ajjTsqXzEnLSLClY5LWfYd nSSSMpCCqsEmhoWftOix =Id4f -----END PGP SIGNATURE----- Merge tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: - New APM X-Gene SoC EDAC driver (Loc Ho) - AMD error injection module improvements (Aravind Gopalakrishnan) - Altera Arria 10 support (Thor Thayer) - misc fixes and cleanups all over the place * tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (28 commits) EDAC: Update Documentation/edac.txt EDAC: Fix typos in Documentation/edac.txt EDAC, mce_amd_inj: Set MISCV on injection EDAC, mce_amd_inj: Move bit preparations before the injection EDAC, mce_amd_inj: Cleanup and simplify README EDAC, altera: Do not allow suspend when EDAC is enabled EDAC, mce_amd_inj: Make inj_type static arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support EDAC, altera: Add Arria10 EDAC support EDAC, altera: Refactor for Altera CycloneV SoC EDAC, altera: Generalize driver to use DT Memory size EDAC, mce_amd_inj: Add README file EDAC, mce_amd_inj: Add individual permissions field to dfs_node EDAC, mce_amd_inj: Modify flags attribute to use string arguments EDAC, mce_amd_inj: Read out number of MCE banks from the hardware EDAC, mce_amd_inj: Use MCE_INJECT_GET macro for bank node too EDAC, xgene: Fix cpuid abuse EDAC, mpc85xx: Extend error address to 64 bit EDAC, mpc8xxx: Adapt for FSL SoC EDAC, edac_stub: Drop arch-specific include ...	2015-06-24 19:52:06 -07:00
Aneesh Kumar K.V	8809aa2d28	mm: clarify that the function operates on hugepage pte We have confusing functions to clear pmd, pmd_clear_* and pmd_clear. Add _huge_ to pmdp_clear functions so that we are clear that they operate on hugepage pte. We don't bother about other functions like pmdp_set_wrprotect, pmdp_clear_flush_young, because they operate on PTE bits and hence indicate they are operating on hugepage ptes Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-24 17:49:44 -07:00
Zhang Zhen	a67a31fa30	mm/hugetlb: reduce arch dependent code about hugetlb_prefault_arch_hook Currently we have many duplicates in definitions of hugetlb_prefault_arch_hook. In all architectures this function is empty. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-24 17:49:41 -07:00
Laurent Dufour	2ae416b142	mm: new mm hook framework CRIU is recreating the process memory layout by remapping the checkpointee memory area on top of the current process (criu). This includes remapping the vDSO to the place it has at checkpoint time. However some architectures like powerpc are keeping a reference to the vDSO base address to build the signal return stack frame by calling the vDSO sigreturn service. So once the vDSO has been moved, this reference is no more valid and the signal frame built later are not usable. This patch serie is introducing a new mm hook framework, and a new arch_remap hook which is called when mremap is done and the mm lock still hold. The next patch is adding the vDSO remap and unmap tracking to the powerpc architecture. This patch (of 3): This patch introduces a new set of header file to manage mm hooks: - per architecture empty header file (arch/x/include/asm/mm-arch-hooks.h) - a generic header (include/linux/mm-arch-hooks.h) The architecture which need to overwrite a hook as to redefine it in its header file, while architecture which doesn't need have nothing to do. The default hooks are defined in the generic header and are used in the case the architecture is not defining it. In a next step, mm hooks defined in include/asm-generic/mm_hooks.h should be moved here. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-24 17:49:41 -07:00
Linus Torvalds	4e241557fc	The bulk of the changes here is for x86. And for once it's not for silicon that no one owns: these are really new features for everyone. * ARM: several features are in progress but missed the 4.2 deadline. So here is just a smattering of bug fixes, plus enabling the VFIO integration. * s390: Some fixes/refactorings/optimizations, plus support for 2GB pages. * x86: 1) host and guest support for marking kvmclock as a stable scheduler clock. 2) support for write combining. 3) support for system management mode, needed for secure boot in guests. 4) a bunch of cleanups required for 2+3. 5) support for virtualized performance counters on AMD; 6) legacy PCI device assignment is deprecated and defaults to "n" in Kconfig; VFIO replaces it. On top of this there are also bug fixes and eager FPU context loading for FPU-heavy guests. * Common code: Support for multiple address spaces; for now it is used only for x86 SMM but the s390 folks also have plans. There are some x86 conflicts, one with the rc8 pull request and the rest with Ingo's FPU rework. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJViYzhAAoJEL/70l94x66Dda0H/1IepMbfEy+o849d5G71fNTs F8Y8qUP2GZuL7T53FyFUGSBw+AX7kimu9ia4gR/PmDK+QYsdosYeEjwlsolZfTBf sHuzNtPoJhi5o1o/ur4NGameo0WjGK8f1xyzr+U8z74QDQyQv/QYCdK/4isp4BJL ugHNHkuROX6Zng4i7jc9rfaSRg29I3GBxQUYpMkEnD3eMYMUBWGm6Rs8pHgGAMvL vqzntgW00WNxehTqcAkmD/Wv+txxhkvIadZnjgaxH49e9JeXeBKTIR5vtb7Hns3s SuapZUyw+c95DIipXq4EznxxaOrjbebOeFgLCJo8+XMXZum8RZf/ob24KroYad0= =YsAR -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull first batch of KVM updates from Paolo Bonzini: "The bulk of the changes here is for x86. And for once it's not for silicon that no one owns: these are really new features for everyone. Details: - ARM: several features are in progress but missed the 4.2 deadline. So here is just a smattering of bug fixes, plus enabling the VFIO integration. - s390: Some fixes/refactorings/optimizations, plus support for 2GB pages. - x86: * host and guest support for marking kvmclock as a stable scheduler clock. * support for write combining. * support for system management mode, needed for secure boot in guests. * a bunch of cleanups required for the above * support for virtualized performance counters on AMD * legacy PCI device assignment is deprecated and defaults to "n" in Kconfig; VFIO replaces it On top of this there are also bug fixes and eager FPU context loading for FPU-heavy guests. - Common code: Support for multiple address spaces; for now it is used only for x86 SMM but the s390 folks also have plans" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits) KVM: s390: clear floating interrupt bitmap and parameters KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs KVM: x86/vPMU: Implement AMD vPMU code for KVM KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch KVM: x86/vPMU: introduce kvm_pmu_msr_idx_to_pmc KVM: x86/vPMU: reorder PMU functions KVM: x86/vPMU: whitespace and stylistic adjustments in PMU code KVM: x86/vPMU: use the new macros to go between PMC, PMU and VCPU KVM: x86/vPMU: introduce pmu.h header KVM: x86/vPMU: rename a few PMU functions KVM: MTRR: do not map huge page for non-consistent range KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type KVM: MTRR: introduce mtrr_for_each_mem_type KVM: MTRR: introduce fixed_mtrr_addr_* functions KVM: MTRR: sort variable MTRRs KVM: MTRR: introduce var_mtrr_range KVM: MTRR: introduce fixed_mtrr_segment table KVM: MTRR: improve kvm_mtrr_get_guest_memory_type KVM: MTRR: do not split 64 bits MSR content KVM: MTRR: clean up mtrr default type ...	2015-06-24 09:36:49 -07:00
Linus Torvalds	0faef837e4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fixes from Jiri Kosina: - symbol lookup locking fix, from Miroslav Benes - error handling improvements in case of failure of the module coming notifier, from Minfei Huang - we were too pessimistic when kASLR has been enabled on x86 and were dropping address hints on the floor unnecessarily in such case. Fix from Jiri Kosina - a few other small fixes and cleanups * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add module locking around kallsyms calls livepatch: annotate klp_init() with __init livepatch: introduce patch/func-walking helpers livepatch: make kobject in klp_object statically allocated livepatch: Prevent patch inconsistencies if the coming module notifier fails livepatch: match return value to function signature x86: kaslr: fix build due to missing ALIGN definition livepatch: x86: make kASLR logic more accurate x86: introduce kaslr_offset()	2015-06-23 14:07:26 -07:00
Linus Torvalds	d8133356e9	PCI changes for the v4.2 merge window: Enumeration - Move pci_ari_enabled() to global header (Alex Williamson) - Account for ARI in _PRT lookups (Alex Williamson) - Remove unused pci_scan_bus_parented() (Yijing Wang) Resource management - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas) - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas) - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan) - Add pci_bus_addr_t (Yinghai Lu) PCI device hotplug - Wait for pciehp command completion where necessary (Alex Williamson) - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki) - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki) - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki) - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas) - Clean up pciehp debug logging (Bjorn Helgaas) Power management - Remove redundant PCIe port type checking (Yijing Wang) - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang) - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang) - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas) - Simplify Clock Power Management setting (Bjorn Helgaas) Virtualization - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson) - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus) MSI - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin) - Remove unused pci_msi_off() (Bjorn Helgaas) - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S. Tsirkin) - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin) - Drop pci_msi_off() calls during probe (Michael S. Tsirkin) APM X-Gene host bridge driver - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang) - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang) - Disable Configuration Request Retry Status for v1 silicon (Duc Dang) - Allow config access to Root Port even when link is down (Duc Dang) Broadcom iProc host bridge driver - Allow override of device tree IRQ mapping function (Hauke Mehrtens) - Add BCMA PCIe driver (Hauke Mehrtens) - Directly add PCI resources (Hauke Mehrtens) - Free resource list after registration (Hauke Mehrtens) Freescale i.MX6 host bridge driver - Add speed change timeout message (Troy Kisky) - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas) Freescale Layerscape host bridge driver - Use dw_pcie_link_up() consistently (Bjorn Helgaas) - Factor out ls_pcie_establish_link() (Bjorn Helgaas) Marvell MVEBU host bridge driver - Remove mvebu_pcie_scan_bus() (Yijing Wang) NVIDIA Tegra host bridge driver - Remove tegra_pcie_scan_bus() (Yijing Wang) Synopsys DesignWare host bridge driver - Consolidate outbound iATU programming functions (Jisheng Zhang) - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang) - Add support for x8 links (Zhou Wang) - Wait for link to come up with consistent style (Bjorn Helgaas) - Use pci_scan_root_bus() for simplicity (Yijing Wang) TI DRA7xx host bridge driver - Use dw_pcie_link_up() consistently (Bjorn Helgaas) Miscellaneous - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas) - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas) - Remove unused pcibios_select_root() (again) (Bjorn Helgaas) - Remove unused pci_dma_burst_advice() (Bjorn Helgaas) - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJViCSWAAoJEFmIoMA60/r8zX8P/1DPNnk+8xSQe3dYjnG8VW3P GPxeCqLMkjiF3ffxcLDzsgrHMjZEb8Co67WePs0k5V0lbZevoIwUo48+oO9B5jhc H5DuPZHyTHeOvaZv4GUY5vq/1DBh4JXmJc2V/BkaJ6qhXckF+SCam9C+s0p4950o QX/ifOjg/VHzmhaiL7wymJOzuniZmIttl+y+nzkl3AUJ+T6ZtQbUhz+8GZ3lj7Ma F+7JHhvm9K8Ljajxb6BLWTw4xgHA6ZN5PtYEx+Sl9QBYSsGfL7LnqyYD3KhJ7KV5 4AHNJGEVhzNwSuyh+VQx1tNm7OHOqkAaTsYdCVUZRow+6CPd8P75QOMtpl+SmPJB RV1BAO75OTGqKg0B9IDg855y4Nh+4/dKoZlBPzpp7+cKw3ylaRAsNnaZ9ik5D62v RR06CFgWGHwDXSObgbRm4v0HwfAIHWWJzrPqAZmElh2dzb1Lv1I3AbB1SClCN6sl fnAu6CAwA47A5GT8xW3L0oQXdcSmdNUdNzZrsfDnOBIQWMsF+zBFKr6sTABVgyxp /WEJaNlvx8Zlq0bZlhGDdsGSbFNFzhX4avWZtXhvdcqFzH0KaVghYSayYvJE9Haq oakWqS+GZ3x40j+rdrgLg98AWRVraE1MvV1A7N9TIGjuuKqqbZfSP8kvX3QRQQhO Z2+X5hMM0s/tdYtADYu/ =Qw+j -----END PGP SIGNATURE----- Merge tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v4.2 merge window: Enumeration - Move pci_ari_enabled() to global header (Alex Williamson) - Account for ARI in _PRT lookups (Alex Williamson) - Remove unused pci_scan_bus_parented() (Yijing Wang) Resource management - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas) - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas) - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan) - Add pci_bus_addr_t (Yinghai Lu) PCI device hotplug - Wait for pciehp command completion where necessary (Alex Williamson) - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki) - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki) - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki) - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas) - Clean up pciehp debug logging (Bjorn Helgaas) Power management - Remove redundant PCIe port type checking (Yijing Wang) - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang) - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang) - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas) - Simplify Clock Power Management setting (Bjorn Helgaas) Virtualization - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson) - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus) MSI - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin) - Remove unused pci_msi_off() (Bjorn Helgaas) - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S. Tsirkin) - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin) - Drop pci_msi_off() calls during probe (Michael S. Tsirkin) APM X-Gene host bridge driver - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang) - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang) - Disable Configuration Request Retry Status for v1 silicon (Duc Dang) - Allow config access to Root Port even when link is down (Duc Dang) Broadcom iProc host bridge driver - Allow override of device tree IRQ mapping function (Hauke Mehrtens) - Add BCMA PCIe driver (Hauke Mehrtens) - Directly add PCI resources (Hauke Mehrtens) - Free resource list after registration (Hauke Mehrtens) Freescale i.MX6 host bridge driver - Add speed change timeout message (Troy Kisky) - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas) Freescale Layerscape host bridge driver - Use dw_pcie_link_up() consistently (Bjorn Helgaas) - Factor out ls_pcie_establish_link() (Bjorn Helgaas) Marvell MVEBU host bridge driver - Remove mvebu_pcie_scan_bus() (Yijing Wang) NVIDIA Tegra host bridge driver - Remove tegra_pcie_scan_bus() (Yijing Wang) Synopsys DesignWare host bridge driver - Consolidate outbound iATU programming functions (Jisheng Zhang) - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang) - Add support for x8 links (Zhou Wang) - Wait for link to come up with consistent style (Bjorn Helgaas) - Use pci_scan_root_bus() for simplicity (Yijing Wang) TI DRA7xx host bridge driver - Use dw_pcie_link_up() consistently (Bjorn Helgaas) Miscellaneous - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas) - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas) - Remove unused pcibios_select_root() (again) (Bjorn Helgaas) - Remove unused pci_dma_burst_advice() (Bjorn Helgaas) - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann)" * tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits) PCI: pciehp: Inline the "handle event" functions into the ISR PCI: pciehp: Rename queue_interrupt_event() to pciehp_queue_interrupt_event() PCI: pciehp: Make queue_interrupt_event() void PCI: xgene: Allow config access to Root Port even when link is down PCI: xgene: Disable Configuration Request Retry Status for v1 silicon PCI: pciehp: Clean up debug logging x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing PCI: imx6: Add #define PCIE_RC_LCSR PCI: imx6: Use "u32", not "uint32_t" PCI: Remove unused pci_scan_bus_parented() xen/pcifront: Don't use deprecated function pci_scan_bus_parented() PCI: imx6: Add speed change timeout message PCI/ASPM: Simplify Clock Power Management setting PCI: designware: Wait for link to come up with consistent style PCI: layerscape: Factor out ls_pcie_establish_link() PCI: layerscape: Use dw_pcie_link_up() consistently PCI: dra7xx: Use dw_pcie_link_up() consistently x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A PCI: pciehp: Wait for hotplug command completion where necessary PCI: Remove unused pci_dma_burst_advice() ...	2015-06-23 13:41:24 -07:00
Wei Huang	25462f7f52	KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch This patch defines a new function pointer struct (kvm_pmu_ops) to support vPMU for both Intel and AMD. The functions pointers defined in this new struct will be linked with Intel and AMD functions later. In the meanwhile the struct that maps from event_sel bits to PERF_TYPE_HARDWARE events is renamed and moved from Intel specific code to kvm_host.h as a common struct. Reviewed-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-23 14:12:14 +02:00
Linus Torvalds	d70b3ef54c	Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Ingo Molnar: "There were so many changes in the x86/asm, x86/apic and x86/mm topics in this cycle that the topical separation of -tip broke down somewhat - so the result is a more traditional architecture pull request, collected into the 'x86/core' topic. The topics were still maintained separately as far as possible, so bisectability and conceptual separation should still be pretty good - but there were a handful of merge points to avoid excessive dependencies (and conflicts) that would have been poorly tested in the end. The next cycle will hopefully be much more quiet (or at least will have fewer dependencies). The main changes in this cycle were: * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas Gleixner) - This is the second and most intrusive part of changes to the x86 interrupt handling - full conversion to hierarchical interrupt domains: [IOAPIC domain] ----- \| [MSI domain] --------[Remapping domain] ----- [ Vector domain ] \| (optional) \| [HPET MSI domain] ----- \| \| [DMAR domain] ----------------------------- \| [Legacy domain] ----------------------------- This now reflects the actual hardware and allowed us to distangle the domain specific code from the underlying parent domain, which can be optional in the case of interrupt remapping. It's a clear separation of functionality and removes quite some duct tape constructs which plugged the remap code between ioapic/msi/hpet and the vector management. - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt injection into guests (Feng Wu) * x86/asm changes: - Tons of cleanups and small speedups, micro-optimizations. This is in preparation to move a good chunk of the low level entry code from assembly to C code (Denys Vlasenko, Andy Lutomirski, Brian Gerst) - Moved all system entry related code to a new home under arch/x86/entry/ (Ingo Molnar) - Removal of the fragile and ugly CFI dwarf debuginfo annotations. Conversion to C will reintroduce many of them - but meanwhile they are only getting in the way, and the upstream kernel does not rely on them (Ingo Molnar) - NOP handling refinements. (Borislav Petkov) * x86/mm changes: - Big PAT and MTRR rework: making the code more robust and preparing to phase out exposing direct MTRR interfaces to drivers - in favor of using PAT driven interfaces (Toshi Kani, Luis R Rodriguez, Borislav Petkov) - New ioremap_wt()/set_memory_wt() interfaces to support Write-Through cached memory mappings. This is especially important for good performance on NVDIMM hardware (Toshi Kani) * x86/ras changes: - Add support for deferred errors on AMD (Aravind Gopalakrishnan) This is an important RAS feature which adds hardware support for poisoned data. That means roughly that the hardware marks data which it has detected as corrupted but wasn't able to correct, as poisoned data and raises an APIC interrupt to signal that in the form of a deferred error. It is the OS's responsibility then to take proper recovery action and thus prolonge system lifetime as far as possible. - Add support for Intel "Local MCE"s: upcoming CPUs will support CPU-local MCE interrupts, as opposed to the traditional system- wide broadcasted MCE interrupts (Ashok Raj) - Misc cleanups (Borislav Petkov) * x86/platform changes: - Intel Atom SoC updates ... and lots of other cleanups, fixlets and other changes - see the shortlog and the Git log for details" * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits) x86/hpet: Use proper hpet device number for MSI allocation x86/hpet: Check for irq==0 when allocating hpet MSI interrupts x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail genirq: Prevent crash in irq_move_irq() genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain iommu, x86: Properly handle posted interrupts for IOMMU hotplug iommu, x86: Provide irq_remapping_cap() interface iommu, x86: Setup Posted-Interrupts capability for Intel iommu iommu, x86: Add cap_pi_support() to detect VT-d PI capability iommu, x86: Avoid migrating VT-d posted interrupts iommu, x86: Save the mode (posted or remapped) of an IRTE iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip iommu: dmar: Provide helper to copy shared irte fields iommu: dmar: Extend struct irte for VT-d Posted-Interrupts iommu: Add new member capability to struct irq_remap_ops x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry() ...	2015-06-22 17:59:09 -07:00
Linus Torvalds	35ffccdb7e	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pul x86 microcode updates from Ingo Molnar: "x86 microcode loader updates from Borislav Petkov: - early parsing of the built-in microcode - cleanups - misc smaller fixes" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Correct CPU family related variable types x86/microcode: Disable builtin microcode loading on 32-bit for now x86/microcode/intel: Rename get_matching_sig() x86/microcode/intel: Simplify get_matching_sig() x86/microcode/intel: Simplify update_match_cpu() x86/microcode/intel: Rename get_matching_microcode x86/cpu/microcode: Zap changelog x86/microcode: Parse built-in microcode early x86/microcode/intel: Remove unused @rev arg of get_matching_sig() x86/microcode/intel: Get rid of revision_is_newer()	2015-06-22 17:46:14 -07:00
Linus Torvalds	e75c73ad64	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "This tree contains two main changes: - The big FPU code rewrite: wide reaching cleanups and reorganization that pulls all the FPU code together into a clean base in arch/x86/fpu/. The resulting code is leaner and faster, and much easier to understand. This enables future work to further simplify the FPU code (such as removing lazy FPU restores). By its nature these changes have a substantial regression risk: FPU code related bugs are long lived, because races are often subtle and bugs mask as user-space failures that are difficult to track back to kernel side backs. I'm aware of no unfixed (or even suspected) FPU related regression so far. - MPX support rework/fixes. As this is still not a released CPU feature, there were some buglets in the code - should be much more robust now (Dave Hansen)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (250 commits) x86/fpu: Fix double-increment in setup_xstate_features() x86/mpx: Allow 32-bit binaries on 64-bit kernels again x86/mpx: Do not count MPX VMAs as neighbors when unmapping x86/mpx: Rewrite the unmap code x86/mpx: Support 32-bit binaries on 64-bit kernels x86/mpx: Use 32-bit-only cmpxchg() for 32-bit apps x86/mpx: Introduce new 'directory entry' to 'addr' helper function x86/mpx: Add temporary variable to reduce masking x86: Make is_64bit_mm() widely available x86/mpx: Trace allocation of new bounds tables x86/mpx: Trace the attempts to find bounds tables x86/mpx: Trace entry to bounds exception paths x86/mpx: Trace #BR exceptions x86/mpx: Introduce a boot-time disable flag x86/mpx: Restrict the mmap() size check to bounds tables x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK x86/mpx: Clean up the code by not passing a task pointer around when unnecessary x86/mpx: Use the new get_xsave_field_ptr()API x86/fpu/xstate: Wrap get_xsave_addr() to make it safer x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions ...	2015-06-22 17:16:11 -07:00
Linus Torvalds	b3ba283d83	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU features from Ingo Molnar: "Various CPU feature support related changes: in particular the /proc/cpuinfo model name sanitization change should be monitored, it has a chance to break stuff. (but really shouldn't and there are no regression reports)" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Give access to the number of nodes in a physical package x86/cpu: Trim model ID whitespace x86/cpu: Strip any /proc/cpuinfo model name field whitespace x86/cpu/amd: Set X86_FEATURE_EXTD_APICID for future processors x86/gart: Check for GART support before accessing GART registers	2015-06-22 16:43:01 -07:00
Linus Torvalds	d43e4f44ba	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Clean up types in xlate_dev_mem_ptr() some more x86: Deinline dma_free_attrs() x86: Deinline dma_alloc_attrs() x86: Remove unused TI_cpu x86: Merge common 32-bit values in asm-offsets.c	2015-06-22 16:23:00 -07:00
Linus Torvalds	23b7776290	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes are: - lockless wakeup support for futexes and IPC message queues (Davidlohr Bueso, Peter Zijlstra) - Replace spinlocks with atomics in thread_group_cputimer(), to improve scalability (Jason Low) - NUMA balancing improvements (Rik van Riel) - SCHED_DEADLINE improvements (Wanpeng Li) - clean up and reorganize preemption helpers (Frederic Weisbecker) - decouple page fault disabling machinery from the preemption counter, to improve debuggability and robustness (David Hildenbrand) - SCHED_DEADLINE documentation updates (Luca Abeni) - topology CPU masks cleanups (Bartosz Golaszewski) - /proc/sched_debug improvements (Srikar Dronamraju)" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits) sched/deadline: Remove needless parameter in dl_runtime_exceeded() sched: Remove superfluous resetting of the p->dl_throttled flag sched/deadline: Drop duplicate init_sched_dl_class() declaration sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target sched/deadline: Make init_sched_dl_class() __init sched/deadline: Optimize pull_dl_task() sched/preempt: Add static_key() to preempt_notifiers sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iteration sched/stop_machine: Fix deadlock between multiple stop_two_cpus() sched/debug: Add sum_sleep_runtime to /proc/<pid>/sched sched/debug: Replace vruntime with wait_sum in /proc/sched_debug sched/debug: Properly format runnable tasks in /proc/sched_debug sched/numa: Only consider less busy nodes as numa balancing destinations Revert `095bebf61a` ("sched/numa: Do not move past the balance point if unbalanced") sched/fair: Prevent throttling in early pick_next_task_fair() preempt: Reorganize the notrace definitions a bit preempt: Use preempt_schedule_context() as the official tracing preemption point sched: Make preempt_schedule_context() function-tracing safe x86: Remove cpu_sibling_mask() and cpu_core_mask() x86: Replace cpu__mask() with topology__cpumask() ...	2015-06-22 15:52:04 -07:00
Linus Torvalds	1bf7067c6e	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes are: - 'qspinlock' support, enabled on x86: queued spinlocks - these are now the spinlock variant used by x86 as they outperform ticket spinlocks in every category. (Waiman Long) - 'pvqspinlock' support on x86: paravirtualized variant of queued spinlocks. (Waiman Long, Peter Zijlstra) - 'qrwlock' support, enabled on x86: queued rwlocks. Similar to queued spinlocks, they are now the variant used by x86: CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y CONFIG_QUEUED_SPINLOCKS=y CONFIG_ARCH_USE_QUEUED_RWLOCKS=y CONFIG_QUEUED_RWLOCKS=y - various lockdep fixlets - various locking primitives cleanups, further WRITE_ONCE() propagation" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) locking/lockdep: Remove hard coded array size dependency locking/qrwlock: Don't contend with readers when setting _QW_WAITING lockdep: Do not break user-visible string locking/arch: Rename set_mb() to smp_store_mb() locking/arch: Add WRITE_ONCE() to set_mb() rtmutex: Warn if trylock is called from hard/softirq context arch: Remove __ARCH_HAVE_CMPXCHG locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS locking/pvqspinlock: Replace xchg() by the more descriptive set_mb() locking/pvqspinlock, x86: Enable PV qspinlock for Xen locking/pvqspinlock, x86: Enable PV qspinlock for KVM locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching locking/pvqspinlock: Implement simple paravirt support for the qspinlock locking/qspinlock: Revert to test-and-set on hypervisors locking/qspinlock: Use a simple write to grab the lock locking/qspinlock: Optimize for smaller NR_CPUS locking/qspinlock: Extract out code snippets for the next patch locking/qspinlock: Add pending bit ...	2015-06-22 14:54:22 -07:00
Ingo Molnar	7ef3d7d58d	Merge branches 'x86/apic', 'x86/asm', 'x86/mm' and 'x86/platform' into x86/core, to merge last updates Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-22 09:15:03 +02:00
Wei Huang	474a5bb944	KVM: x86/vPMU: introduce pmu.h header This will be used for private function used by AMD- and Intel-specific PMU implementations. Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:29 +02:00
Wei Huang	c6702c9dcf	KVM: x86/vPMU: rename a few PMU functions Before introducing a pmu.h header for them, make the naming more consistent. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:29 +02:00
Xiao Guangrong	19efffa244	KVM: MTRR: sort variable MTRRs Sort all valid variable MTRRs based on its base address, it will help us to check a range to see if it's fully contained in variable MTRRs Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> [Fix list insertion sort, simplify var_mtrr_range_is_valid to just test the V bit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:28 +02:00
Xiao Guangrong	86fd52701c	KVM: MTRR: do not split 64 bits MSR content Variable MTRR MSRs are 64 bits which are directly accessed with full length, no reason to split them to two 32 bits Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:27 +02:00
Xiao Guangrong	10fac2dc2b	KVM: MTRR: clean up mtrr default type Drop kvm_mtrr->enable, omit the decode/code workload and get rid of all the hard code Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:27 +02:00
Xiao Guangrong	910a6aae4e	KVM: MTRR: exactly define the size of variable MTRRs Only KVM_NR_VAR_MTRR variable MTRRs are available in KVM guest Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:27 +02:00
Xiao Guangrong	70109e7d9d	KVM: MTRR: remove mtrr_state.have_fixed vMTRR does not depend on any host MTRR feature and fixed MTRRs have always been implemented, so drop this field Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:27 +02:00
Xiao Guangrong	ff53604b40	KVM: x86: move MTRR related code to a separate file MTRR code locates in x86.c and mmu.c so that move them to a separate file to make the organization more clearer and it will be the place where we fully implement vMTRR Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:26 +02:00
Aravind Gopalakrishnan	cc2749e409	x86/cpu/amd: Give access to the number of nodes in a physical package Stash the number of nodes in a physical processor package locally and add an accessor to be called by interested parties. The first user is the MCE injection module which uses it to find the node base core in a package for injecting a certain type of errors. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Rewrote the commit message, merged it with the accessor patch and unified naming. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@osg.samsung.com Link: http://lkml.kernel.org/r/1433868317-18417-2-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-18 11:16:06 +02:00
Len Brown	6fb3143b56	tools/power turbostat: dump CONFIG_TDP Config TDP is a feature that allows parts to be configured for different thermal limits after they have left the factory. This can have an effect on the operation of the part, particularly in determiniing... Max Non-turbo Ratio Turbo Activation Ratio Signed-off-by: Len Brown <len.brown@intel.com>	2015-06-17 16:23:45 -04:00
Feng Wu	959c870f73	iommu, x86: Provide irq_remapping_cap() interface Add a new interface irq_remapping_cap() to detect whether irq remapping supports new features, such as VT-d Posted-Interrupts. Export the function, so that KVM code can check this and use this mechanism properly. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <joro@8bytes.org> Cc: iommu@lists.linux-foundation.org Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/1433827237-3382-10-git-send-email-feng.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-06-12 11:33:52 +02:00
Feng Wu	8541186faf	iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip Interrupt chip callback to set the VCPU affinity for posted interrupts. [ tglx: Use the helper function to copy from the remap irte instead of open coding it. Massage the comment as well ] Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: iommu@lists.linux-foundation.org Cc: joro@8bytes.org Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/1433827237-3382-5-git-send-email-feng.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-06-12 11:33:52 +02:00
Feng Wu	6f28192394	iommu: Add new member capability to struct irq_remap_ops Add a new member 'capability' to struct irq_remap_ops for storing information about available capabilities such as VT-d Posted-Interrupts. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <joro@8bytes.org> Cc: iommu@lists.linux-foundation.org Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/1433827237-3382-2-git-send-email-feng.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-06-12 11:33:51 +02:00
Dave Hansen	613fcb7d3c	x86/mpx: Support 32-bit binaries on 64-bit kernels Right now, the kernel can only switch between 64-bit and 32-bit binaries at compile time. This patch adds support for 32-bit binaries on 64-bit kernels when we support ia32 emulation. We essentially choose which set of table sizes to use when doing arithmetic for the bounds table calculations. This also uses a different approach for calculating the table indexes than before. I think the new one makes it much more clear what is going on, and allows us to share more code between the 32-bit and 64-bit cases. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183705.E01F21E2@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:34 +02:00
Dave Hansen	5458765390	x86/mpx: Introduce new 'directory entry' to 'addr' helper function Currently, to get from a bounds directory entry to the virtual address of a bounds table, we simply mask off a few low bits. However, the set of bits we mask off is different for 32-bit and 64-bit binaries. This breaks the operation out in to a helper function and also adds a temporary variable to store the result until we are sure we are returning one. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.007686CE@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:33 +02:00
Dave Hansen	b0e9b09b3b	x86: Make is_64bit_mm() widely available The uprobes code has a nice helper, is_64bit_mm(), that consults both the runtime and compile-time flags for 32-bit support. Instead of reinventing the wheel, pull it in to an x86 header so we can use it for MPX. I prefer passing the 'mm' around to test_thread_flag(TIF_IA32) because it makes it explicit where the context is coming from. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.F0209999@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:32 +02:00
Dave Hansen	cd4996dce1	x86/mpx: Trace allocation of new bounds tables Bounds tables are a significant consumer of memory. It is important to know when they are being allocated. Add a trace point to trace whenever an allocation occurs and also its virtual address. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.EC23A93E@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:32 +02:00
Dave Hansen	2a1dcb1f79	x86/mpx: Trace the attempts to find bounds tables There are two different events being traced here. They are doing similar things so share a trace "EVENT_CLASS" and are presented together. 1. Trace when MPX is zapping pages "mpx_unmap_zap": When MPX can not free an entire bounds table, it will instead try to zap unused parts of a bounds table to free the backing memory. This decreases RSS (resident set size) without decreasing the virtual space allocated for bounds tables. 2. Trace attempts to find bounds tables "mpx_unmap_search": This event traces any time we go looking to unmap a bounds table for a given virtual address range. This is useful to ensure that the kernel actually "tried" to free a bounds table versus times it succeeded in finding one. It might try and fail if it realized that a table was shared with an adjacent VMA which is not being unmapped. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.B9D2468B@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:32 +02:00
Dave Hansen	97efebf1bc	x86/mpx: Trace entry to bounds exception paths There are two basic things that can happen as the result of a bounds exception (#BR): 1. We allocate a new bounds table 2. We pass up a bounds exception to userspace. This patch adds a trace point for the case where we are passing the exception up to userspace with a signal. We are also explicit that we're printing out the inverse of the 'upper' that we encounter. If you want to filter, for instance, you need to ~ the value first. The reason we do this is because of how 'upper' is stored in the bounds table. If a pointer's range is: 0x1000 -> 0x2000 it is stored in the bounds table as (32-bits here for brevity): lower: 0x00001000 upper: 0xffffdfff That is so that an all 0's entry: lower: 0x00000000 upper: 0x00000000 corresponds to the "init" bounds which store a range of: 0x00000000 -> 0xffffffff That is, by far, the common case, and that lets us use the zero page, or deduplicate the memory, etc... The 'upper' stored in the table is gibberish to print by itself, so we print ~upper to get the actual, logical, human-readable value printed out. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.027BB9B0@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:32 +02:00
Dave Hansen	e7126cf5f1	x86/mpx: Trace #BR exceptions This is the first in a series of MPX tracing patches. I've found these extremely useful in the process of debugging applications and the kernel code itself. This exception hooks in to the bounds (#BR) exception very early and allows capturing the key registers which would influence how the exception is handled. Note that bndcfgu/bndstatus are technically still 64-bit registers even in 32-bit mode. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.5FE2619A@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:31 +02:00
Qiaowei Ren	3c1d323009	x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK MPX_BNDCFG_ADDR_MASK is defined two times, so this patch removes redundant one. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183702.5F129376@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:30 +02:00
Dave Hansen	46a6e0cf1c	x86/mpx: Clean up the code by not passing a task pointer around when unnecessary The MPX code can only work on the current task. You can not, for instance, enable MPX management in another process or thread. You can also not handle a fault for another process or thread. Despite this, we pass a task_struct around prolifically. This patch removes all of the task struct passing for code paths where the code can not deal with another task (which turns out to be all of them). This has no functional changes. It's just a cleanup. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183702.6A81DA2C@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:30 +02:00
Dave Hansen	a84eeaa96b	x86/mpx: Use the new get_xsave_field_ptr()API The MPX registers (bndcsr/bndcfgu/bndstatus) are not directly accessible via normal instructions. They essentially act as if they were floating point registers and are saved/restored along with those registers. There are two main paths in the MPX code where we care about the contents of these registers: 1. #BR (bounds) faults 2. the prctl() code where we are setting MPX up Both of those paths _might_ be called without the FPU having been used. That means that 'tsk->thread.fpu.state' might never be allocated. Also, fpu_save_init() is not preempt-safe. It was a bug to call it without disabling preemption. The new get_xsave_addr() calls unlazy_fpu() instead and properly disables preemption. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183701.BC0D37CF@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:30 +02:00
Dave Hansen	04cd027bcb	x86/fpu/xstate: Wrap get_xsave_addr() to make it safer The MPX code appears is calling a low-level FPU function (copy_fpregs_to_fpstate()). This function is not able to be called in all contexts, although it is safe to call directly in some cases. Although probably correct, the current code is ugly and potentially error-prone. So, add a wrapper that calls the (slightly) higher-level fpu__save() (which is preempt- safe) and also ensures that we even have an FPU context (in the case that this was called when in lazy FPU mode). Ingo had this to say about the details about when we need preemption disabled: > it's indeed generally unsafe to access/copy FPU registers with preemption enabled, > for two reasons: > > - on older systems that use FSAVE the instruction destroys FPU register > contents, which has to be handled carefully > > - even on newer systems if we copy to FPU registers (which this code doesn't) > then we don't want a context switch to occur in the middle of it, because a > context switch will write to the fpstate, potentially overwriting our new data > with old FPU state. > > But it's safe to access FPU registers with preemption enabled in a couple of > special cases: > > - potentially destructively saving FPU registers: the signal handling code does > this in copy_fpstate_to_sigframe(), because it can rely on the signal restore > side to restore the original FPU state. > > - reading FPU registers on modern systems: we don't do this anywhere at the > moment, mostly to keep symmetry with older systems where FSAVE is > destructive. > > - initializing FPU registers on modern systems: fpu__clear() does this. Here > it's safe because we don't copy from the fpstate. > > - directly writing FPU registers from user-space memory (!). We do this in > fpu__restore_sig(), and it's safe because neither context switches nor > irq-handler FPU use can corrupt the source context of the copy (which is > user-space memory). > > Note that the MPX code's current use of copy_fpregs_to_fpstate() was safe I think, > because: > > - MPX is predicated on eagerfpu, so the destructive F[N]SAVE instruction won't be > used. > > - the code was only reading FPU registers, and was doing it only in places that > guaranteed that an FPU state was already active (i.e. didn't do it in > kthreads) Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183700.AA881696@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:29 +02:00
Ingo Molnar	9dda1658a9	Merge branch 'x86/asm' into x86/core, to prepare for new patch Collect all changes to arch/x86/entry/entry_64.S, before applying patch that changes most of the file. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 20:48:20 +02:00
Greg Kroah-Hartman	00fda1682e	Merge 4.1-rc7 into tty-next This fixes up a merge issue with the amba-pl011.c driver, and we want the fixes in this branch as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-06-08 10:49:28 -07:00
Bjorn Helgaas	01d72a9518	PCI: Remove unused pci_dma_burst_advice() pci_dma_burst_advice() was added by `e24c2d963a` ("[PATCH] PCI: DMA bursting advice") but apparently never used. Remove it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Michal Simek <monstr@monstr.eu> # microblaze CC: David S. Miller <davem@davemloft.net>	2015-06-08 07:56:43 -05:00
Ingo Molnar	b2502b418e	x86/asm/entry: Untangle 'system_call' into two entry points: entry_SYSCALL_64 and entry_INT80_32 The 'system_call' entry points differ starkly between native 32-bit and 64-bit kernels: on 32-bit kernels it defines the INT 0x80 entry point, while on 64-bit it's the SYSCALL entry point. This is pretty confusing when looking at generic code, and it also obscures the nature of the entry point at the assembly level. So unangle this by splitting the name into its two uses: system_call (32) -> entry_INT80_32 system_call (64) -> entry_SYSCALL_64 As per the generic naming scheme for x86 system call entry points: entry_MNEMONIC_qualifier where 'qualifier' is one of _32, _64 or _compat. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 09:14:21 +02:00
Ingo Molnar	4c8cd0c50d	x86/asm/entry: Untangle 'ia32_sysenter_target' into two entry points: entry_SYSENTER_32 and entry_SYSENTER_compat So the SYSENTER instruction is pretty quirky and it has different behavior depending on bitness and CPU maker. Yet we create a false sense of coherency by naming it 'ia32_sysenter_target' in both of the cases. Split the name into its two uses: ia32_sysenter_target (32) -> entry_SYSENTER_32 ia32_sysenter_target (64) -> entry_SYSENTER_compat As per the generic naming scheme for x86 system call entry points: entry_MNEMONIC_qualifier where 'qualifier' is one of _32, _64 or _compat. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 08:47:46 +02:00
Ingo Molnar	2cd23553b4	x86/asm/entry: Rename compat syscall entry points Rename the following system call entry points: ia32_cstar_target -> entry_SYSCALL_compat ia32_syscall -> entry_INT80_compat The generic naming scheme for x86 system call entry points is: entry_MNEMONIC_qualifier where 'qualifier' is one of _32, _64 or _compat. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 08:47:36 +02:00
Frederic Weisbecker	4eaca0a887	preempt: Use preempt_schedule_context() as the official tracing preemption point preempt_schedule_context() is a tracing safe preemption point but it's only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing recursion issues since commit: `b30f0e3ffe` ("sched/preempt: Optimize preemption operations on __schedule() callers") introduced function based preemp_count_*() ops. Lets make it available on all configs and give it a more appropriate name for its new position. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:57:42 +02:00
Andy Shevchenko	7b179b8feb	x86/microcode: Correct CPU family related variable types Change the type of variables and function prototypes to be in alignment with what the x86_() / __x86_() family/model functions return. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-21-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:38:15 +02:00
Borislav Petkov	b72e7464e4	x86/uapi: Do not export <asm/msr-index.h> as part of the user API headers This header containing all MSRs and respective bit definitions got exported to userspace in conjunction with the big UAPI shuffle. But, it doesn't belong in the UAPI headers because userspace can do its own MSR defines and exporting them from the kernel blocks us from doing cleanups/renames in that header. Which is ridiculous - it is not kernel's job to export such a header and keep MSRs list and their names stable. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-19-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:36:04 +02:00
Ingo Molnar	c2f9b0af8b	Merge branch 'x86/ras' into x86/core, to fix conflicts Conflicts: arch/x86/include/asm/irq_vectors.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:35:27 +02:00
Borislav Petkov	c8e56d20f2	x86: Kill CONFIG_X86_HT In talking to Aravind recently about making certain AMD topology attributes available to the MCE injection module, it seemed like that CONFIG_X86_HT thing is more or less superfluous. It is def_bool y, depends on SMP and gets enabled in the majority of .configs - distro and otherwise - out there. So let's kill it and make code behind it depend directly on SMP. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Walter <dwalter@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-18-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:33:44 +02:00
Ashok Raj	88d538672e	x86/mce: Add infrastructure to support Local MCE Initialize and prepare for handling LMCEs. Add a boot-time option to disable LMCEs. Signed-off-by: Ashok Raj <ashok.raj@intel.com> [ Simplify stuff, align statements for better readability, reflow comments; kill unused lmce_clear(); save us an MSR write if LMCE is already enabled. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1433436928-31903-16-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:33:14 +02:00
Ashok Raj	bc12edb873	x86/mce: Add Local MCE definitions Add required definitions to support Local Machine Check Exceptions. Historically, machine check exceptions on Intel x86 processors have been broadcast to all logical processors in the system. Upcoming CPUs will support an opt-in mechanism to request some machine check exceptions be delivered to a single logical processor experiencing the fault. See http://www.intel.com/sdm Volume 3, System Programming Guide, chapter 15 for more information on MSRs and documentation on Local MCE. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1433436928-31903-15-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:33:13 +02:00
Toshi Kani	623dffb2a2	x86/mm/pat: Add set_memory_wt() for Write-Through type Now that reserve_ram_pages_type() accepts the WT type, add set_memory_wt(), set_memory_array_wt() and set_pages_array_wt() in order to be able to set memory to Write-Through page cache mode. Also, extend ioremap_change_attr() to accept the WT type. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-13-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:29:00 +02:00
Toshi Kani	d1b4bfbfac	x86/mm/pat: Add pgprot_writethrough() Add pgprot_writethrough() for setting page protection flags to Write-Through mode. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-11-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:58 +02:00
Toshi Kani	d838270e25	x86/mm, asm-generic: Add ioremap_wt() for creating Write-Through mappings Add ioremap_wt() for creating Write-Through mappings on x86. It follows the same model as ioremap_wc() for multi-arch support. Define ARCH_HAS_IOREMAP_WT in the x86 version of io.h to indicate that ioremap_wt() is implemented on x86. Also update the PAT documentation file to cover ioremap_wt(). Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:56 +02:00
Toshi Kani	ecb2febaaa	x86/mm: Teach is_new_memtype_allowed() about Write-Through type __ioremap_caller() calls reserve_memtype() and the passed down @new_pcm contains the actual page cache type it reserved in the success case. is_new_memtype_allowed() verifies if converting to the new page cache type is allowed when @pcm (the requested type) is different from @new_pcm. When WT is requested, the caller expects that writes are ordered and uncached. Therefore, enhance is_new_memtype_allowed() to disallow the following cases: - If the request is WT, mapping type cannot be WB - If the request is WT, mapping type cannot be WC Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:55 +02:00
Borislav Petkov	9cd25aac1f	x86/mm/pat: Emulate PAT when it is disabled In the case when PAT is disabled on the command line with "nopat" or when virtualization doesn't support PAT (correctly) - see `9d34cfdf47` ("x86: Don't rely on VMWare emulating PAT MSR correctly"). we emulate it using the PWT and PCD cache attribute bits. Get rid of boot_pat_state while at it. Based on a conglomerate patch from Toshi Kani. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:52 +02:00
Linus Torvalds	51d0f0cb3a	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - early_idt_handlers[] fix that fixes the build with bleeding edge tooling - build warning fix on GCC 5.1 - vm86 fix plus self-test to make it harder to break it again" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers x86/asm/entry/32, selftests: Add a selftest for kernel entries from VM86 mode x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode	2015-06-05 10:03:48 -07:00
Paolo Bonzini	6d396b5520	KVM: x86: advertise KVM_CAP_X86_SMM ... and we're done. :) Because SMBASE is usually relocated above 1M on modern chipsets, and SMM handlers might indeed rely on 4G segment limits, we only expose it if KVM is able to run the guest in big real mode. This includes any of VMX+emulate_invalid_guest_state, VMX+unrestricted_guest, or SVM. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:38 +02:00
Paolo Bonzini	699023e239	KVM: x86: add SMM to the MMU role, support SMRAM address space This is now very simple to do. The only interesting part is a simple trick to find the right memslot in gfn_to_rmap, retrieving the address space from the spte role word. The same trick is used in the auditing code. The comment on top of union kvm_mmu_page_role has been stale forever, so remove it. Speaking of stale code, remove pad_for_nice_hex_output too: it was splitting the "access" bitfield across two bytes and thus had effectively turned into pad_for_ugly_hex_output. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:37 +02:00
Paolo Bonzini	9da0e4d5ac	KVM: x86: work on all available address spaces This patch has no semantic change, but it prepares for the introduction of a second address space for system management mode. A new function x86_set_memory_region (and the "slots_lock taken" counterpart __x86_set_memory_region) is introduced in order to operate on all address spaces when adding or deleting private memory slots. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:37 +02:00
Paolo Bonzini	54bf36aac5	KVM: x86: use vcpu-specific functions to read/write/translate GFNs We need to hide SMRAM from guests not running in SMM. Therefore, all uses of kvm_read_guest* and kvm_write_guest* must be changed to check whether the VCPU is in system management mode and use a different set of memslots. Switch from kvm_* to the newly-introduced kvm_vcpu_*, which call into kvm_arch_vcpu_memslots_id. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:36 +02:00
Andy Lutomirski	cf991de2f6	x86/asm/msr: Make wrmsrl_safe() a function The wrmsrl_safe macro performs invalid shifts if the value argument is 32 bits. This makes it unnecessarily awkward to write code that puts an unsigned long into an MSR. Convert it to a real inline function. For inspiration, see: `7c74d5b7b7` ("x86/asm/entry/64: Fix MSR_IA32_SYSENTER_CS MSR value"). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: <linux-kernel@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> [ Applied small improvements. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 09:41:22 +02:00
Paolo Bonzini	64d6067057	KVM: x86: stubs for SMM support This patch adds the interface between x86.c and the emulator: the SMBASE register, a new emulator flag, the RSM instruction. It also adds a new request bit that will be used by the KVM_SMI ioctl. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:45 +02:00
Paolo Bonzini	f077825a87	KVM: x86: API changes for SMM support This patch includes changes to the external API for SMM support. Userspace can predicate the availability of the new fields and ioctls on a new capability, KVM_CAP_X86_SMM, which is added at the end of the patch series. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:11 +02:00
Paolo Bonzini	a584539b24	KVM: x86: pass the whole hflags field to emulator and back The hflags field will contain information about system management mode and will be useful for the emulator. Pass the entire field rather than just the guest-mode information. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:05 +02:00
Paolo Bonzini	609e36d372	KVM: x86: pass host_initiated to functions that read MSRs SMBASE is only readable from SMM for the VCPU, but it must be always accessible if userspace is accessing it. Thus, all functions that read MSRs are changed to accept a struct msr_data; the host_initiated and index fields are pre-initialized, while the data field is filled on return. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:00 +02:00
Ingo Molnar	d36f947904	x86/asm/entry: Move arch/x86/include/asm/calling.h to arch/x86/entry/ asm/calling.h is private to the entry code, make this more apparent by moving it to the new arch/x86/entry/ directory. Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-04 07:37:36 +02:00
Stephen Rothwell	d6472302f2	x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h> Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so remove it from there and fix up the resulting build problems triggered on x86 {64\|32}-bit {def\|allmod\|allno}configs. The breakages were triggering in places where x86 builds relied on vmalloc() facilities but did not include <linux/vmalloc.h> explicitly and relied on the implicit inclusion via <asm/io.h>. Also add: - <linux/init.h> to <linux/io.h> - <asm/pgtable_types> to <asm/io.h> ... which were two other implicit header file dependencies. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> [ Tidied up the changelog. ] Acked-by: David Miller <davem@davemloft.net> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Vinod Koul <vinod.koul@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Colin Cross <ccross@android.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: James E.J. Bottomley <JBottomley@odin.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Suma Ramars <sramars@cisco.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 12:02:00 +02:00
Ingo Molnar	71966f3a0b	Merge branch 'locking/core' into x86/core, to prepare for dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 10:07:35 +02:00
Ingo Molnar	34e7724c07	Merge branches 'x86/mm', 'x86/build', 'x86/apic' and 'x86/platform' into x86/core, to apply dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 10:05:18 +02:00
Jan Beulich	2bf557ea3f	x86/asm/entry/64: Use negative immediates for stack adjustments Doing so allows adjustments by 128 bytes (occurring for REMOVE_PT_GPREGS_FROM_STACK 8 uses) to be expressed with a single byte immediate. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/556C660F020000780007FB60@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-02 10:10:09 +02:00
Andy Lutomirski	425be5679f	x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers The early_idt_handlers asm code generates an array of entry points spaced nine bytes apart. It's not really clear from that code or from the places that reference it what's going on, and the code only works in the first place because GAS never generates two-byte JMP instructions when jumping to global labels. Clean up the code to generate the correct array stride (member size) explicitly. This should be considerably more robust against screw-ups, as GAS will warn if a .fill directive has a negative count. Using '. =' to advance would have been even more robust (it would generate an actual error if it tried to move backwards), but it would pad with nulls, confusing anyone who tries to disassemble the code. The new scheme should be much clearer to future readers. While we're at it, improve the comments and rename the array and common code. Binutils may start relaxing jumps to non-weak labels. If so, this change will fix our build, and we may need to backport this change. Before, on x86_64: 0000000000000000 <early_idt_handlers>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9> 5: R_X86_64_PC32 early_idt_handler-0x4 ... 48: 66 90 xchg %ax,%ax 4a: 6a 08 pushq $0x8 4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51> 4d: R_X86_64_PC32 early_idt_handler-0x4 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: e9 00 00 00 00 jmpq 120 <early_idt_handler> 11c: R_X86_64_PC32 early_idt_handler-0x4 After: 0000000000000000 <early_idt_handler_array>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common> ... 48: 6a 08 pushq $0x8 4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common> 4f: cc int3 50: cc int3 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: eb 03 jmp 120 <early_idt_handler_common> 11d: cc int3 11e: cc int3 11f: cc int3 Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Binutils <binutils@sourceware.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-02 09:39:40 +02:00
Ingo Molnar	f407a82586	Merge branch 'linus' into sched/core, to resolve conflict Conflicts: arch/sparc/include/asm/topology_64.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-02 08:05:42 +02:00
Ingo Molnar	131484c8da	x86/debug: Remove perpetually broken, unmaintainable dwarf annotations So the dwarf2 annotations in low level assembly code have become an increasing hindrance: unreadable, messy macros mixed into some of the most security sensitive code paths of the Linux kernel. These debug info annotations don't even buy the upstream kernel anything: dwarf driven stack unwinding has caused problems in the past so it's out of tree, and the upstream kernel only uses the much more robust framepointers based stack unwinding method. In addition to that there's a steady, slow bitrot going on with these annotations, requiring frequent fixups. There's no tooling and no functionality upstream that keeps it correct. So burn down the sick forest, allowing new, healthier growth: 27 files changed, 350 insertions(+), 1101 deletions(-) Someone who has the willingness and time to do this properly can attempt to reintroduce dwarf debuginfo in x86 assembly code plus dwarf unwinding from first principles, with the following conditions: - it should be maximally readable, and maximally low-key to 'ordinary' code reading and maintenance. - find a build time method to insert dwarf annotations automatically in the most common cases, for pop/push instructions that manipulate the stack pointer. This could be done for example via a preprocessing step that just looks for common patterns - plus special annotations for the few cases where we want to depart from the default. We have hundreds of CFI annotations, so automating most of that makes sense. - it should come with build tooling checks that ensure that CFI annotations are sensible. We've seen such efforts from the framepointer side, and there's no reason it couldn't be done on the dwarf side. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-02 07:57:48 +02:00
Marcelo Tosatti	611917258f	x86: kvmclock: add flag to indicate pvclock counts from zero Setting sched clock stable for kvmclock causes the printk timestamps to not start from zero, which is different from baremetal and can possibly break userspace. Add a flag to indicate that hypervisor sets clock base at zero when kvmclock is initialized. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-29 14:01:39 +02:00
Jan Beulich	7ba554b5ac	x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode While commit `efa7045103` ("x86/asm/entry: Make user_mode() work correctly if regs came from VM86 mode") claims that "user_mode() is now identical to user_mode_vm()", this wasn't actually the case - no prior commit made it so. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5566EB0D020000780007E655@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-29 09:46:40 +02:00
Borislav Petkov	b01aec9b2c	EDAC: Cleanup atomic_scrub mess So first of all, this atomic_scrub() function's naming is bad. It looks like an atomic_t helper. Change it to edac_atomic_scrub(). The bigger problem is that this function is arch-specific and every new arch which doesn't necessarily need that functionality still needs to define it, otherwise EDAC doesn't compile. So instead of doing that and including arch-specific headers, have each arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c for ifdeffery. Much cleaner. And we already are doing this with another symbol - EDAC_SUPPORT. This is also much cleaner than having CONFIG_EDAC enumerate all the arches which need/have EDAC support and drivers. This way I can kill the useless edac.h header in tile too. Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Doug Thompson <dougthompson@xmission.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linuxppc-dev@lists.ozlabs.org Cc: "Maciej W. Rozycki" <macro@codesourcery.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Paul Mackerras <paulus@samba.org> Cc: "Steven J. Hill" <Steven.Hill@imgtec.com> Cc: x86@kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>	2015-05-28 15:31:53 +02:00
Paolo Bonzini	f36f3f2846	KVM: add "new" argument to kvm_arch_commit_memory_region This lets the function access the new memory slot without going through kvm_memslots and id_to_memslot. It will simplify the code when more than one address space will be supported. Unfortunately, the "const"ness of the new argument must be casted away in two places. Fixing KVM to accept const struct kvm_memory_slot pointers would require modifications in pretty much all architectures, and is left for later. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-28 10:42:58 +02:00
Dan Williams	ad5fb870c4	e820, efi: add ACPI 6.0 persistent memory types ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory. Mark it "reserved" and allow it to be claimed by a persistent memory device driver. This definition is in addition to the Linux kernel's existing type-12 definition that was recently added in support of shipping platforms with NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as OEM reserved). Note, /proc/iomem can be consulted for differentiating legacy "Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory" E820_PMEM. Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-05-27 21:46:05 -04:00
Dasaratharaman Chandramouli	fb5d432722	tools/power turbostat: enable turbostat to support Knights Landing (KNL) Changes mainly to account for minor differences in Knights Landing(KNL): 1. KNL supports C1 and C6 core states. 2. KNL supports PC2, PC3 and PC6 package states. 3. KNL has a different encoding of the TURBO_RATIO_LIMIT MSR Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2015-05-27 18:03:57 -04:00
Bartosz Golaszewski	960d447b94	x86: Remove cpu_sibling_mask() and cpu_core_mask() These functions are arch-specific and duplicate the functionality of macros defined in linux/include/topology.h. Remove them as all the callers in x86 have now switched to using the topology_**_cpumask() family. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benoit Cousson <bcousson@baylibre.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jean Delvare <jdelvare@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/1432645896-12588-10-git-send-email-bgolaszewski@baylibre.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 15:22:17 +02:00
Bartosz Golaszewski	06931e6224	sched/topology: Rename topology_thread_cpumask() to topology_sibling_cpumask() Rename topology_thread_cpumask() to topology_sibling_cpumask() for more consistency with scheduler code. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Benoit Cousson <bcousson@baylibre.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jean Delvare <jdelvare@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 15:22:15 +02:00
Luis R. Rodriguez	cb32edf65b	x86/mm/pat: Wrap pat_enabled into a function API We use pat_enabled in x86-specific code to see if PAT is enabled or not but we're granting full access to it even though readers do not need to set it. If, for instance, we granted access to it to modules later they then could override the variable setting... no bueno. This renames pat_enabled to a new static variable __pat_enabled. Folks are redirected to use pat_enabled() now. Code that sets this can only be internal to pat.c. Apart from the early kernel parameter "nopat" to disable PAT, we also have a few cases that disable it later and make use of a helper pat_disable(). It is wrapped under an ifdef but since that code cannot run unless PAT was enabled its not required to wrap it with ifdefs, unwrap that. Likewise, since "nopat" doesn't really change non-PAT systems just remove that ifdef as well. Although we could add and use an early_param_off(), these helpers don't use __read_mostly but we want to keep __read_mostly for __pat_enabled as this is a hot path -- upon boot, for instance, a simple guest may see ~4k accesses to pat_enabled(). Since __read_mostly early boot params are not that common we don't add a helper for them just yet. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Doug Ledford <dledford@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kyle McMartin <kyle@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:41:01 +02:00
Luis R. Rodriguez	7d010fdf29	x86/mm/mtrr: Avoid #ifdeffery with phys_wc_to_mtrr_index() There is only one user but since we're going to bury MTRR next out of access to drivers, expose this last piece of API to drivers in a general fashion only needing io.h for access to helpers. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Abhilash Kesavan <a.kesavan@samsung.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Cristian Stoica <cristian.stoica@freescale.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thierry Reding <treding@nvidia.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Ville Syrjälä <syrjala@sci.fi> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will.deacon@arm.com> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-11-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:41:00 +02:00
Toshi Kani	b73522e0c1	x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers This patch adds the argument 'uniform' to mtrr_type_lookup(), which gets set to 1 when a given range is covered uniformly by MTRRs, i.e. the range is fully covered by a single MTRR entry or the default type. Change pud_set_huge() and pmd_set_huge() to honor the 'uniform' flag to see if it is safe to create a huge page mapping in the range. This allows them to create a huge page mapping in a range covered by a single MTRR entry of any memory type. It also detects a non-optimal request properly. They continue to check with the WB type since it does not effectively change the uniform mapping even if a request spans multiple MTRR entries. pmd_set_huge() logs a warning message to a non-optimal request so that driver writers will be aware of such a case. Drivers should make a mapping request aligned to a single MTRR entry when the range is covered by MTRRs. Signed-off-by: Toshi Kani <toshi.kani@hp.com> [ Realign, flesh out comments, improve warning message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:40:58 +02:00
Toshi Kani	3d3ca416d9	x86/mm/mtrr: Use symbolic define as a retval for disabled MTRRs mtrr_type_lookup() returns verbatim 0xFF when MTRRs are disabled. This patch defines MTRR_TYPE_INVALID to clarify the meaning of this value, and documents its usage. Document the return values of the kernel virtual address mapping helpers pud_set_huge(), pmd_set_huge, pud_clear_huge() and pmd_clear_huge(). There is no functional change in this patch. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:40:57 +02:00
Toshi Kani	9b3aca6208	x86/mm/mtrr: Fix MTRR state checks in mtrr_type_lookup() 'mtrr_state.enabled' contains the FE (fixed MTRRs enabled) and E (MTRRs enabled) flags in MSR_MTRRdefType. Intel SDM, section 11.11.2.1, defines these flags as follows: - All MTRRs are disabled when the E flag is clear. The FE flag has no affect when the E flag is clear. - The default type is enabled when the E flag is set. - MTRR variable ranges are enabled when the E flag is set. - MTRR fixed ranges are enabled when both E and FE flags are set. MTRR state checks in __mtrr_type_lookup() do not match with SDM. Hence, this patch makes the following changes: - The current code detects MTRRs disabled when both E and FE flags are clear in mtrr_state.enabled. Fix to detect MTRRs disabled when the E flag is clear. - The current code does not check if the FE bit is set in mtrr_state.enabled when looking at the fixed entries. Fix to check the FE flag. - The current code returns the default type when the E flag is clear in mtrr_state.enabled. However, the default type is UC when the E flag is clear. Remove the code as this case is handled as MTRR disabled with the 1st change. In addition, this patch defines the E and FE flags in mtrr_state.enabled as follows. - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED - E flag: MTRR_STATE_MTRR_ENABLED print_mtrr_state() and x86_get_mtrr_mem_range() are also updated accordingly. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:40:56 +02:00
Ingo Molnar	d563a6bb3d	Linux 4.1-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVYnloAAoJEHm+PkMAQRiGCgkH/j3r2djOOm4h83FXrShaHORY p8TBI3FNj4fzLk2PfzqbmiDw2T2CwygB+pxb2Ac9CE99epw8qPk2SRvPXBpdKR7t lolhhwfzApLJMZbhzNLVywUCDUhFoiEWRhmPqIfA3WXFcIW3t5VNXAoIFjV5HFr6 sYUlaxSI1XiQ5tldVv8D6YSFHms41pisziBIZmzhIUg10P6Vv3D0FbE74fjAJwx0 +08zj3EO7yQMv7Aeeq8F8AJ3628142rcZf0NWF5ohlKLRK3gt0cl9jO5U4Co2dDt 29v03LP5EI6jDKkIbuWlqRMq9YxJz7N3wnkzV0EJiqXucoqPLFDqzbxB4gnS1pI= =7vbA -----END PGP SIGNATURE----- Merge tag 'v4.1-rc5' into x86/mm, to refresh the tree before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:40:10 +02:00
Ingo Molnar	83242c5158	x86/fpu: Make WARN_ON_FPU() more robust in the !CONFIG_X86_DEBUG_FPU case Make sure the WARN_ON_FPU() macro consumes the macro argument, to avoid 'unused variable' build warnings if the only use of a variable is in debugging code. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:28:30 +02:00
Ingo Molnar	d65fcd608f	x86/fpu: Simplify copy_kernel_to_xregs_booting() copy_kernel_to_xregs_booting() has a second parameter that is the mask of xfeatures that should be copied - but this parameter is always -1. Simplify the call site of this function, this also makes it more similar to the function call signature of other copy_kernel_to*regs() functions. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:11:33 +02:00
Ingo Molnar	003e2e8b57	x86/fpu: Standardize the parameter type of copy_kernel_to_fpregs() Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions in line with the parameter passing convention of other kernel-to-FPU-registers copying functions: pass around an in-memory FPU register state pointer, instead of struct fpu *. NOTE: This patch also changes the assembly constraint of the FXSAVE-leak workaround from 'fpu->fpregs_active' to 'fpstate' - but that is fine, as we only need a valid memory address there for the FILDL instruction. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:11:32 +02:00
Ingo Molnar	9ccc27a5d2	x86/fpu: Remove error return values from copy_kernel_to_regs() functions None of the copy_kernel_to_regs() FPU register copying functions are supposed to fail, and all of them have debugging checks that enforce this. Remove their return values and simplify their call sites, which have redundant error checks and error handling code paths. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:11:30 +02:00
Ingo Molnar	3e1bf47e5c	x86/fpu: Rename copy_fpstate_to_fpregs() to copy_kernel_to_fpregs() Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions in line with the naming of other kernel-to-FPU-registers copying functions. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:11:29 +02:00
Ingo Molnar	43b287b3f4	x86/fpu: Add debugging checks to all copy_kernel_to_*() functions Copying from in-kernel FPU context buffers to FPU registers are never supposed to fault. Add debugging checks to copy_kernel_to_fxregs() and copy_kernel_to_fregs() to double check this assumption. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:11:28 +02:00
Ingo Molnar	6a81d7eb33	x86/fpu: Rename fpu__activate_fpstate() to fpu__activate_fpstate_write() Remaining users of fpu__activate_fpstate() are all places that want to modify FPU registers, rename the function to fpu__activate_fpstate_write() according to this usage. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:11:26 +02:00
Ingo Molnar	0560281266	x86/fpu: Split out the fpu__activate_fpstate_read() method Currently fpu__activate_fpstate() is used for two distinct purposes: - read access by ptrace and core dumping, where in the core dumping case the current task's FPU state may be examined as well. - write access by ptrace, which modifies FPU registers and expects the modified registers to be reloaded on the next context switch. Split out the reading side into fpu__activate_fpstate_read(). ( Note that this is just a pure duplication of fpu__activate_fpstate() for the time being, we'll optimize the new function in the next patch. ) Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:11:24 +02:00
Ingo Molnar	47f01e8cc2	x86/fpu: Fix FPU register read access to the current task Bobby Powers reported the following FPU warning during ELF coredumping: WARNING: CPU: 0 PID: 27452 at arch/x86/kernel/fpu/core.c:324 fpu__activate_stopped+0x8a/0xa0() This warning unearthed an invalid assumption about fpu__activate_stopped() that I added in: `67e97fc2ec` ("x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check") the old init_fpu() function had an (intentional but obscure) side effect: when FPU registers are accessed for the current task, for reading, then it synchronized live in-register FPU state with the fpstate by saving it. So fix this bug by saving the FPU if we are the current task. We'll still warn in fpu__save() if this is called for not yet stopped child tasks, so the debugging check is still preserved. Also rename the function to fpu__activate_fpstate(), because it's not exclusively used for stopped tasks, but for the current task as well. ( Note that this bug calls for a cleaner separation of access-for-read and access-for-modification FPU methods, but we'll do that in separate patches. ) Reported-by: Bobby Powers <bobbypowers@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 12:40:18 +02:00
Ingo Molnar	8c05f05edb	x86/fpu: Micro-optimize the copy_xregs_to_kernel() and copy_kernel_to_xregs() functions The copy_xregs_to_kernel() and copy_kernel_to_xregs() functions are used to copy FPU registers to kernel memory and vice versa. They are never expected to fail, yet they have a return code, mostly because that way they can share the assembly macros with the copyuser() functions. This error code is then silently ignored by the context switching and other code - which made the bug in: `b8c1b8ea7b` ("x86/fpu: Fix FPU state save area alignment bug") harder to fix than necessary. So remove the return values and check for no faults when FPU debugging is enabled in the .config. This improves the eagerfpu context switching fast path by a couple of instructions, when FPU debugging is disabled: ffffffff810407fa: 89 c2 mov %eax,%edx ffffffff810407fc: 48 0f ae 2f xrstor64 (%rdi) ffffffff81040800: 31 c0 xor %eax,%eax -ffffffff81040802: eb 0a jmp ffffffff8104080e <__switch_to+0x321> +ffffffff81040802: eb 16 jmp ffffffff8104081a <__switch_to+0x32d> ffffffff81040804: 31 c0 xor %eax,%eax ffffffff81040806: 48 0f ae 8b c0 05 00 fxrstor64 0x5c0(%rbx) ffffffff8104080d: 00 Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-25 12:49:40 +02:00
Ingo Molnar	685c961624	x86/fpu: Improve the initialization logic of 'err' around xstate_fault() constraints There's a confusing aspect of how xstate_fault() constraints are handled by the FPU register/memory copying functions in fpu/internal.h: they use "0" (0) to signal that the asm code will not always set 'err' to a valid value. But 'err' is already initialized to 0 in C code, which is duplicated by the asm() constraint. Should the initialization value ever be changed, it might become subtly inconsistent with the not too clear asm() constraint. Use 'err' as the value of the input variable instead, to clarify this all. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-25 12:49:38 +02:00
Ingo Molnar	87b6559d0a	x86/fpu: Improve xstate_fault() handling There are two problems with xstate_fault handling: - The xstate_fault() macro takes an argument, but that's propagated into the assembly named label as well. This is technically correct currently but might result in failures if anytime a more complex argument is used. So use a separate '_err' name instead for the label. - All the xstate_fault() using functions have an error variable named 'err', which is an output variable to the asm() they are using. The problem is, it's not always set by the asm(), in which case the compiler might optimize out its initialization, so that the C variable 'err' might become corrupted after the asm() - confusing anyone who tries to take advantage of this variable after the asm(). Mark it an input variable as well. This is a latent bug currently, but an upcoming debug patch will make use of 'err'. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-25 12:49:37 +02:00
Ingo Molnar	87dafd41a4	x86/fpu: Rename xstate related 'fx' references to 'xstate' So the xstate code was probably first copied from the fxregs code, hence it carried over the 'fx' naming for the state pointer variable. But this is slightly confusing, as we usually on call the (legacy) MMX/SSE state 'fx', both in data structures and in the functions build around FXSAVE/FXRSTOR. So rename it to 'xstate' to make it more apparent what it is related to. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-25 12:49:35 +02:00
Ingo Molnar	fd169b0541	x86/fpu: Move the xstate copying functions into fpu/internal.h All the other register<-> memory copying functions are defined in fpu/internal.h, so move the xstate variants there too. Beyond being more consistent, this also allows FPU debugging checks to be added to them. (Because they can now use the macros defined in fpu/internal.h.) Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-25 12:49:33 +02:00
Ingo Molnar	3152657f10	Linux 4.1-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVYnloAAoJEHm+PkMAQRiGCgkH/j3r2djOOm4h83FXrShaHORY p8TBI3FNj4fzLk2PfzqbmiDw2T2CwygB+pxb2Ac9CE99epw8qPk2SRvPXBpdKR7t lolhhwfzApLJMZbhzNLVywUCDUhFoiEWRhmPqIfA3WXFcIW3t5VNXAoIFjV5HFr6 sYUlaxSI1XiQ5tldVv8D6YSFHms41pisziBIZmzhIUg10P6Vv3D0FbE74fjAJwx0 +08zj3EO7yQMv7Aeeq8F8AJ3628142rcZf0NWF5ohlKLRK3gt0cl9jO5U4Co2dDt 29v03LP5EI6jDKkIbuWlqRMq9YxJz7N3wnkzV0EJiqXucoqPLFDqzbxB4gnS1pI= =7vbA -----END PGP SIGNATURE----- Merge branch 'linus' into x86/fpu Resolve semantic conflict in arch/x86/kvm/cpuid.c with: `c447e76b4c` ("kvm/fpu: Enable eager restore kvm FPU for MPX") By removing the FPU internal include files. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-25 09:39:19 +02:00
Ingo Molnar	b8c1b8ea7b	x86/fpu: Fix FPU state save area alignment bug On most configs task-struct is cache line aligned, which makes the XSAVE area's 64-byte required alignment work out fine. But on some .config's task_struct is aligned only to 16 bytes (enforced by ARCH_MIN_TASKALIGN), which makes things like fpu__copy() (that XSAVEOPT uses) not work so well. I broke this in: `7366ed771f` ("x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)") which embedded the fpstate in the task_struct. The alignment requirements of the FPU code were originally present in ARCH_MIN_TASKALIGN, which still has a value of 16, which was the alignment requirement of the FPU state area prior XSAVE. But this link was not documented (and not required) and the link got lost when the FPU state area was made dynamic years ago. With XSAVEOPT the minimum alignment requirment went up to 64 bytes, and the embedding of the FPU state area in task_struct exposed it again - and '16' was not increased to '64'. So fix this bug, but also try to address the underlying lost link of information that made it easier to happen: - document ARCH_MIN_TASKALIGN a bit better - use alignof() to recover the current alignment requirements. This would work in the future as well, should the alignment requirements go up to 128 bytes with things like AVX512. ( We should probably also use the vSMP alignment rules for all of x86, but that's for another patch. ) Reported-by: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-25 09:38:04 +02:00
Andy Lutomirski	cdeb604894	x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers The early_idt_handlers asm code generates an array of entry points spaced nine bytes apart. It's not really clear from that code or from the places that reference it what's going on, and the code only works in the first place because GAS never generates two-byte JMP instructions when jumping to global labels. Clean up the code to generate the correct array stride (member size) explicitly. This should be considerably more robust against screw-ups, as GAS will warn if a .fill directive has a negative count. Using '. =' to advance would have been even more robust (it would generate an actual error if it tried to move backwards), but it would pad with nulls, confusing anyone who tries to disassemble the code. The new scheme should be much clearer to future readers. While we're at it, improve the comments and rename the array and common code. Binutils may start relaxing jumps to non-weak labels. If so, this change will fix our build, and we may need to backport this change. Before, on x86_64: 0000000000000000 <early_idt_handlers>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9> 5: R_X86_64_PC32 early_idt_handler-0x4 ... 48: 66 90 xchg %ax,%ax 4a: 6a 08 pushq $0x8 4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51> 4d: R_X86_64_PC32 early_idt_handler-0x4 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: e9 00 00 00 00 jmpq 120 <early_idt_handler> 11c: R_X86_64_PC32 early_idt_handler-0x4 After: 0000000000000000 <early_idt_handler_array>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common> ... 48: 6a 08 pushq $0x8 4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common> 4f: cc int3 50: cc int3 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: eb 03 jmp 120 <early_idt_handler_common> 11d: cc int3 11e: cc int3 11f: cc int3 Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Binutils <binutils@sourceware.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-24 08:35:03 +02:00
Linus Torvalds	f0d8690ad4	This pull request includes a fix for two oopses, one on PPC and on x86. The rest is fixes for bugs with newer Intel processors. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJVXGLvAAoJEL/70l94x66DxOAH/270hZu3Rt0Tt04LYs0uy1B3 6a91Hs4YsYALe0j6IVZUQ2ngO+N4DPsw/Lusutd82jWX13UG221w1rbUtUpNF46r bPf7Eh4AdGhNehGtkllRKrBmZEDkZVngZWsftFvzA+rmbV/HVzFU5SfuPdhzYAL5 WpQTzou0w63c3Gh6hymLsq/x/zUScMRoFdyjIEJTRN+AOnnro9I/nj4O83OEF8uv Hp4VZ7TDG55xTloiC5WSimTCWPIZFDMiuim1iFo/OOOIGjfjdM8IBKwer4zIXa/S VD71lYu267yxIabYpbEOjd+dcZ5myJhy4ePWmWHZczsOeklbvMouWMD7/1U2Gpg= =x0LU -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "This includes a fix for two oopses, one on PPC and on x86. The rest is fixes for bugs with newer Intel processors" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm/fpu: Enable eager restore kvm FPU for MPX Revert "KVM: x86: drop fpu_activate hook" kvm: fix crash in kvm_vcpu_reload_apic_access_page KVM: MMU: fix SMAP virtualization KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages KVM: MMU: fix smap permission check KVM: PPC: Book3S HV: Fix list traversal in error case	2015-05-21 20:15:16 -07:00
Paolo Bonzini	a9b4fb7e79	Merge branch 'kvm-master' into kvm-next Grab MPX bugfix, and fix conflicts against Rik's adaptive FPU deactivation patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-20 12:31:37 +02:00
Liang Li	c447e76b4c	kvm/fpu: Enable eager restore kvm FPU for MPX The MPX feature requires eager KVM FPU restore support. We have verified that MPX cannot work correctly with the current lazy KVM FPU restore mechanism. Eager KVM FPU restore should be enabled if the MPX feature is exposed to VM. Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Liang Li <liang.z.li@intel.com> [Also activate the FPU on AMD processors. - Paolo] Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-20 12:30:26 +02:00
Paolo Bonzini	0fdd74f778	Revert "KVM: x86: drop fpu_activate hook" This reverts commit `4473b570a7`. We'll use the hook again. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-20 12:30:15 +02:00
Ingo Molnar	b11ca7fbc7	x86/fpu/xstate: Use explicit parameter in xstate_fault() While looking at xstate.h it took me some time to realize that 'xstate_fault' uses 'err' as a silent parameter. This is not obvious at the call site, at all. Make it an explicit macro argument, so that the syntactic connection is easier to see. Also explain xstate_fault() a bit. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-20 10:02:22 +02:00
Xiao Guangrong	edc90b7dc4	KVM: MMU: fix SMAP virtualization KVM may turn a user page to a kernel page when kernel writes a readonly user page if CR0.WP = 1. This shadow page entry will be reused after SMAP is enabled so that kernel is allowed to access this user page Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu once CR4.SMAP is updated Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:36 +02:00
Christoph Hellwig	c546d5db75	remove scatterlist.h generation from arch Kbuild files Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-05-19 09:14:34 -06:00
Feng Wu	f6b3c72c23	x86/irq: Define a global vector for VT-d Posted-Interrupts Currently, we use a global vector as the Posted-Interrupts Notification Event for all the vCPUs in the system. We need to introduce another global vector for VT-d Posted-Interrtups, which will be used to wakeup the sleep vCPU when an external interrupt from a direct-assigned device happens for that vCPU. [ tglx: Removed a gazillion of extra newlines ] Signed-off-by: Feng Wu <feng.wu@intel.com> Cc: jiang.liu@linux.intel.com Link: http://lkml.kernel.org/r/1432026437-16560-4-git-send-email-feng.wu@intel.com Suggested-by: Yang Zhang <yang.z.zhang@intel.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-05-19 15:51:17 +02:00
Ingo Molnar	b1b64dc355	x86/fpu: Reorganize fpu/internal.h fpu/internal.h has grown organically, with not much high level structure, which hurts its readability. Organize the various definitions into 5 sections: - high level FPU state functions - FPU/CPU feature flag helpers - fpstate handling functions - FPU context switching helpers - misc helper functions Other related changes: - Move MXCSR_DEFAULT to fpu/types.h. - drop the unused X87_FSW_ES define No change in functionality. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:12 +02:00
Ingo Molnar	e97131a839	x86/fpu: Add CONFIG_X86_DEBUG_FPU=y FPU debugging code There are various internal FPU state debugging checks that never trigger in practice, but which are useful for FPU code development. Separate these out into CONFIG_X86_DEBUG_FPU=y, and also add a couple of new ones. The size difference is about 0.5K of code on defconfig: text data bss filename 15028906 2578816 1638400 vmlinux 15029430 2578816 1638400 vmlinux ( Keep this enabled by default until the new FPU code is debugged. ) Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:12 +02:00
Ingo Molnar	e1884d69f6	x86/fpu: Pass 'struct fpu' to fpu__restore() This cleans up the call sites and the function a bit, and also makes it more symmetric with the other high level FPU state handling functions. It's still only valid for the current task, as we copy to the FPU registers of the current CPU. No change in functionality. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:11 +02:00
Ingo Molnar	5fd402dfa7	x86/fpu/xstate: Clean up setup_xstate_comp() call So call setup_xstate_comp() from the xstate init code, not from the generic fpu__init_system() code. This allows us to remove the protytype from xstate.h as well. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:11 +02:00
Ingo Molnar	489e9c0188	x86/fpu: Clean up xstate feature reservation Put MPX support into its separate high level structure, and also replace the fixed YMM, LWP and MPX structures in xregs_state with just reservations - their exact offsets in the structure will depend on the CPU and no code actually relies on those fields. No change in functionality. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:10 +02:00
Ingo Molnar	bdf80d1040	x86/fpu: Document the various fpregs state formats Document all the structures that make up 'struct fpu'. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:09 +02:00
Ingo Molnar	aeb997b9f2	x86/fpu: Change fpu->fpregs_active from 'int' to 'char', add lazy switching comments Improve the memory layout of 'struct fpu': - change ->fpregs_active from 'int' to 'char' - it's just a single flag and modern x86 CPUs can do efficient byte accesses. - pack related fields closer to each other: often 'fpu->state' will not be touched, while the other fields will - so pack them into a group. Also add comments to each field, describing their purpose, and add some background information about lazy restores. Also fix an obsolete, lazy switching related comment in fpu_copy()'s description. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:09 +02:00
Ingo Molnar	c47ada305d	x86/fpu: Harmonize FPU register state types Use these consistent names: struct fregs_state # was: i387_fsave_struct struct fxregs_state # was: i387_fxsave_struct struct swregs_state # was: i387_soft_struct struct xregs_state # was: xsave_struct union fpregs_state # was: thread_xstate Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:09 +02:00
Ingo Molnar	0c306bcfba	x86/fpu: Factor out the FPU regset code into fpu/regset.c So much of fpu/core.c is the regset code, but it just obscures the generic FPU state machine logic. Factor out the regset code into fpu/regset.c, where it can be read in isolation. This affects one API: fpu__activate_stopped() has to be made available from the core to fpu/regset.c. No change in functionality. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:09 +02:00
Ingo Molnar	b992c660d3	x86/fpu: Factor out fpu/signal.c fpu/xstate.c has a lot of generic FPU signal frame handling routines, move them into a separate file: fpu/signal.c. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:08 +02:00
Ingo Molnar	c681314421	x86/fpu: Rename all the fpregs, xregs, fxregs and fregs handling functions Standardize the naming of the various functions that copy register content in specific FPU context formats: copy_fxregs_to_kernel() # was: fpu_fxsave() copy_xregs_to_kernel() # was: xsave_state() copy_kernel_to_fregs() # was: frstor_checking() copy_kernel_to_fxregs() # was: fxrstor_checking() copy_kernel_to_xregs() # was: fpu_xrstor_checking() copy_kernel_to_xregs_booting() # was: xrstor_state_booting() copy_fregs_to_user() # was: fsave_user() copy_fxregs_to_user() # was: fxsave_user() copy_xregs_to_user() # was: xsave_user() copy_user_to_fregs() # was: frstor_user() copy_user_to_fxregs() # was: fxrstor_user() copy_user_to_xregs() # was: xrestore_user() copy_user_to_fpregs_zeroing() # was: restore_user_xstate() Eliminate fpu_xrstor_checking(), because it was just a wrapper. No change in functionality. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:08 +02:00
Ingo Molnar	815418890e	x86/fpu: Move restore_init_xstate() out of fpu/internal.h Move restore_init_xstate() next to its sole caller. Also rename it to copy_init_fpstate_to_fpregs() and add some comments about what it does. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:08 +02:00
Ingo Molnar	6f57502310	x86/fpu: Generalize 'init_xstate_ctx' So the handling of init_xstate_ctx has a layering violation: both 'struct xsave_struct' and 'union thread_xstate' have a 'struct i387_fxsave_struct' member: xsave_struct::i387 thread_xstate::fxsave The handling of init_xstate_ctx is generic, it is used on all CPUs, with or without XSAVE instruction. So it's confusing how the generic code passes around and handles an XSAVE specific format. What we really want is for init_xstate_ctx to be a proper fpstate and we use its ::fxsave and ::xsave members, as appropriate. Since the xsave_struct::i387 and thread_xstate::fxsave aliases each other this is not a functional problem. So implement this, and move init_xstate_ctx to the generic FPU code in the process. Also, since init_xstate_ctx is not XSAVE specific anymore, rename it to init_fpstate, and mark it __read_mostly, because it's only modified once during bootup, and used as a reference fpstate later on. There's no change in functionality. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:07 +02:00
Ingo Molnar	bf935b0b52	x86/fpu: Create 'union thread_xstate' helper for fpstate_init() fpstate_init() only uses fpu->state, so pass that in to it. This enables the cleanup we will do in the next patch. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:07 +02:00
Ingo Molnar	0aba697894	x86/fpu: Harmonize the names of the fpstate_init() helper functions Harmonize the inconsistent naming of these related functions: fpstate_init() finit_soft_fpu() => fpstate_init_fsoft() fx_finit() => fpstate_init_fxstate() fx_finit() => fpstate_init_fstate() # split out Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:07 +02:00
Ingo Molnar	e1cebad49c	x86/fpu: Factor out the exception error code handling code Factor out the FPU error code handling code from traps.c and fpu/internal.h and move them close to each other. Also convert the helper functions to 'struct fpu *', which further simplifies them. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:06 +02:00
Ingo Molnar	59a36d16be	x86/fpu: Factor out fpu/regset.h from fpu/internal.h Only a few places use the regset definitions, so factor them out. Also fix related header dependency assumptions. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:06 +02:00
Ingo Molnar	fcbc99c403	x86/fpu: Split out fpu/signal.h from fpu/internal.h for signal frame handling functions Most of the FPU does not use them, so split it out and include them in signal.c and ia32_signal.c Also fix header file dependency assumption in fpu/core.c. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:05 +02:00
Ingo Molnar	05012c13f6	x86/fpu: Move is_ia32*frame() helpers out of fpu/internal.h Move them to their only user. This makes the code easier to read, the header is less cluttered, and it also speeds up the build a bit. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:05 +02:00
Ingo Molnar	fbce778246	x86/fpu: Merge fpu__reset() and fpu__clear() With recent cleanups and fixes the fpu__reset() and fpu__clear() functions have become almost identical in functionality: the only difference is that fpu__reset() assumed that the fpstate was already active in the eagerfpu case, while fpu__clear() activated it if it was inactive. This distinction almost never matters, the only case where such fpstate activation happens if if the init thread (PID 1) gets exec()-ed for the first time. So keep fpu__clear() and change all fpu__reset() uses to fpu__clear() to simpify the logic. ( In a later patch we'll further simplify fpu__clear() by making sure that all contexts it is called on are already active. ) Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:05 +02:00
Ingo Molnar	82c0e45eb5	x86/fpu: Move the signal frame handling code closer to each other Consolidate more signal frame related functions: text data bss dec filename 14108070 2575280 1634304 18317654 vmlinux.before 14107944 2575344 1634304 18317592 vmlinux.after Also, while moving it, rename alloc_mathframe() to fpu__alloc_mathframe(). Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:04 +02:00
Ingo Molnar	9dfe99b755	x86/fpu: Rename restore_xstate_sig() to fpu__restore_sig() restore_xstate_sig() is a misnomer: it's not limited to 'xstate' at all, it is the high level 'restore FPU state from a signal frame' function that works with all legacy FPU formats as well. Rename it (and its helper) accordingly, and also move it to the fpu__*() namespace. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:04 +02:00
Ingo Molnar	04c8e01d50	x86/fpu: Move fpu__clear() to 'struct fpu *' parameter passing Do it like all other high level FPU state handling functions: they only know about struct fpu, not about the task. (Also remove a dead prototype while at it.) Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:04 +02:00
Ingo Molnar	6ffc152e46	x86/fpu: Move all the fpu__() high level methods closer to each other The fpu__() methods are closely related, but they are defined in scattered places within the FPU code. Concentrate them, and also uninline fpu__save(), fpu__drop() and fpu__reset() to save about 5K of kernel text on 64-bit kernels: text data bss dec filename 14113063 2575280 1634304 18322647 vmlinux.before 14108070 2575280 1634304 18317654 vmlinux.after Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:04 +02:00
Ingo Molnar	0e75c54f17	x86/fpu: Rename restore_fpu_checking() to copy_fpstate_to_fpregs() fpu_restore_checking() is a helper function of restore_fpu_checking(), but this is not apparent from the naming. Both copy fpstate contents to fpregs, while the fuller variant does a full copy without leaking information. So rename them to: copy_fpstate_to_fpregs() __copy_fpstate_to_fpregs() Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:03 +02:00
Ingo Molnar	5033861575	x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state() drop_fpu() and fpu_reset_state() are similar in functionality and in scope, yet this is not apparent from their names. drop_fpu() deactivates FPU contents (both the fpregs and the fpstate), but leaves register contents intact in the eager-FPU case, mostly as an optimization. It disables fpregs in the lazy FPU case. The drop_fpu() method can be used to destroy FPU state in an optimized way, when we know that a new state will be loaded before user-space might see any remains of the old FPU state: - such as in sys_exit()'s exit_thread() where we know this task won't execute any user-space instructions anymore and the next context switch cleans up the FPU. The old FPU state might still be around in the eagerfpu case but won't be saved. - in __restore_xstate_sig(), where we use drop_fpu() before copying a new state into the fpstate and activating that one. No user-pace instructions can execute between those steps. - in sys_execve()'s fpu__clear(): there we use drop_fpu() in the !eagerfpu case, where it's equivalent to a full reinit. fpu_reset_state() is a stronger version of drop_fpu(): both in the eagerfpu and the lazy-FPU case it guarantees that fpregs are reinitialized to init state. This method is used in cases where we need a full reset: - handle_signal() uses fpu_reset_state() to reset the FPU state to init before executing a user-space signal handler. While we have already saved the original FPU state at this point, and always restore the original state, the signal handling code still has to do this reinit, because signals may interrupt any user-space instruction, and the FPU might be in various intermediate states (such as an unbalanced x87 stack) that is not immediately usable for general C signal handler code. - __restore_xstate_sig() uses fpu_reset_state() when the signal frame has no FP context. Since the signal handler may have modified the FPU state, it gets reset back to init state. - in another branch __restore_xstate_sig() uses fpu_reset_state() to handle a restoration error: when restore_user_xstate() fails to restore FPU state and we might have inconsistent FPU data, fpu_reset_state() is used to reset it back to a known good state. - __kernel_fpu_end() uses fpu_reset_state() in an error branch. This is in a 'must not trigger' error branch, so on bug-free kernels this never triggers. - fpu__restore() uses fpu_reset_state() in an error path as well: if the fpstate was set up with invalid FPU state (via ptrace or via a signal handler), then it's reset back to init state. - likewise, the scheduler's switch_fpu_finish() uses it in a restoration error path too. Move both drop_fpu() and fpu_reset_state() to the fpu__*() namespace and harmonize their naming with their function: fpu__drop() fpu__reset() This clearly shows that both methods operate on the full state of the FPU, just like fpu__restore(). Also add comments to explain what each function does. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:03 +02:00
Ingo Molnar	5e907bb045	x86/alternatives, x86/fpu: Add 'alternatives_patched' debug flag and use it in xsave_state() We'd like to use xsave_state() earlier, but its SYSTEM_BOOTING check is too imprecise. The real condition that xsave_state() would like to check is whether alternative XSAVE instructions were patched into the kernel image already. Add such a (read-mostly) debug flag and use it in xsave_state(). Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:03 +02:00
Ingo Molnar	3c6dffa93b	x86/fpu: Rename user_has_fpu() to fpregs_active() Rename this function in line with the new FPU nomenclature. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:02 +02:00
Ingo Molnar	c8e1404120	x86/fpu: Rename save_xstate_sig() to copy_fpstate_to_sigframe() Standardize the naming of save_xstate_sig() by renaming it to copy_fpstate_to_sigframe(): this tells us at a glance that the function copies an FPU fpstate to a signal frame. This naming also follows the naming of copy_fpregs_to_fpstate(). Don't put 'xstate' into the name: since this is a generic name, it's expected that the function is able to handle xstate frames as well, beyond legacy frames. xstate used to be the odd case in the x86 FPU code - now it's the common case. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:01 +02:00
Ingo Molnar	36e49e7f2e	x86/fpu: Pass 'struct fpu' to fpstate_sanitize_xstate() Currently fpstate_sanitize_xstate() has a task_struct input parameter, but it only uses the fpu structure from it - so pass in a 'struct fpu' pointer only and update all call sites. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:00 +02:00
Ingo Molnar	1ac91a767f	x86/fpu: Simplify fpstate_sanitize_xstate() calls Remove the extra layer of __fpstate_sanitize_xstate(): if (!use_xsaveopt()) return; __fpstate_sanitize_xstate(tsk); and move the check for use_xsaveopt() into fpstate_sanitize_xstate(). In general we optimize for the presence of CPU features, not for the absence of them. Furthermore there's little point in this inlining, as the call sites are not super hot code paths. Doing this uninlining shrinks the code a bit: text data bss dec hex filename 14108751 2573624 1634304 18316679 1177d87 vmlinux.before 14108627 2573624 1634304 18316555 1177d0b vmlinux.after Also remove a pointless '!fx' check from fpstate_sanitize_xstate(). Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:00 +02:00
Ingo Molnar	d090319312	x86/fpu: Rename sanitize_i387_state() to fpstate_sanitize_xstate() So the sanitize_i387_state() function has the following purpose: on CPUs that support optimized xstate saving instructions, an FPU fpstate might end up having partially uninitialized data. This function initializes that data. Note that the function name is a misnomer and confusing on two levels, not only is it not i387 specific at all, but it is the exact opposite: it only matters on xstate CPUs. So rename sanitize_i387_state() and __sanitize_i387_state() to fpstate_sanitize_xstate() and __fpstate_sanitize_xstate(), to clearly express the purpose and usage of the function. We'll further clean up this function in the next patch. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:00 +02:00
Ingo Molnar	befc61ad3c	x86/fpu: Move asm/xcr.h to asm/fpu/internal.h Now that all FPU internals using drivers are converted to public APIs, move xcr.h's definitions into fpu/internal.h and remove xcr.h. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:00 +02:00
Ingo Molnar	91969d690f	x86/fpu: Move xfeature type enumeration to fpu/types.h So xsave.h is an internal header that FPU using drivers commonly include, to get access to the xstate feature names, amongst other things. Move these type definitions to fpu/fpu.h to allow simplification of FPU using driver code. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:56 +02:00
Ingo Molnar	677b98bdd5	x86/fpu: Enumerate xfeature bits Transform the xstate masks into an enumerated list of xfeature bits. This removes the hard coding of XFEATURES_NR_MAX. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:56 +02:00
Ingo Molnar	5b07343034	x86/fpu: Introduce cpu_has_xfeatures(xfeatures_mask, feature_name) A lot of FPU using driver code is querying complex CPU features to be able to figure out whether a given set of xstate features is supported by the CPU or not. Introduce a simplified API function that can be used on any CPU type to get this information. Also add an error string return pointer, so that the driver can print a meaningful error message with a standardized feature name. Also mark xfeatures_mask as __read_only. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:55 +02:00
Ingo Molnar	669ebabb79	x86/fpu: Rename fpu/xsave.h to fpu/xstate.h 'xsave' is an x86 instruction name to most people - but xsave.h is about a lot more than just the XSAVE instruction: it includes definitions and support, both internal and external, related to xstate and xfeatures support. As a first step in cleaning up the various xstate uses rename this header to 'fpu/xstate.h' to better reflect what this header file is about. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:54 +02:00
Ingo Molnar	72ee6f87ad	x86/fpu: Simplify __save_fpu() __save_fpu() has this pattern: if (unlikely(system_state == SYSTEM_BOOTING)) xsave_state_booting(&fpu->state.xsave); else xsave_state(&fpu->state.xsave); ... but it does not actually get called during system bootup. So remove the complication and always call xsave_state(). To make sure this assumption is correct, add a WARN_ONCE() debug check to xsave_state(). Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:53 +02:00
Ingo Molnar	32b49b3c83	x86/fpu: Factor out FPU hw activation/deactivation We have repeat patterns of: if (!use_eager_fpu()) clts(); ... to activate FPU registers, and: if (!use_eager_fpu()) stts(); ... to deactivate them. Encapsulate these in: __fpregs_activate_hw(); __fpregs_activate_hw(); and use them accordingly. Doing this synchronizes the idiom with the fpu->fpregs_active software-flag's handling functions, creating clear patterns of: __fpregs_activate_hw(); __fpregs_activate(fpu); etc., which improves readability. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:52 +02:00
Ingo Molnar	c4d72e2db3	x86/fpu: Simplify fpstate_init_curr() usage Now that fpstate_init_curr() is not doing implicit allocations anymore, almost all uses of it involve a very simple pattern: if (!fpu->fpstate_active) fpstate_init_curr(fpu); which is basically activating the FPU fpstate if it was not active before. So propagate the check into the function itself, and rename the function according to its new purpose: fpu__activate_curr(fpu); Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:51 +02:00
Ingo Molnar	0ee6a51725	x86/fpu, kvm: Simplify fx_init() Now that fpstate_init() cannot fail the error return of fx_init() has lost its purpose. Eliminate the error return and propagate this change to all callers. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:51 +02:00
Ingo Molnar	e62bb3d894	x86/fpu: Rename fpstate_alloc_init() to fpstate_init_curr() Now that there are no FPU context allocations, rename fpstate_alloc_init() to fpstate_init_curr(), to signal that it initializes the fpstate and marks it active, for the current task. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:50 +02:00
Ingo Molnar	91d93d0e20	x86/fpu: Remove failure return from fpstate_alloc_init() Remove the failure code and propagate this down to callers. Note that this function still has an 'init' aspect, which must be called. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:50 +02:00
Ingo Molnar	c4d6ee6e2e	x86/fpu: Remove failure paths from fpstate-alloc low level functions Now that we always allocate the FPU context as part of task_struct there's no need for separate allocations - remove them and their primary failure handling code. ( Note that there's still secondary error codes that have become superfluous, those will be removed in separate patches. ) Move the somewhat misplaced setup_xstate_comp() call to the core. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:50 +02:00
Ingo Molnar	7366ed771f	x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again) So 6 years ago we made the FPU fpstate dynamically allocated: `aa283f4927` ("x86, fpu: lazy allocation of FPU area - v5") `61c4628b53` ("x86, fpu: split FPU state from task struct - v5") In hindsight this was a mistake: - it complicated context allocation failure handling, such as: /* kthread execs. TODO: cleanup this horror. / if (WARN_ON(fpstate_alloc_init(fpu))) force_sig(SIGKILL, tsk); - it caused us to enable irqs in fpu__restore(): local_irq_enable(); / * does a slab alloc which can sleep / if (fpstate_alloc_init(fpu)) { / * ran out of memory! */ do_group_exit(SIGKILL); return; } local_irq_disable(); - it (slightly) slowed down task creation/destruction by adding slab allocation/free pattens. - it made access to context contents (slightly) slower by adding one more pointer dereference. The motivation for the dynamic allocation was two-fold: - reduce memory consumption by non-FPU tasks - allocate and handle only the necessary amount of context for various XSAVE processors that have varying hardware frame sizes. These days, with glibc using SSE memcpy by default and GCC optimizing for SSE/AVX by default, the scope of FPU using apps on an x86 system is much larger than it was 6 years ago. For example on a freshly installed Fedora 21 desktop system, with a recent kernel, all non-kthread tasks have used the FPU shortly after bootup. Also, even modern embedded x86 CPUs try to support the latest vector instruction set - so they'll too often use the larger xstate frame sizes. So remove the dynamic allocation complication by embedding the FPU fpstate in task_struct again. This should make the FPU a lot more accessible to all sorts of atomic contexts. We could still optimize for the xstate frame size in the future, by moving the state structure to the last element of task_struct, and allocating only a part of that. This change is kept minimal by still keeping the ctx_alloc()/free() routines (that now do nothing substantial) - we'll remove them in the following patches. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:49 +02:00
Ingo Molnar	1bc6b056d8	x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX synchronization with FP exceptions So we have the following ancient code in copy_fpregs_to_fpstate(): if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES)) { asm volatile("fnclex"); goto drop_fpregs; } which clears pending FPU exceptions and then drops registers, which causes the next FP instruction of the saved context to re-load the saved FPU state, with all pending exceptions marked properly, and will re-start the exception handling mechanism in the hardware. Since FPU exceptions are always issued on instruction boundaries, in particular on the next FP instruction following the exception generating instruction, there's no fear of getting an FP exception asynchronously. They were truly asynchronous back in the IRQ13 days, when the FPU was a weird and expensive co-processor that did its own processing, and we had to synchronize with them, but that code is not working anymore: we don't have IRQ13 mapped in the IDT anymore. With the introduction of optimized XSAVE support there's a new complication: if the xstate features bit indicates that a particular state component is unused (in 'init state'), then the hardware does not guarantee that the XSAVE (et al) instruction keeps the underlying FPU state image in memory valid and current. In practice this means that the hardware won't write it, and the exceptions flag in the state might be an older version, with it still being set. This meant that we had to check the xfeatures flag as well, adding another memory load and branch to a critical hot path of the scheduler. So optimize all this by removing both the old quirk and the new check, and straight-line optimizing the most common cases with likely() hints. Quite a bit of code gets removed this way: arch/x86/kernel/process_64.o: text data bss dec filename 5484 8 0 5492 process_64.o.before 5416 8 0 5424 process_64.o.after Now there's also a chance that some weird behavior or erratum was masked by our IRQ13 handling quirk (or that I misunderstood the nature of the quirk), and that this change triggers some badness. There's no real good way to protect against that possibility other than keeping this change well isolated, well commented and well bisectable. If you bisect a weird (or not so weird) breakage to this commit then please let us know! Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:49 +02:00
Ingo Molnar	4f83634710	x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate() So fpu_save_init() is a historic name that got its name when the only way the FPU state was FNSAVE, which cleared (well, destroyed) the FPU state after saving it. Nowadays the name is misleading, because ever since the introduction of FXSAVE (and more modern FPU saving instructions) the 'we need to reload the FPU state' part is only true if there's a pending FPU exception [], which is almost never the case. So rename it to copy_fpregs_to_fpstate() to make it clear what's happening. Also add a few comments about why we cannot keep registers in certain cases. Also clean up the control flow a bit, to make it more apparent when we are dropping/keeping FP registers, and to optimize the common case (of keeping fpregs) some more. [] Probably not true anymore, modern instructions always leave the FPU state intact, even if exceptions are pending: because pending FP exceptions are posted on the next FP instruction, not asynchronously. They were truly asynchronous back in the IRQ13 case, and we had to synchronize with them, but that code is not working anymore: we don't have IRQ13 mapped in the IDT anymore. But a cleanup patch is obviously not the place to change subtle behavior. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:49 +02:00
Ingo Molnar	910665882f	x86/fpu: Uninline the irq_ts_save()/restore() functions Especially the irq_ts_save() function is pretty bloaty, generating over a dozen instructions, so uninline them. Even though the API is used rarely, the space savings are measurable: text data bss dec hex filename 13331995 2572920 1634304 17539219 10ba093 vmlinux.before 13331739 2572920 1634304 17538963 10b9f93 vmlinux.after ( This also allows the removal of an include file inclusion from fpu/api.h, speeding up the kernel build slightly. ) Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:48 +02:00
Ingo Molnar	952f07ecbd	x86/fpu: Move various internal function prototypes to fpu/internal.h There are a number of FPU internal function prototypes and an inline function in fpu/api.h, mostly placed so historically as the code grew over the years. Move them over into fpu/internal.h where they belong. (Add sched.h include to stackprotector.h which incorrectly relied on getting it from fpu/api.h.) fpu/api.h is now a pure file that only contains FPU APIs intended for driver use. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:48 +02:00
Ingo Molnar	d63e79b114	x86/fpu: Uninline kernel_fpu_begin()/end() Both inline functions call an inline function unconditionally, so we already pay the function call based clobbering cost. Uninline them. This saves quite a bit of code in various performance sensitive code paths: text data bss dec hex filename 13321334 `2569888` 1634304 17525526 10b6b16 vmlinux.before 13320246 `2569888` 1634304 17524438 10b66d6 vmlinux.after Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:48 +02:00
Ingo Molnar	e229537543	x86/fpu: Move fpu__save() to fpu/internals.h It's an internal method, not a driver API, so move it from fpu/api.h to fpu/internal.h. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:48 +02:00
Ingo Molnar	c66e3f2823	x86/fpu: Remove the extra fpu__detect() layer Now that fpu__detect() has become an empty layer around fpu__init_system(), eliminate it and make fpu__init_system() the main system initialization routine. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:46 +02:00
Ingo Molnar	dd863880ac	x86/fpu: Move fpu__init_system_early_generic() out of fpu__detect() Move the fpu__init_system_early_generic() call into fpu__init_system(), which hosts all the system init calls. Expose fpu__init_system() to other modules - this will be our main and only system init function. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:46 +02:00
Ingo Molnar	21c4cd108a	x86/fpu: Simplify fpu__cpu_init() After the latest round of cleanups, fpu__cpu_init() has become a simple call to fpu__init_cpu(). Rename fpu__init_cpu() to fpu__cpu_init() and remove the extra layer. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:44 +02:00
Ingo Molnar	c42103b226	x86/fpu: Remove xsave_init() Expand fpu__init_system_xstate() and fpu__init_cpu_xstate() calls into xsave_init() calls. (This will allow us to call the proper versions in higher level FPU init code later on.) No change in functionality. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:40 +02:00
Ingo Molnar	55cc4678b7	x86/fpu: Make the system/cpu init distinction clear in the xstate code as well Rename existing xstate init functions along the system/cpu init principles: fpu__init_system_xstate(): called once per system bootup fpu__init_cpu_xstate(): called per CPU onlining Also make the fpu__init_cpu_xstate() early code invariant: if xfeatures_mask is not set yet then don't crash but return. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:39 +02:00
Ingo Molnar	3e5e126774	x86/fpu: Remove 'init_xstate_buf' bootmem allocation Make init_xstate_buf allocated statically at build time. This structure's maximum size is around 1KB - and it's allocated even on most modern embedded x86 CPUs which strive for FPU instruction set parity with desktop and server CPUs, so it's not like we can save much on smaller systems. This removes the last bootmem allocation from the FPU init path, allowing it to be called earlier in the boot sequence. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:39 +02:00
Ingo Molnar	966ece619e	x86/fpu: Remove xsave_init() bootmem allocations There's only 8 xstate bits at the moment, and it's not like we can support unknown bits - so put xstate_offsets[] and xstate_sizes[] into static allocation. This is in preparation to be able to call the FPU init code earlier, when there's no bootmem available yet. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:38 +02:00
Ingo Molnar	66af8e2764	x86/fpu: Rename __thread_fpu_end() to fpregs_deactivate() Propagate the 'fpu->fpregs_active' naming to the high level function that clears it. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:37 +02:00
Ingo Molnar	232f62cdd7	x86/fpu: Rename __thread_fpu_begin() to fpregs_activate() Propagate the 'fpu->fpregs_active' naming to the high level function that sets it. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:37 +02:00
Ingo Molnar	723c58e428	x86/fpu: Rename __thread_clear_has_fpu() to __fpregs_deactivate() Propagate the 'fpu->fpregs_active' naming to the functions that clears it. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:37 +02:00
Ingo Molnar	dfaea4e6a2	x86/fpu: Rename __thread_set_has_fpu() to __fpregs_activate() Propagate the 'fpu->fpregs_active' naming to the functions that sets it. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:36 +02:00
Ingo Molnar	d5cea9b0af	x86/fpu: Rename fpu->has_fpu to fpu->fpregs_active So the current code uses fpu->has_cpu to determine whether a given user FPU context is actively loaded into the FPU's registers [] and that those registers represent the task's current FPU state. But this term is not unambiguous: especially the distinction between fpu->has_fpu, PF_USED_MATH and fpu_fpregs_owner_ctx is not clear. Increase clarity by unambigously signalling that it's about hardware registers being active right now, by renaming it to fpu->fpregs_active. ( In later patches we'll use more of the 'fpregs' naming, which will make it easier to grep for as well. ) [] There's the kernel_fpu_begin()/end() primitive that also activates FPU hw registers as well and uses them, without touching the fpu->fpregs_active flag. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:36 +02:00
Ingo Molnar	e783e8167d	x86/fpu: Explain the AVX register layout in the xsave area The previous explanation was rather cryptic. Also transform "u32 [64]" to the more readable "u8[256]" form. No change in implementation. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:35 +02:00
Ingo Molnar	678eaf6034	x86/fpu: Rename regset FPU register accessors Rename regset accessors to prefix them with 'regset_', because we want to start using the 'fpregs_active' name elsewhere. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:35 +02:00
Ingo Molnar	400e4b2091	x86/fpu: Rename xsave.header::xstate_bv to 'xfeatures' 'xsave.header::xstate_bv' is a misnomer - what does 'bv' stand for? It probably comes from the 'XGETBV' instruction name, but I could not find in the Intel documentation where that abbreviation comes from. It could mean 'bit vector' - or something else? But how about - instead of guessing about a weird name - we named the field in an obvious and descriptive way that tells us exactly what it does? So rename it to 'xfeatures', which is a bitmask of the xfeatures that are fpstate_active in that context structure. Eyesore like: fpu->state->xsave.xsave_hdr.xstate_bv \|= XSTATE_FP; is now much more readable: fpu->state->xsave.header.xfeatures \|= XSTATE_FP; Which form is not just infinitely more readable, but is also shorter as well. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:35 +02:00
Ingo Molnar	3a54450b5e	x86/fpu: Rename 'xsave_hdr' to 'header' Code like: fpu->state->xsave.xsave_hdr.xstate_bv \|= XSTATE_FP; is an eyesore, because not only is the words 'xsave' and 'state' are repeated twice times (!), but also because of the 'hdr' and 'bv' abbreviations that are pretty meaningless at a first glance. Start cleaning this up by renaming 'xsave_hdr' to 'header'. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:34 +02:00
Ingo Molnar	9254aaa0fe	x86/fpu: Move XCR0 manipulation to the FPU code proper The suspend code accesses FPU state internals, add a helper for it and isolate it. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:33 +02:00
Ingo Molnar	614df7fb8a	x86/fpu: Rename 'pcntxt_mask' to 'xfeatures_mask' So the 'pcntxt_mask' is a misnomer, it's essentially meaningless to anyone who doesn't know what it does exactly. Name it more descriptively as 'xfeatures_mask'. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:33 +02:00
Ingo Molnar	7b302e6731	x86/fpu: Remove assembly guard from asm/fpu/api.h asm/fpu/api.h does not contain any defines useful to assembly code, and no assembly code includes asm/fpu/api.h. Remove the historic #ifndef __ASSEMBLY__ leftover guard. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:32 +02:00
Ingo Molnar	df6397525c	x86/fpu: Move MXCSR_DEFAULT to fpu/internal.h fpu/types.h gets included everywhere, move the MXCSR_DEFAULT to fpu/internal.h, the place where it's used. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:31 +02:00
Ingo Molnar	78f7f1e54b	x86/fpu: Rename fpu-internal.h to fpu/internal.h This unifies all the FPU related header files under a unified, hiearchical naming scheme: - asm/fpu/types.h: FPU related data types, needed for 'struct task_struct', widely included in almost all kernel code, and hence kept as small as possible. - asm/fpu/api.h: FPU related 'public' methods exported to other subsystems. - asm/fpu/internal.h: FPU subsystem internal methods - asm/fpu/xsave.h: XSAVE support internal methods (Also standardize the header guard in asm/fpu/internal.h.) Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:31 +02:00
Ingo Molnar	a137fb6bbf	x86/fpu: Move xsave.h to fpu/xsave.h Move the xsave.h header file to the FPU directory as well. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:30 +02:00
Ingo Molnar	df6b35f409	x86/fpu: Rename i387.h to fpu/api.h We already have fpu/types.h, move i387.h to fpu/api.h. The file name has become a misnomer anyway: it offers generic FPU APIs, but is not limited to i387 functionality. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:30 +02:00
Ingo Molnar	2e8a310266	x86/fpu: Rename fpu__flush_thread() to fpu__clear() The primary purpose of this function is to clear the current task's FPU before an exec(), to not leak information from the previous task, and to allow the new task to start with freshly initialized FPU registers. Rename the function to reflect this primary purpose. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:29 +02:00
Ingo Molnar	db2b1d3ad1	x86/fpu: Use 'struct fpu' in fpstate_alloc_init() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:29 +02:00
Ingo Molnar	c69e098b1f	x86/fpu: Use 'struct fpu' in fpu__copy() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:29 +02:00
Ingo Molnar	0c070595ce	x86/fpu: Use 'struct fpu' in fpu__save() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:28 +02:00
Ingo Molnar	2d75bcf314	x86/fpu: Move __save_fpu() into fpu/core.c This helper function is only used in fpu/core.c, move it there. This slightly speeds up compilation. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:27 +02:00
Ingo Molnar	384a23f939	x86/fpu: Use 'struct fpu' in switch_fpu_finish() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:27 +02:00
Ingo Molnar	cb8818b6ac	x86/fpu: Use 'struct fpu' in switch_fpu_prepare() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:27 +02:00
Ingo Molnar	af2d94fddc	x86/fpu: Use 'struct fpu' in fpu_reset_state() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:26 +02:00
Ingo Molnar	11f2d50b10	x86/fpu: Use 'struct fpu' in restore_fpu_checking() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:26 +02:00
Ingo Molnar	66ddc2cb0f	x86/fpu: Use 'struct fpu' in fpu_lazy_restore() Also rename it to fpu_want_lazy_restore(), to better indicate that this function just tests whether we can do a lazy restore. (The old name suggested that it was doing the lazy restore, which is not the case.) Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:26 +02:00
Ingo Molnar	eb6a3251bf	x86/fpu: Remove task_disable_lazy_fpu_restore() Replace task_disable_lazy_fpu_restore() with easier to read open-coded uses: we already update the fpu->last_cpu field explicitly in other cases. (This also removes yet another task_struct using FPU method.) Better explain the fpu::last_cpu field in the structure definition. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:26 +02:00
Ingo Molnar	ca6787ba0f	x86/fpu: Remove 'struct task_struct' usage from drop_fpu() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:25 +02:00
Ingo Molnar	c5bedc6847	x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active Introduce a simple fpu->fpstate_active flag in the fpu context data structure and use that instead of PF_USED_MATH in task->flags. Testing for this flag byte should be slightly more efficient than testing a bit in a bitmask, but the main advantage is that most FPU functions can now be performed on a 'struct fpu' alone, they don't need access to 'struct task_struct' anymore. There's a slight linecount increase, mostly due to the 'fpu' local variables and due to extra comments. The local variables will go away once we move most of the FPU methods to pure 'struct fpu' parameters. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:25 +02:00
Ingo Molnar	4c1384100e	x86/fpu: Open code PF_USED_MATH usages PF_USED_MATH is used directly, but also in a handful of helper inlines. To ease the elimination of PF_USED_MATH, convert all inline helpers to open-coded PF_USED_MATH usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:24 +02:00
Ingo Molnar	4540d3faa7	x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_begin() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:24 +02:00
Ingo Molnar	35191e3f07	x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_end() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:24 +02:00
Ingo Molnar	c0311f63e3	x86/fpu: Remove 'struct task_struct' usage from __thread_set_has_fpu() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:23 +02:00
Ingo Molnar	36b544dcd3	x86/fpu: Change fpu_owner_task to fpu_fpregs_owner_ctx Track the FPU owner context instead of the owner task: this change, together with other changes, will allow in subsequent patches the elimination of 'struct task_struct' usage in various FPU code: we'll be able to use 'struct fpu' only. There's no change in code size: text data bss dec hex filename 13066467 2545248 1626112 17237827 1070743 vmlinux.before 13066467 2545248 1626112 17237827 1070743 vmlinux.after Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:23 +02:00
Ingo Molnar	36fe6175be	x86/fpu: Change __thread_clear_has_fpu() to 'struct fpu' parameter We do this to make the code more readable, and also to be able to eliminate task_struct usage from most of the FPU code. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:22 +02:00
Ingo Molnar	276983f808	x86/fpu: Eliminate the __thread_has_fpu() wrapper Start migrating FPU methods towards using 'struct fpu *fpu' directly. __thread_has_fpu() is just a trivial wrapper around fpu->has_fpu, eliminate it. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:22 +02:00
Ingo Molnar	e102f30f4e	x86/fpu: Move fpu_copy() to fpu/core.c Move fpu_copy() where its only user is. Beyond readability this also speeds up compilation, as fpu-internal.h is included in over a dozen .c files. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:21 +02:00
Ingo Molnar	6522d78377	x86/fpu: Remove __save_init_fpu() __save_init_fpu() is just a trivial wrapper around fpu_save_init(). Remove the extra layer of obfuscation. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:21 +02:00
Ingo Molnar	416d49ac67	x86/fpu: Make kernel_fpu_disable/enable() static This allows the compiler to inline them and to eliminate them: arch/x86/kernel/fpu/core.o: text data bss dec hex filename 6741 4 8 6753 1a61 core.o.before 6716 4 8 6728 1a48 core.o.after Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:20 +02:00
Ingo Molnar	f55f88e25e	x86/fpu: Make task_xstate_cachep static It's now local to fpu/core.c, make it static. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:20 +02:00
Ingo Molnar	5a12bf6332	x86/fpu: Uninline fpstate_free() and move it next to the allocation function Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:20 +02:00
Ingo Molnar	a752b53d9d	x86/fpu: Factor out fpu__copy() Introduce fpu__copy() and use it in arch_dup_task_struct(), thus moving another chunk of FPU logic to fpu/core.c. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:19 +02:00
Ingo Molnar	8ffb53ab98	x86/fpu: Move task_xstate_cachep handling to core.c This code was historically in process.c, now we have FPU core internals in fpu/core.c instead - move it there. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:19 +02:00
Ingo Molnar	0afc4a941c	x86/fpu: Remove fpu_xsave() It's a pointless wrapper now - use xsave_state(). Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:19 +02:00
Ingo Molnar	3e261c14e4	x86/fpu: Simplify the xsave_state*() methods These functions (xsave_state() and xsave_state_booting()) have a 'mask' argument that is always -1. Propagate this into the functions instead and eliminate the extra argument. Does not change the generated code, because these were inlined functions. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:18 +02:00
Ingo Molnar	4d1640927b	x86/fpu: Factor out the FPU bug detection code into fpu__init_check_bugs() Move the boot-time FPU bug detection code to the other FPU boot time init code in fpu/init.c. No change in code size: text data bss dec hex filename 13044568 1884440 1130496 16059504 f50c70 vmlinux.before 13044568 1884440 1130496 16059504 f50c70 vmlinux.after Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:18 +02:00
Ingo Molnar	3a0aee4801	x86/fpu: Rename math_state_restore() to fpu__restore() Move to the new fpu__*() namespace. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:18 +02:00
Ingo Molnar	81683cc827	x86/fpu: Factor out fpu__flush_thread() from flush_thread() flush_thread() open codes a lot of FPU internals - create a separate function for it in fpu/core.c. Turns out that this does not hurt performance: text data bss dec hex filename 11843039 1884440 1130496 14857975 e2b6f7 vmlinux.before 11843039 1884440 1130496 14857975 e2b6f7 vmlinux.after and since this is a slowpath clarity comes first anyway. We can reconsider inlining decisions after the FPU code has been cleaned up. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:17 +02:00
Ingo Molnar	11ad19277e	x86/fpu: Remove the free_thread_xstate() complication Use fpstate_free() directly to manage FPU state. Only process.c was using this method, so this is a speedup as well, as it removes the extra function call and related clobbers. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:17 +02:00
Ingo Molnar	f89e32e0a3	x86/fpu: Fix header file dependencies of fpu-internal.h Fix a minor header file dependency bug in asm/fpu-internal.h: it relies on i387.h but does not include it. All users of fpu-internal.h included it explicitly. Also remove unnecessary includes, to reduce compilation time. This also makes it easier to use it as a standalone header file for FPU internals, such as an upcoming C module in arch/x86/kernel/fpu/. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:16 +02:00
Ingo Molnar	47bc510634	x86/fpu: Clean up asm/fpu/types.h - add header guards - standardize vertical alignment - add comments about MPX No code changed. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:15 +02:00
Ingo Molnar	14b9675ae9	x86/fpu: Move FPU data structures to asm/fpu_types.h Move the FPU details to asm/fpu_types.h, to further factor out the FPU code. ( As an added bonus, the 'struct orig_ist' definition now moves next to its other data types - the FPU definitions were slapped in the middle of them for some mysterious reason. ) No code changed. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:15 +02:00
Ingo Molnar	126009993f	x86/fpu: Improve the comment for the fpu::counter field This was pretty hard to read, improve it. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:14 +02:00
Ingo Molnar	c0c2803dee	x86/fpu: Move thread_info::fpu_counter into thread_info::fpu.counter This field is kept separate from the main FPU state structure for no good reason. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:14 +02:00
Ingo Molnar	3a9c4b0d7e	x86/fpu: Rename fpu_init() to fpu__cpu_init() fpu_init() is a bit of a misnomer in that it (falsely) creates the impression that it's related to the (old) fpu_finit() function, which initializes FPU ctx state. Rename it to fpu__cpu_init() to make its boot time initialization clear, and to move it to the fpu__*() namespace. Also fix and extend its comment block to point out that it's called not only on the boot CPU, but on secondary CPUs as well. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:14 +02:00
Ingo Molnar	c0ee2cf61b	x86/fpu: Rename fpu_finit() to fpstate_init() Make it clear that we are initializing the in-memory FPU context area, no the FPU registers. Also move it to the fpu__*() namespace. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:13 +02:00
Ingo Molnar	a7c2a83364	x86/fpu: Rename fpu_free() to fpstate_free() Use the fpu__*() namespace. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:13 +02:00
Ingo Molnar	ed97b08546	x86/fpu: Rename fpu_alloc() to fpstate_alloc() Use the fpu__*() namespace for fpstate_alloc() as well. Also add a comment about FPU state alignment. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:13 +02:00
Ingo Molnar	6fbe671248	x86/fpu: Move fpu_alloc() out of line This is not a small function, and it's used in several places, one of them a popular module (KVM). Move the function out of line. This saves a bit of text, even with the symbol export overhead: text data bss dec hex filename 12566052 1619504 1089536 15275092 e91454 vmlinux.before 12566046 1619504 1089536 15275086 e9144e vmlinux.after Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:12 +02:00
Ingo Molnar	373244221a	x86/fpu: Remove fpu_allocated() It's an unnecessary obfuscation of a very simple allocation pattern. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:12 +02:00
Ingo Molnar	bda283796b	x86/fpu: Make init_fpu() static Now that the allocation users have been split off into a separate function, init_fpu() has become local to i387.c: make it static. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:11 +02:00
Ingo Molnar	97185c95f7	x86/fpu: Split an fpstate_alloc_init() function out of init_fpu() Most init_fpu() users don't want the register-saving aspect of the function, they are calling it for 'current' and when FPU registers are not allocated and initialized yet. Split out a simplified API that does just that (and add debug-checks for these conditions): fpstate_alloc_init(). Use it where appropriate. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:10 +02:00
Ingo Molnar	68ad8b9fea	x86/fpu: Remove stale init_fpu() prototype We are going to split init_fpu() so keep only a single prototype, in i387.h. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:10 +02:00
Ingo Molnar	1a7dc0db71	x86/fpu: Rename fpu_detect() to fpu__detect() Use the fpu__*() namespace to organize FPU ops better. Also document fpu__detect() a bit. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:10 +02:00
Ingo Molnar	0a78155154	x86/fpu: Rename unlazy_fpu() to fpu__save() This function is a misnomer on two levels: 1) it doesn't really manipulate TS on modern CPUs anymore, its primary purpose is to save FPU state, used: - when executing fork()/clone(): to copy current FPU state to the child's FPU state. - when handling math exceptions: to generate the math error si_code in the signal frame. 2) even on legacy CPUs it doesn't actually 'unlazy', if then it lazies the FPU state: as a side effect of the old FNSAVE instruction which clears (destroys) FPU state it's necessary to set CR0::TS. So rename it to fpu__save() to better reflect its purpose. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:09 +02:00
Paul Gortmaker	ea6cd25058	x86: Rename eisa_set_level_irq to elcr_set_level_irq This routine has been around for over a decade, but with EISA being dead and abandoned for about twice that long, the name can be kind of confusing. The function is going at the PIC Edge/Level Configuration Registers (ELCR), so rename it as such and mentally decouple it from the long since dead EISA bus. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Len Brown <len.brown@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1431217657-934-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-05-19 11:23:38 +02:00
David Hildenbrand	b3c395ef55	mm/uaccess, mm/fault: Clarify that uaccess may only sleep if pagefaults are enabled In general, non-atomic variants of user access functions must not sleep if pagefaults are disabled. Let's update all relevant comments in uaccess code. This also reflects the might_sleep() checks in might_fault(). Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-4-git-send-email-dahi@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 08:39:14 +02:00
Peter Zijlstra	b92b8b35a2	locking/arch: Rename set_mb() to smp_store_mb() Since set_mb() is really about an smp_mb() -- not a IO/DMA barrier like mb() rename it to match the recent smp_load_acquire() and smp_store_release(). Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 08:32:00 +02:00
Peter Zijlstra	ab3f02fc23	locking/arch: Add WRITE_ONCE() to set_mb() Since we assume set_mb() to result in a single store followed by a full memory barrier, employ WRITE_ONCE(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 08:31:59 +02:00
Greg Kroah-Hartman	02730d3c05	Merge 4.1-rc4 into tty-next This resolves some tty driver merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-05-18 14:08:58 -07:00
Borislav Petkov	e774eaa9f6	x86/microcode/intel: Rename get_matching_sig() ... to find_matching_signature() which is exactly what it does. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431860101-14847-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-18 09:32:37 +02:00
Borislav Petkov	6b2d469f5b	x86/microcode/intel: Simplify update_match_cpu() Drop unreadable macro, deconstruct compound conditional statement into single ones and return early if they match. Add comments. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431860101-14847-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-18 09:32:36 +02:00
Borislav Petkov	8de3eafc16	x86/microcode/intel: Rename get_matching_microcode ... to has_newer_microcode() as it does exactly that: checks whether binary data @mc has newer microcode patch than the applied one. Move @mc to be the first function arg too. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431860101-14847-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-18 09:32:36 +02:00
Ingo Molnar	cffc32975d	Merge branch 'x86/asm' into x86/apic, to resolve conflicts Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-17 07:58:08 +02:00
Thomas Gleixner	6dc1787605	x86: Consolidate irq entering inlines smp.c and irq_work.c implement the same inline helper. Move it to apic.h and use it everywhere. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>	2015-05-15 16:04:49 +02:00
Borislav Petkov	9e6b13f761	x86/asm/uaccess: Unify the ALIGN_DESTINATION macro Pull it up into the header and kill duplicate versions. Separately, both macros are identical: 35948b2bd3431aee7149e85cfe4becbc /tmp/a 35948b2bd3431aee7149e85cfe4becbc /tmp/b Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431538944-27724-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-14 07:25:34 +02:00
Thomas Gleixner	a22e5f579b	arch: Remove __ARCH_HAVE_CMPXCHG We removed the only user of this define in the rtmutex code. Get rid of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2015-05-13 10:55:42 +02:00
Xiao Guangrong	0be0226f07	KVM: MMU: fix SMAP virtualization KVM may turn a user page to a kernel page when kernel writes a readonly user page if CR0.WP = 1. This shadow page entry will be reused after SMAP is enabled so that kernel is allowed to access this user page Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu once CR4.SMAP is updated Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-11 17:17:50 +02:00
Ingo Molnar	191a66353b	Merge branch 'x86/asm' into x86/apic, to resolve a conflict Conflicts: arch/x86/kernel/apic/io_apic.c arch/x86/kernel/apic/vector.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-11 16:05:09 +02:00
Luis R. Rodriguez	e4b6be33c2	x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-) ioremap_nocache() currently uses UC- by default. Our goal is to eventually make UC the default. Linux maps UC- to PCD=1, PWT=0 page attributes on non-PAT systems. Linux maps UC to PCD=1, PWT=1 page attributes on non-PAT systems. On non-PAT and PAT systems a WC MTRR has different effects on pages with either of these attributes. In order to help with a smooth transition its best to enable use of UC (PCD,1, PWT=1) on a region as that ensures a WC MTRR will have no effect on a region, this however requires us to have an way to declare a region as UC and we currently do not have a way to do this. WC MTRR on non-PAT system with PCD=1, PWT=0 (UC-) yields WC. WC MTRR on non-PAT system with PCD=1, PWT=1 (UC) yields UC. WC MTRR on PAT system with PCD=1, PWT=0 (UC-) yields WC. WC MTRR on PAT system with PCD=1, PWT=1 (UC) yields UC. A flip of the default ioremap_nocache() behaviour from UC- to UC can therefore regress a memory region from effective memory type WC to UC if MTRRs are used. Use of MTRRs should be phased out and in the best case only arch_phys_wc_add() use will remain, even if this happens arch_phys_wc_add() will have an effect on non-PAT systems and changes to default ioremap_nocache() behaviour could regress drivers. Now, ideally we'd use ioremap_nocache() on the regions in which we'd need uncachable memory types and avoid any MTRRs on those regions. There are however some restrictions on MTRRs use, such as the requirement of having the base and size of variable sized MTRRs to be powers of two, which could mean having to use a WC MTRR over a large area which includes a region in which write-combining effects are undesirable. Add ioremap_uc() to help with the both phasing out of MTRR use and also provide a way to blacklist small WC undesirable regions in devices with mixed regions which are size-implicated to use large WC MTRRs. Use of ioremap_uc() helps phase out MTRR use by avoiding regressions with an eventual flip of default behaviour or ioremap_nocache() from UC- to UC. Drivers working with WC MTRRs can use the below table to review and consider the use of ioremap() and similar helpers to ensure appropriate behaviour long term even if default ioremap_nocache() behaviour changes from UC- to UC. Although ioremap_uc() is being added we leave set_memory_uc() to use UC- as only initial memory type setup is required to be able to accommodate existing device drivers and phase out MTRR use. It should also be clarified that set_memory_uc() cannot be used with IO memory, even though its use will not return any errors, it really has no effect. ---------------------------------------------------------------------- MTRR Non-PAT PAT Linux ioremap value Effective memory type ---------------------------------------------------------------------- Non-PAT \| PAT PAT \|PCD \|\|PWT \|\|\| WC 000 WB _PAGE_CACHE_MODE_WB WC \| WC WC 001 WC _PAGE_CACHE_MODE_WC WC \| WC WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* \| WC WC 011 UC _PAGE_CACHE_MODE_UC UC \| UC ---------------------------------------------------------------------- Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Travis <travis@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thierry Reding <treding@nvidia.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Ville Syrjälä <syrjala@sci.fi> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-fbdev@vger.kernel.org Link: http://lkml.kernel.org/r/1430343851-967-2-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1431332153-18566-9-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-11 10:38:45 +02:00
Ross Zwisler	ca7d9b795e	x86/mm: Add kerneldoc comments for pcommit_sfence() Add kerneldoc comments for pcommit_sfence() describing the purpose of the PCOMMIT instruction and demonstrating its usage with an example. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H Peter Anvin <h.peter.anvin@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/1430261196-2401-1-git-send-email-ross.zwisler@linux.intel.com Link: http://lkml.kernel.org/r/1431332153-18566-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-11 10:38:44 +02:00
Ingo Molnar	4ddf2a1785	RAS: Add support for deferred errors on AMD (Aravind Gopalakrishnan) This is an important RAS feature which adds hardware support for poisoned data. That means roughly that the hardware marks data which it has detected as corrupted but wasn't able to correct, as poisoned data and raises an APIC interrupt to signal that in the form of a deferred error. It is the OS's responsibility then to take proper recovery action and thus prolonge system lifetime as far as possible. Misc cleanups ontop. (Borislav Petkov) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVUF2sAAoJEBLB8Bhh3lVKXZ4QAJ3UdVM1/TuqAsZ7+jLkb7BZ BRyWgv31CcX5fM1D0vV+6K+4GPPsLAtNVYy2G+LauFX1bfE1f9ExWKlMzp45h1sS xaNLDhIIP+aE4kD1J7mlNc0WlF0ghlfX+iaGc7lI+j3o2Ydlxm15Pt6Te9hDI7en C1NOWrkJ0+BJv48bPeJ835CLu+DZ6xktWdJ1In88PNUA9YiTj12/nhMKkaGbh3zv Ep3FCFD/tHcecRK/rVmSTE3cG50SLKtndh/Kl7s1wYhgw6ERyg3x/t8QefZkuU0Q 6fbetgYS9VvpewViAuNemoCHY5qxBNHPLsn6vwhluzlelW1CcgINU8LHcGZiaLmd DYVM9bHfSrKrHhH0M55XPn9RQSZpA+cTep3IyQzCK+jmLBiqrH3bMIRHjNQRUOLy DsGLm51tQqaMmnhDma8mMjF7LN+iBqNxXeqvkxQxQBE5NVLXHoaajOgUuj/N59WE FEFa65rmTrmsmgjAn9BPBk0zeoyQaYFKCLhENB19Vlt/4YoY/vHvzFYJNEcQT5ZU kuM8/hSEqeYZH4ZjJ8i9zKVado7z6pRQqV/lwRJ27tuXy9+9y6pV+ewmk8gCjQe4 gvySlHbIlfO5geF59GYenp4ll5CdZFvIJuwhybDBZhk3C7M2M7X/xgHnJnprza6j YVzOp7Jj2aeHGImqGL49 =3e5/ -----END PGP SIGNATURE----- Merge tag 'ras_for_4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull RAS updates from Borislav Petkov: - RAS: Add support for deferred errors on AMD (Aravind Gopalakrishnan) This is an important RAS feature which adds hardware support for poisoned data. That means roughly that the hardware marks data which it has detected as corrupted but wasn't able to correct, as poisoned data and raises an APIC interrupt to signal that in the form of a deferred error. It is the OS's responsibility then to take proper recovery action and thus prolonge system lifetime as far as possible. - Misc cleanups ontop. (Borislav Petkov)" Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-11 10:05:19 +02:00
Ingo Molnar	62c7a1e9ae	locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS Valentin Rothberg reported that we use CONFIG_QUEUED_SPINLOCKS in arch/x86/kernel/paravirt_patch_32.c, while the symbol is called CONFIG_QUEUED_SPINLOCK. (Note the extra 'S') But the typo was natural: the proper English term for such a generic object would be 'queued spinlocks' - so rename this and related symbols accordingly to the plural form. Reported-by: Valentin Rothberg <valentinrothberg@gmail.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-11 09:52:09 +02:00
Brian Gerst	8b455e6577	x86/asm/entry/irq: Clean up IRQn_VECTOR macros Since the ISA irqs are in a single block, use ISA_IRQ_VECTOR(irq) instead of individual macros. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431185813-15413-5-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-10 12:34:28 +02:00
Brian Gerst	51bb92843e	x86/asm/entry: Remove SYSCALL_VECTOR Use IA32_SYSCALL_VECTOR for both compat and native. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431185813-15413-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-10 12:34:28 +02:00
Brian Gerst	c6e692f95d	x86/asm/entry/irq: Remove unused invalidate_interrupt prototypes The invalidate_interrupt* functions no longer exist. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431185813-15413-3-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-10 12:34:28 +02:00
Denys Vlasenko	3a23208e69	x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code 32-bit code has PER_CPU_VAR(cpu_current_top_of_stack). 64-bit code uses somewhat more obscure: PER_CPU_VAR(cpu_tss + TSS_sp0). Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64 as well so that the PER_CPU_VAR(cpu_current_top_of_stack) expression can be used in both 32-bit and 64-bit code. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-08 13:50:02 +02:00
Denys Vlasenko	fed7c3f0f7	x86/entry: Remove unused 'kernel_stack' per-cpu variable Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1429889495-27850-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-08 13:49:43 +02:00
Denys Vlasenko	63332a8455	x86/entry: Stop using PER_CPU_VAR(kernel_stack) PER_CPU_VAR(kernel_stack) is redundant: - On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0). - On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack). PER_CPU_VAR(kernel_stack) will be deleted by a separate change. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-08 13:43:52 +02:00
Ingo Molnar	7ae383be81	Merge branch 'linus' into x86/asm, before applying dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-08 13:33:33 +02:00
Denys Vlasenko	2a4e90b18c	x86: Force inlining of atomic ops With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously doesn't inline very small functions we expect to be inlined: $ nm --size-sort vmlinux \| grep -iF ' t ' \| uniq -c \| grep -v '^ *1 ' \| sort -rn 473 000000000000000b t spin_unlock_irqrestore 449 000000000000005f t rcu_read_unlock 355 0000000000000009 t atomic_inc <== THIS 353 000000000000006e t rcu_read_lock 350 0000000000000075 t rcu_read_lock_sched_held 291 000000000000000b t spin_unlock 266 0000000000000019 t arch_local_irq_restore 215 000000000000000b t spin_lock 180 0000000000000011 t kzalloc 165 0000000000000012 t list_add_tail 161 0000000000000019 t arch_local_save_flags 153 0000000000000016 t test_and_set_bit 134 000000000000000b t spin_unlock_irq 134 0000000000000009 t atomic_dec <== THIS 130 000000000000000b t spin_unlock_bh 122 0000000000000010 t brelse 120 0000000000000016 t test_and_clear_bit 120 000000000000000b t spin_lock_irq 119 000000000000001e t get_dma_ops 117 0000000000000053 t cpumask_next 116 0000000000000036 t kref_get 114 000000000000001a t schedule_work 106 000000000000000b t spin_lock_bh 103 0000000000000019 t arch_local_irq_disable ... Note sizes of marked functions. They are merely 9 bytes long! Selecting function with 'atomic' in their names: 355 0000000000000009 t atomic_inc 134 0000000000000009 t atomic_dec 98 0000000000000014 t atomic_dec_and_test 31 000000000000000e t atomic_add_return 27 000000000000000a t atomic64_inc 26 000000000000002f t kmap_atomic 24 0000000000000009 t atomic_add 12 0000000000000009 t atomic_sub 10 0000000000000021 t __atomic_add_unless 10 000000000000000a t atomic64_add 5 000000000000001f t __atomic_add_unless.constprop.7 5 000000000000000a t atomic64_dec 4 000000000000001f t __atomic_add_unless.constprop.18 4 000000000000001f t __atomic_add_unless.constprop.12 4 000000000000001f t __atomic_add_unless.constprop.10 3 000000000000001f t __atomic_add_unless.constprop.13 3 0000000000000011 t atomic64_add_return 2 000000000000001f t __atomic_add_unless.constprop.9 2 000000000000001f t __atomic_add_unless.constprop.8 2 000000000000001f t __atomic_add_unless.constprop.6 2 000000000000001f t __atomic_add_unless.constprop.5 2 000000000000001f t __atomic_add_unless.constprop.3 2 000000000000001f t __atomic_add_unless.constprop.22 2 000000000000001f t __atomic_add_unless.constprop.14 2 000000000000001f t __atomic_add_unless.constprop.11 2 000000000000001e t atomic_dec_if_positive 2 0000000000000014 t atomic_inc_and_test 2 0000000000000011 t atomic_add_return.constprop.4 2 0000000000000011 t atomic_add_return.constprop.17 2 0000000000000011 t atomic_add_return.constprop.16 2 000000000000000d t atomic_inc.constprop.4 2 000000000000000c t atomic_cmpxchg This patch fixes this for x86 atomic ops via s/inline/__always_inline/. This decreases allyesconfig kernel by about 25k: text data bss dec hex filename 82399481 22255416 20627456 125282353 777a831 vmlinux.before 82375570 22255544 20627456 125258570 7774b4a vmlinux Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1431080762-17797-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-08 12:55:50 +02:00
Ingo Molnar	99e711101c	Merge branch 'linus' into x86/cleanups, before applying dependent patch	2015-05-08 12:41:09 +02:00
Peter Zijlstra (Intel)	f233f7f158	locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching We use the regular paravirt call patching to switch between: native_queued_spin_lock_slowpath() __pv_queued_spin_lock_slowpath() native_queued_spin_unlock() __pv_queued_spin_unlock() We use a callee saved call for the unlock function which reduces the i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions again. We further optimize the unlock path by patching the direct call with a "movb $0,%arg1" if we are indeed using the native unlock code. This makes the unlock code almost as fast as the !PARAVIRT case. This significantly lowers the overhead of having CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-08 12:37:09 +02:00
Peter Zijlstra (Intel)	2aa79af642	locking/qspinlock: Revert to test-and-set on hypervisors When we detect a hypervisor (!paravirt, see qspinlock paravirt support patches), revert to a simple test-and-set lock to avoid the horrors of queue preemption. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-08 12:36:58 +02:00
Waiman Long	d73a33973f	locking/qspinlock, x86: Enable x86-64 to use queued spinlocks This patch makes the necessary changes at the x86 architecture specific layer to enable the use of queued spinlocks for x86-64. As x86-32 machines are typically not multi-socket. The benefit of queue spinlock may not be apparent. So queued spinlocks are not enabled. Currently, there is some incompatibilities between the para-virtualized spinlock code (which hard-codes the use of ticket spinlock) and the queued spinlocks. Therefore, the use of queued spinlocks is disabled when the para-virtualized spinlock is enabled. The arch/x86/include/asm/qspinlock.h header file includes some x86 specific optimization which will make the queueds spinlock code perform better than the generic implementation. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-3-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-08 12:36:26 +02:00
Marcelo Tosatti	a3eb97bd80	x86: kvmclock: drop rdtsc_barrier() Drop unnecessary rdtsc_barrier(), as has been determined empirically, see `057e6a8c66` for details. Noticed by Andy Lutomirski. Improves clock_gettime() by approximately 15% on Intel i7-3520M @ 2.90GHz. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:48 +02:00
James Sullivan	93bbf0b8bc	kvm: x86: Extended struct kvm_lapic_irq with msi_redir_hint for MSI delivery Extended struct kvm_lapic_irq with bool msi_redir_hint, which will be used to determine if the delivery of the MSI should target only the lowest priority CPU in the logical group specified for delivery. (In physical dest mode, the RH bit is not relevant). Initialized the value of msi_redir_hint to true when RH=1 in kvm_set_msi_irq(), and initialized to false in all other cases. Added value of msi_redir_hint to a debug message dump of an IRQ in apic_send_ipi(). Signed-off-by: James Sullivan <sullivan.james.f@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:44 +02:00
Paolo Bonzini	b7cb223173	KVM: x86: tweak types of fields in kvm_lapic_irq Change to u16 if they only contain data in the low 16 bits. Change the level field to bool, since we assign 1 sometimes, but just mask icr_low with APIC_INT_ASSERT in apic_send_ipi. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:43 +02:00
Nadav Amit	d28bc9dd25	KVM: x86: INIT and reset sequences are different x86 architecture defines differences between the reset and INIT sequences. INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU, MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP. References (from Intel SDM): "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions] [Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT] "If the processor is reset by asserting the INIT# pin, the x87 FPU state is not changed." [9.2: X87 FPU INITIALIZATION] "The state of the local APIC following an INIT reset is the same as it is after a power-up or hardware reset, except that the APIC ID and arbitration ID registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset ("Wait-for-SIPI" State)] Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:43 +02:00
Nadav Amit	90de4a1875	KVM: x86: Support for disabling quirks Introducing KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were previous created in order to overcome QEMU issues. Those issue were mostly result of invalid VM BIOS. Currently there are two quirks that can be disabled: 1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot 2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot These two issues are already resolved in recent releases of QEMU, and would therefore be disabled by QEMU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428879221-29996-1-git-send-email-namit@cs.technion.ac.il> [Report capability from KVM_CHECK_EXTENSION too. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:42 +02:00
Aravind Gopalakrishnan	5c0d728e1a	x86/irq: Cleanup ordering of vector numbers Sort vector number assignments in proper descending order. No functional change. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1430913538-1415-6-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-05-07 10:28:43 +02:00
Aravind Gopalakrishnan	24fd78a81f	x86/mce/amd: Introduce deferred error interrupt handler Deferred errors indicate error conditions that were not corrected, but require no action from S/W (or action is optional).These errors provide info about a latent UC MCE that can occur when a poisoned data is consumed by the processor. Processors that report these errors can be configured to generate APIC interrupts to notify OS about the error. Provide an interrupt handler in this patch so that OS can catch these errors as and when they happen. Currently, we simply log the errors and exit the handler as S/W action is not mandated. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1430913538-1415-5-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-05-07 10:23:32 +02:00
Linus Torvalds	0e1dc42748	xen: bug fixes for 4.1-rc2 - Fix blkback regression if using persistent grants. - Fix various event channel related suspend/resume bugs. - Fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS. - SWIOTLB on ARM now uses frames <4 GiB (if available) so device only capable of 32-bit DMA work. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJVSiC1AAoJEFxbo/MsZsTRojgH/1zWPD0r5WMAEPb6DFdb7Ga1 SqBbyHFu43axNwZ7EvUzSqI8BKDPbTnScQ3+zC6Zy1SIEfS+40+vn7kY/uASmWtK LYaYu8nd49OZP8ykH0HEvsJ2LXKnAwqAwvVbEigG7KJA7h8wXo7aDwdwxtZmHlFP 18xRTfHcrnINtAJpjVRmIGZsCMXhXQz4bm0HwsXTTX0qUcRWtxydKDlMPTVFyWR8 wQ2m5+76fQ8KlFsoJEB0M9ygFdheZBF4FxBGHRrWXBUOhHrQITnH+cf1aMVxTkvy NDwiEebwXUDHacv21onszoOkNjReLsx+DWp9eHknlT/fgPo6tweMM2yazFGm+JQ= =W683 -----END PGP SIGNATURE----- Merge tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - fix blkback regression if using persistent grants - fix various event channel related suspend/resume bugs - fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS - SWIOTLB on ARM now uses frames <4 GiB (if available) so device only capable of 32-bit DMA work. * tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq() xen/console: Update console event channel on resume xen/xenbus: Update xenbus event channel on resume xen/events: Clear cpu_evtchn_mask before resuming xen-pciback: Add name prefix to global 'permissive' variable xen: Suspend ticks on all CPUs during suspend xen/grant: introduce func gnttab_unmap_refs_sync() xen/blkback: safely unmap purge persistent grants	2015-05-06 15:58:06 -07:00
Valentin Rothberg	507224aa88	serial: 8250: remove Kconfig indirection Remove CONFIG_SERIAL_DETECT_IRQ and CONFIG_SERIAL_MANY_PORTS, and substitute all references to the proper 8250 Kconfig options. Now, the actual Kconfig dependencies are not hidden when reading the code and static analyzers are less confused. Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-05-06 22:27:00 +02:00
Aravind Gopalakrishnan	7559e13fb4	x86/mce: Add support for deferred errors on AMD Deferred errors indicate error conditions that were not corrected, but those errors have not been consumed yet. They require no action from S/W (or action is optional). These errors provide info about a latent uncorrectable MCE that can occur when a poisoned data is consumed by the processor. Newer AMD processors can generate deferred errors and can be configured to generate APIC interrupts on such events. SUCCOR stands for S/W UnCorrectable error COntainment and Recovery. It indicates support for data poisoning in HW and deferred error interrupts. Add new bitfield to mce_vendor_flags for this. We use this to verify presence of deferred error interrupts before we enable them in mce_amd.c While at it, clarify comments in mce_vendor_flags to provide an indication of usages of the bitfields. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1430913538-1415-4-git-send-email-Aravind.Gopalakrishnan@amd.com [ beef up commit message, do CPUID(8000_0007) only once. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-05-06 20:34:31 +02:00
Linus Torvalds	3d54ac9e35	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "EFI fixes, and FPU fix, a ticket spinlock boundary condition fix and two build fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Always restore_xinit_state() when use_eager_cpu() x86: Make cpu_tss available to external modules efi: Fix error handling in add_sysfs_runtime_map_entry() x86/spinlocks: Fix regression in spinlock contention detection x86/mm: Clean up types in xlate_dev_mem_ptr() x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr efivarfs: Ensure VariableName is NUL-terminated	2015-05-06 10:57:37 -07:00
Stefano Stabellini	8746515d7f	xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM Make sure that xen_swiotlb_init allocates buffers that are DMA capable when at least one memblock is available below 4G. Otherwise we assume that all devices on the SoC can cope with >4G addresses. We do this on ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case. No functional changes on x86. From: Chen Baozi <baozich@gmail.com> Signed-off-by: Chen Baozi <baozich@gmail.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Tested-by: Chen Baozi <baozich@gmail.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-05-06 15:02:58 +01:00
Borislav Petkov	5b673a48c5	x86/alternatives: Document macros Add some text to the macro magic for future reference and against failing human memory. Requested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-06 11:25:31 +02:00
Borislav Petkov	760d765b2b	x86/microcode: Parse built-in microcode early Apparently, people do build microcode into the kernel image, i.e. CONFIG_FIRMWARE_IN_KERNEL=y. Make that work in the early loader which is where microcode should be preferably loaded anyway. Note that you need to specify the microcode filename with the path relative to the toplevel firmware directory (the same like the late loading method) in CONFIG_EXTRA_FIRMWARE=y so that early loader can find it. I.e., something like this (Intel variant): CONFIG_FIRMWARE_IN_KERNEL=y CONFIG_EXTRA_FIRMWARE="intel-ucode/06-3a-09" CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware/" While at it, add me to the loader copyright boilerplate. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-06 11:24:53 +02:00
Borislav Petkov	da9b50765e	x86/microcode/intel: Remove unused @rev arg of get_matching_sig() @rev wasn't used in get_matching_sig(), drop it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-06 11:24:52 +02:00
Borislav Petkov	a1a32d29f9	x86/microcode/intel: Get rid of revision_is_newer() It is a one-liner for checking microcode header revisions. On top of that, it can be used wrong as it was the case in _save_mc(). Get rid of it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-06 11:24:44 +02:00
Aravind Gopalakrishnan	1b4574292e	x86/gart: Check for GART support before accessing GART registers GART registers are not present in newer AMD processors (Fam15h, Model 10h and later). So, avoid accessing those in PCI config space by returning early in early_gart_iommu_check() and gart_iommu_hole_init() if GART is not available. Current code doesn't break on existing processors but there are some side effects: We get bogus AGP aperture messages which are simply noise on GART-less processors: AGP: Node 0: aperture [bus addr 0x00000000-0x01ffffff] (32MB) AGP: Your BIOS doesn't leave aperture memory hole AGP: Please enable the IOMMU option in the BIOS setup AGP: This costs you 64MB of RAM AGP: Mapping aperture over RAM [mem 0xd4000000-0xd7ffffff] We can avoid calling allocate_aperture() and would not have to wastefully reserve 64MB of RAM with memblock_reserve(). Also, we can avoid having to loop through all PCI buses and devices twice, searching for a non-existent AGP bridge if we bail out early. Refactor the family check used in amd_nb.c into an inline function so we can use it here as well as in amd_nb.c Fix some typos while at it. Tested the patch on Fam10h and Fam15h Model 00h-fh and this code runs fine. On Fam15h Model 60h-6fh and on Fam16h, we bail early as they don't have GART. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Joerg Rodel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1428443197-3834-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-06 11:15:53 +02:00
Christoph Hellwig	84be456f88	remove <asm/scatterlist.h> We don't have any arch specific scatterlist now that parisc switched over to the generic one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-05-05 13:35:39 -06:00
Denys Vlasenko	f1dc154f82	x86: Deinline dma_free_attrs() Reduces kernel size by 76720 bytes on allyesconfig build: text data bss dec hex filename 82594029 22255352 20627456 125476837 77a9fe5 vmlinux1 82517277 22255384 20627456 125400117 7797435 vmlinux2 Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1428926075-28796-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-05-05 20:48:02 +02:00
Denys Vlasenko	0c7965ff22	x86: Deinline dma_alloc_attrs() Reduces kernel size by 68739 bytes on allyesconfig build: text data bss dec hex filename 82662736 22255384 20627456 125545576 77bac68 vmlinux0 82594029 22255352 20627456 125476837 77a9fe5 vmlinux1 Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1428926075-28796-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-05-05 20:48:02 +02:00
Boris Ostrovsky	a71dbdaa8c	hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests Commit `61f01dd941` ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue") makes AMD processors set SS to __KERNEL_DS in __switch_to() to deal with cases when SS is NULL. This breaks Xen PV guests who do not want to load SS with__KERNEL_DS. Since the problem that the commit is trying to address would have to be fixed in the hypervisor (if it in fact exists under Xen) there is no reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here. This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features op which will clear this flag. (And since this structure is no longer HVM-specific we should do some renaming). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-05-05 18:27:43 +01:00
Tahsin Erdogan	e8a4a2696f	x86/spinlocks: Fix regression in spinlock contention detection A spinlock is regarded as contended when there is at least one waiter. Currently, the code that checks whether there are any waiters rely on tail value being greater than head. However, this is not true if tail reaches the max value and wraps back to zero, so arch_spin_is_contended() incorrectly returns 0 (not contended) when tail is smaller than head. The original code (before regression) handled this case by casting the (tail - head) to an unsigned value. This change simply restores that behavior. Fixes: `d6abfdb202` ("x86/spinlocks/paravirt: Fix memory corruption on unlock") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Cc: peterz@infradead.org Cc: Waiman.Long@hp.com Cc: borntraeger@de.ibm.com Cc: oleg@redhat.com Cc: raghavendra.kt@linux.vnet.ibm.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1430799331-20445-1-git-send-email-tahsin@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-05-05 11:01:38 +02:00
Jiri Kosina	535b3ddc28	x86: kaslr: fix build due to missing ALIGN definition Fengguang's bot reported that `4545c898` ("x86: introduce kaslr_offset()") broke randconfig build In file included from arch/x86/xen/vga.c:5:0: arch/x86/include/asm/setup.h: In function 'kaslr_offset': >> arch/x86/include/asm/setup.h:77:2: error: implicit declaration of function 'ALIGN' [-Werror=implicit-function-declaration] return (unsigned long)&_text - __START_KERNEL; ^ Fix that by making setup.h self-sufficient by explicitly including linux/kernel.h, which is needed for ALIGN() (which is what __START_KERNEL contains in its expansion). Reported-by: fengguang.wu@intel.com Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-04-29 21:54:54 +02:00
Jiri Kosina	5d4351ba65	livepatch: x86: make kASLR logic more accurate We give up old_addr hint from the coming patch module in cases when kernel load base has been randomized (as in such case, the coming module has no idea about the exact randomization offset). We are currently too pessimistic, and give up immediately as soon as CONFIG_RANDOMIZE_BASE is set; this doesn't however directly imply that the load base has actually been randomized. There are config options that disable kASLR (such as hibernation), user could have disabled kaslr on kernel command-line, etc. The loader propagates the information whether kernel has been randomized through bootparams. This allows us to have the condition more accurate. On top of that, it seems unnecessary to give up old_addr hints even if randomization is active. The relocation offset can be computed using kaslr_ofsset(), and therefore old_addr can be adjusted accordingly. Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-04-29 16:51:33 +02:00
Jiri Kosina	4545c89880	x86: introduce kaslr_offset() Offset that has been chosen for kaslr during kernel decompression can be easily computed as a difference between _text and __START_KERNEL. We are already making use of this in dump_kernel_offset() notifier and in arch_crash_save_vmcoreinfo(). Introduce kaslr_offset() that makes this computation instead of hard-coding it, so that other kernel code (such as live patching) can make use of it. Also convert existing users to make use of it. This patch is equivalent transofrmation without any effects on the resulting code: $ diff -u vmlinux.old.asm vmlinux.new.asm --- vmlinux.old.asm 2015-04-28 17:55:19.520983368 +0200 +++ vmlinux.new.asm 2015-04-28 17:55:24.141206072 +0200 @@ -1,5 +1,5 @@ -vmlinux.old: file format elf64-x86-64 +vmlinux.new: file format elf64-x86-64 Disassembly of section .text: $ Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-04-29 16:51:33 +02:00
Paolo Bonzini	73459e2a1a	x86: pvclock: Really remove the sched notifier for cross-cpu migrations This reverts commits `0a4e6be9ca` and `80f7fdb1c7`. The task migration notifier was originally introduced in order to support the pvclock vsyscall with non-synchronized TSC, but KVM only supports it with synchronized TSC. Hence, on KVM the race condition is only needed due to a bad implementation on the host side, and even then it's so rare that it's mostly theoretical. As far as KVM is concerned it's possible to fix the host, avoiding the additional complexity in the vDSO and the (re)introduction of the task migration notifier. Xen, on the other hand, hasn't yet implemented vsyscall support at all, so we do not care about its plans for non-synchronized TSC. Reported-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-27 15:49:30 +02:00
Andy Lutomirski	61f01dd941	x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with SS == 0 results in an invalid usermode state in which SS is apparently equal to __USER_DS but causes #SS if used. Work around the issue by setting SS to __KERNEL_DS __switch_to, thus ensuring that SYSRET never happens with SS set to NULL. This was exposed by a recent vDSO cleanup. Fixes: `e7d6eefaaa` x86/vdso32/syscall.S: Do not load __USER32_DS to %ss Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-26 17:57:38 -07:00
Jiang Liu	d746d1ebd3	x86/irq: Move irqdomain specific code into asm/irqdomain.h Now we have dedicated asm/irqdomain.h, so move irqdomain specific code into it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/1428978610-28986-33-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:55 +02:00
Thomas Gleixner	f7a0c78669	x86: Cleanup irq_domain ops We have 3 identical copies of the ioapic domain ops for acpi, mpparse, and sfi. Have a global one in the io_apic code and be done with it. To avoid include hell in io_apic.h, create a private irqdomain header and include the generic irqdomain header from there. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: sfi-devel@simplefirmware.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rob Herring <robh@kernel.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1428978610-28986-32-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:55 +02:00
Thomas Gleixner	335efdf57d	x86, ioapic: Use proper defines for the entry fields While looking at the printout issue, I stumbled more than once over the various 0/1 assignments which are either commented in strange ways or force to lookup the meaning. Use proper constants and fix the misleading comments. While at it remove pointless 0 assignments in native_disable_io_apic() which have no value for understanding the code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1428978610-28986-30-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:55 +02:00
Jiang Liu	4399b14fa7	x86/irq: Refine the way to calculate NR_IRQS Now we have made MSI independent of IOAPIC, so we need to refine the way to calculate NR_IRQS to support configuration with MSI enabled but IOAPIC disabled. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Jan Beulich <JBeulich@suse.com> Link: http://lkml.kernel.org/r/1428978610-28986-28-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:55 +02:00
Jiang Liu	7f3262edcd	x86/irq: Move private data in struct irq_cfg into dedicated data structure Several fields in struct irq_cfg are private to vector.c, so move it into dedicated data structure. This helps to hide implementation details. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1428978610-28986-27-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1416901802-24211-35-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Joerg Roedel <jroedel@suse.de>	2015-04-24 15:36:55 +02:00
Jiang Liu	68f9f4404d	x86/irq: Remove function apic_set_affinity() Now there's no user of apic_set_affinity(), so remove it. Also rename vector_set_affinity() to apic_set_affinity() for consistency. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1428978610-28986-25-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:54 +02:00
Jiang Liu	f970510cc5	x86/irq: Make functions only used in vector.c static Function {assign\|clear}_irq_vector() and apic_retrigger_irq() are only used in vector.c, so make them static. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1428978610-28986-24-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:54 +02:00
Jiang Liu	a2cbbb47fd	x86/irq: Remove unused alloc_irq_and_cfg_at() There's no caller of alloc_irq_and_cfg_at() anymore, so remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1428978610-28986-23-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:54 +02:00
Thomas Gleixner	1f93464129	x86/irq: Remove sis apic bug workaround The SiS apic bug workaround is now obsolete as we cache the register values for performance reasons. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-22-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:54 +02:00
Jiang Liu	154d9e50e4	x86/irq: Clean up io_apic.h Clean up io_apic.h by: 1) moving definition of struct mp_ioapic_gsi into io_apic.c 2) changing mp_pin_to_gsi() and mp_ioapic_gsi_routing() as static 3) removing unused MP_MAX_IOAPIC_PIN 4) removing useless forward declaration 5) removing useless comments Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-20-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:54 +02:00
Thomas Gleixner	ca1b88622e	x86: Remove more unmodified io_apic_ops io_apic_ops.init() is either NULL, if IO-APIC support is disabled at compile time or native_io_apic_init_mappings(). No point to have that as we can achieve the same thing with an empty inline. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:54 +02:00
Jiang Liu	9a93d4736e	x86/irq: Remove x86_io_apic_ops.write and x86_io_apic_ops.modify x86_io_apic_ops.write is always set to native_io_apic_write(), and nobody overrides it. So get rid of the indirection by changing native_io_apic_write() as io_apic_write() and removing x86_io_apic_ops.write. Do the same for x86_io_apic_ops.modify and native_io_apic_modify(). Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-19-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:53 +02:00
Jiang Liu	50a6ad84b2	x86/irq: Remove struct io_apic_irq_attr Now there's no user of struct io_apic_irq_attr anymore, so remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-18-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:53 +02:00
Jiang Liu	4467715a44	x86/irq: Move irq_cfg.irq_2_pin into io_apic.c Now only io_apic.c accesses struct irq_cfg.irq_2_pin, so move irq_2_pin into struct mp_chip_data in io_apic.c to clean up struct irq_cfg further. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-17-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:53 +02:00
Jiang Liu	9c72496698	irq_remapping/amd: Move struct irq_2_irte into amd_iommu.c Now only amd_iommu.c access irq_2_irte, so move it from hw_irq.h into amd_iommu.c. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1428978610-28986-16-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:53 +02:00
Jiang Liu	099c5c0348	irq_remapping/vt-d: Move struct irq_2_iommu into intel_irq_remapping.c Now only intel_irq_remapping.c access irq_2_iommu, so move it from hw_irq.h into intel_irq_remapping.c. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1428978610-28986-15-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:53 +02:00
Jiang Liu	bac4f90784	x86/irq: Remove irq_cfg.irq_remapped Now there is no user of irq_cfg.irq_remapped, so remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1428978610-28986-14-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:53 +02:00
Jiang Liu	9880534989	irq_remapping: Clean up unsued code to support IOAPIC Now we have converted to hierarchical irqdomains, so clean up unused code. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1428978610-28986-10-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:52 +02:00
Jiang Liu	3dd786ea3a	x86/irq: Clean up unused forward declarations in x86_init.h Clean up unused forward declarations in x86_init.h. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Yijing Wang <wangyijing@huawei.com> Link: http://lkml.kernel.org/r/1428978610-28986-9-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:52 +02:00
Jiang Liu	ad66e1efc9	x86/irq: Remove x86_io_apic_ops.eoi_ioapic_pin and related interfaces Now there is no user of x86_io_apic_ops.eoi_ioapic_pin anymore, so remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-7-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:52 +02:00
Jiang Liu	aa5cb97f14	x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces Now there is no user of x86_io_apic_ops.set_affinity anymore, so remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-6-git-send-email-jiang.liu@linux.intel.com	2015-04-24 15:36:52 +02:00
Jiang Liu	35d50d8fd5	x86/irq: Remove x86_io_apic_ops.setup_entry and related interfaces Now there is no user of x86_io_apic_ops.setup_entry anymore, so remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-5-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:52 +02:00
Jiang Liu	84bea5cc77	x86/irq: Remove x86_io_apic_ops.print_entries and related interfaces Now there is no user of x86_io_apic_ops.print_entries anymore, so remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-4-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:52 +02:00
Jiang Liu	5ad274d41c	x86/irq: Remove unused old IOAPIC irqdomain interfaces Now we have converted to hierarchical irqdomain, so remove unused old IOAPIC interfaces and code. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428978610-28986-2-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:51 +02:00
Jiang Liu	49c7e60022	x86/irq: Implement callbacks to enable hierarchical irqdomains on IOAPICs Implement required callbacks to prepare for enabling hierarchical irqdomains on IOAPICs. After the conversion we can remove quite some code from the old implementation. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428905519-23704-34-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:51 +02:00
Jiang Liu	c4d05a2c35	x86/irq: Prepare IOAPIC interfaces to support hierarchical irqdomains Introduce helper functions to manipulate struct irq_alloc_info for IOAPIC. Also add an extra parameter to IOAPIC interfaces to prepare for hierarchical irqdomain. Function mp_set_gsi_attr() will be removed once we have switched to hierarchical irqdomains. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Jan Beulich <JBeulich@suse.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: David Cohen <david.a.cohen@linux.intel.com> Link: http://lkml.kernel.org/r/1428905519-23704-33-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:51 +02:00
Jiang Liu	4e69d7eab4	x86/irq: Remove unused pre_init_apic_IRQ0() Now there's no user of pre_init_apic_IRQ0(), so remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428905519-23704-32-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:51 +02:00
Jiang Liu	0cddfc7946	irq_remapping: Remove unused function irq_remapping_print_chip() Now there's no user of irq_remapping_print_chip() anymore, so remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1428905519-23704-29-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:50 +02:00
Jiang Liu	43fe1abc18	x86/uv: Use hierarchical irqdomain to manage UV interrupts Enhance UV code to support hierarchical irqdomain, it helps to make the architecture more clear. We construct hwirq based on mmr_blade and mmr_offset, but mmr_offset has type unsigned long, it may exceed the range of irq_hw_number_t. So help about the way to construct hwirq based on mmr_blade and mmr_offset is welcomed! Folded a patch from Dimitri Sivanich <sivanich@sgi.com> to fix a bug on UV platforms, please refer to: http://lkml.org/lkml/2014/12/16/351 Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russ Anderson <rja@sgi.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1428905519-23704-23-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:50 +02:00
Jiang Liu	49e07d8f28	x86/htirq: Use hierarchical irqdomain to manage Hypertransport interrupts We have slightly changed the architecture interfaces to support htirq PCI driver. It's safe because currently Hypertransport interrupt is only enabled on x86 platforms. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1428905519-23704-22-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:50 +02:00
Jiang Liu	0921f1da64	x86/irq: Use hierarchical irqdomain to manage DMAR interrupts Enhance DMAR code to support hierarchical irqdomain, it helps to make the architecture more clear. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1428905519-23704-21-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:49 +02:00
Jiang Liu	34742db8ea	iommu/vt-d: Refine the interfaces to create IRQ for DMAR unit Refine the interfaces to create IRQ for DMAR unit. It's a preparation for converting DMAR IRQ to hierarchical irqdomain on x86. It also moves dmar_alloc_hwirq()/dmar_free_hwirq() from irq_remapping.h to dmar.h. They are not irq_remapping specific. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Vinod Koul <vinod.koul@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1428905519-23704-20-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:49 +02:00
Jiang Liu	b1855c752e	x86/MSI: Clean up unused MSI related code and interfaces Now MSI interrupt has been converted to new hierarchical irqdomain interfaces, so remove legacy MSI related code and interfaces. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Yijing Wang <wangyijing@huawei.com> Link: http://lkml.kernel.org/r/1428905519-23704-19-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:49 +02:00
Jiang Liu	7a53a12162	irq_remapping: Clean up unused MSI related code Now MSI interrupt has been converted to new hierarchical irqdomain interfaces, so remove legacy MSI related code and interfaces. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Yijing Wang <wangyijing@huawei.com> Link: http://lkml.kernel.org/r/1428905519-23704-18-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:49 +02:00
Jiang Liu	52f518a3a7	x86/MSI: Use hierarchical irqdomains to manage MSI interrupts Enhance MSI code to support hierarchical irqdomains, it helps to make the architecture more clear. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Joerg Roedel <jroedel@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1428905519-23704-14-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:49 +02:00
Jiang Liu	3cb96f0c97	x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/1428905519-23704-13-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:49 +02:00
Jiang Liu	947045a2aa	irq_remapping: Introduce new interfaces to support hierarchical irqdomains Introduce new interfaces for interrupt remapping drivers to support hierarchical irqdomains: 1) irq_remapping_get_ir_irq_domain(): get irqdomain associated with an interrupt remapping unit. IOAPIC/HPET drivers use this interface to get parent interrupt remapping irqdomain. 2) irq_remapping_get_irq_domain(): get irqdomain for an IRQ allocation. This is mainly used to support MSI irqdomain. We must build one MSI irqdomain for each interrupt remapping unit. MSI driver calls this interface to get MSI irqdomain associated with an IR irqdomain which manages the PCI devices. In a further step we will store the irqdomain pointer in the device struct to avoid this call in the irq allocation path. Architecture specific hooks: 1) arch_get_ir_parent_domain(): get parent irqdomain for IR irqdomain, which is x86_vector_domain on x86 platforms. 2) arch_create_msi_irq_domain(): create an MSI irqdomain associated with the interrupt remapping unit. We also add following callbacks into struct irq_remap_ops: struct irq_domain (get_ir_irq_domain)(struct irq_alloc_info ); struct irq_domain (get_irq_domain)(struct irq_alloc_info ); Once all clients of IR have been converted to the new hierarchical irqdomain interfaces, we will: 1) Remove set_ioapic_entry, set_affinity, free_irq, compose_msi_msg, msi_alloc_irq, msi_setup_irq, setup_hpet_msi from struct remap_osp 2) Remove setup_ioapic_remapped_entry, free_remapped_irq, compose_remapped_msi_msg, setup_hpet_msi_remapped, setup_remapped_irq. 3) Simplify x86_io_apic_ops and x86_msi. We can achieve a way clearer architecture with all these changes applied. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1428905519-23704-9-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:48 +02:00
Jiang Liu	a62b32cdd0	x86/dmar: Use new irqdomain interfaces to allocate/free IRQ Use new irqdomain interfaces to allocate/free IRQ for DMAR and interrupt remapping, so we can remove GENERIC_IRQ_LEGACY_ALLOC_HWIRQ later. The private definitions of irq_alloc_hwirqs()/irq_free_hwirqs() are a temporary solution, they will be removed once we have converted the interrupt remapping driver to use irqdomain framework. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: iommu@lists.linux-foundation.org Cc: Joerg Roedel <jroedel@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1428905519-23704-8-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:48 +02:00
Jiang Liu	b5dc8e6c21	x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors Abstract CPU local APIC as an interrupt controller and create an irqdomain for it to manage CPU interrupt vectors. It's the base to enable hierarchical irqdomains on x86 systems. The final irqdomain hierarchy will look like this: IOAPIC domain ----\| MSI/MSI-x domain ----> [Interrupt Remapping domain] -> CPU vector domain HPET_IRQ domain ----\| ^ \| DMAR domain ----------------------------------------------\| HT_IRQ domain ----------------------------------------------\| Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1428905519-23704-3-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:47 +02:00
Jiang Liu	5f0052f952	x86/irq: Save destination CPU ID in irq_cfg Cache destination CPU APIC ID into struct irq_cfg when assigning vector for interrupt. Upper layer just needs to read the cached APIC ID instead of calling apic->cpu_mask_to_apicid_and(), it helps to hide APIC driver details from IOAPIC/HPET/MSI drivers.. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1428905519-23704-2-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-04-24 15:36:47 +02:00
Linus Torvalds	b9bb6fb73b	Some virtio internal cleanups, a new virtio device "virtio input", and a change to allow the legacy virtio balloon. Most excitingly, some lguest work! No seriously, I got some cleanup patches. Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVN1SjAAoJENkgDmzRrbjxeDoP+wZnZdG4cHNc6ifiNPkSed9m cKWV7L6uTxczdFKcTNpDShn2MW0XqbcHc+VdBH9Exl3+cyick6fuhpi6SjLby0g6 a40RldysRMAc/K/dK40dG4qtSUT1uwDrOYNonMDjx1RAikO3DoTGUm4YgYZKSlM/ pKuCbAebM3dZ6EUVnaJICHWkJvY7Bk9JwGL6Z8RhF7lunVAGqIMHH9GklqSCyNiY LK+05hNXHv/OOIAkEO+ZmDrWSagogggGXEdRFom9s87xmu9GVse7Fzfq9pZ5nQre gickgBeC+gN8das1wvhlTp22F8XJslC0IRJhvbwLMQUd16hrH1YUIdvsqry/Qxds 04GgzLTVA/Z5VVEVm9MXcKWGwcsnUBu9EChsdEKZwNgBz9UF2gs39My8Co6AZ7U/ Ajcpksl22RXaR7OB65vRPIk23mh/NchGSzVGFbppzCwj2SkO9ONSFrDj3mAzfbhR 9NHi32Xm0+LdN444WCo1NzahKLAX5bYCv2ZSDs5JEBDQzmW2FWKO2ZaVJ84jpG6O O4XppI/X8cP+dxTs8xH91qh9GGmq9Aa41iuekZh/jG/8fLFT45rhlzLJfwh2B9rI djcaFFLFt+in5R6kgugM9dbCNALneXgGDnzlmqy5RwOrrCTwhyGn6DMwDqRz7EHn gsbiiv6eSsrgX4mLHP2n =Wj06 -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "Some virtio internal cleanups, a new virtio device "virtio input", and a change to allow the legacy virtio balloon. Most excitingly, some lguest work! No seriously, I got some cleanup patches" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: virtio: drop virtio_device_is_legacy_only virtio_pci: support non-legacy balloon devices virtio_mmio: support non-legacy balloon devices virtio_ccw: support non-legacy balloon devices virtio: balloon might not be a legacy device virtio_balloon: transitional interface virtio_ring: Update weak barriers to use dma_wmb/rmb virtio_pci_modern: switch to type-safe io accessors virtio_pci_modern: type-safe io accessors lguest: handle traps on the "interrupt suppressed" iret instruction. virtio: drop a useless config read virtio_config: reorder functions Add virtio-input driver. lguest: suppress interrupts for single insn, not range. lguest: simplify lguest_iret lguest: rename i386_head.S in the comments lguest: explicitly set miscdevice's private_data NULL lguest: fix pending interrupt test.	2015-04-22 10:55:06 -07:00
Hagen Paul Pfeifer	3462bd2ade	x86/asm: Always inline atomics During some code analysis I realized that atomic_add(), atomic_sub() and friends are not necessarily inlined AND that each function is defined multiple times: atomic_inc: 544 duplicates atomic_dec: 215 duplicates atomic_dec_and_test: 107 duplicates atomic64_inc: 38 duplicates [...] Each definition is exact equally, e.g.: ffffffff813171b8 <atomic_add>: 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 01 3e lock add %edi,(%rsi) 5d pop %rbp c3 retq In turn each definition has one or more callsites (sure): ffffffff81317c78: e8 3b f5 ff ff callq ffffffff813171b8 <atomic_add> [...] ffffffff8131a062: e8 51 d1 ff ff callq ffffffff813171b8 <atomic_add> [...] ffffffff8131a190: e8 23 d0 ff ff callq ffffffff813171b8 <atomic_add> [...] The other way around would be to remove the static linkage - but I prefer an enforced inlining here. Before: text data bss dec hex filename 81467393 19874720 20168704 121510817 73e1ba1 vmlinux.orig After: text data bss dec hex filename 81461323 19874720 20168704 121504747 73e03eb vmlinux.inlined Yes, the inlining here makes the kernel even smaller! ;) Linus further observed: "I have this memory of having seen that before - the size heuristics for gcc getting confused by inlining. [...] It might be a good idea to mark things that are basically just wrappers around a single (or a couple of) asm instruction to be always_inline." Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1429565231-4609-1-git-send-email-hagen@jauu.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-22 08:14:41 +02:00
Andy Lutomirski	aac82d3191	x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvop We don't use irq_enable_sysexit on 64-bit kernels any more. Remove all the paravirt and Xen machinery to support it on 64-bit kernels. Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-22 08:07:45 +02:00
Linus Torvalds	1fc149933f	Char/Misc driver patches for 4.1-rc1 Here's the big char/misc driver patchset for 4.1-rc1. Lots of different driver subsystem updates here, nothing major, full details are in the shortlog below. All of this has been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlU2IMEACgkQMUfUDdst+yloDQCfbyIRL23WVAn9ckQse/y8gbjB OT4AoKTJbwndDP9Kb/lrj2tjd9QjNVrC =xhen -----END PGP SIGNATURE----- Merge tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here's the big char/misc driver patchset for 4.1-rc1. Lots of different driver subsystem updates here, nothing major, full details are in the shortlog. All of this has been in linux-next for a while" * tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (133 commits) mei: trace: remove unused TRACE_SYSTEM_STRING DTS: ARM: OMAP3-N900: Add lis3lv02d support Documentation: DT: lis302: update wakeup binding lis3lv02d: DT: add wakeup unit 2 and wakeup threshold lis3lv02d: DT: use s32 to support negative values Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case mei: replace check for connection instead of transitioning mei: use mei_cl_is_connected consistently mei: fix mei_poll operation hv_vmbus: Add gradually increased delay for retries in vmbus_post_msg() Drivers: hv: hv_balloon: survive ballooning request with num_pages=0 Drivers: hv: hv_balloon: eliminate jumps in piecewiese linear floor function Drivers: hv: hv_balloon: do not online pages in offline blocks hv: remove the per-channel workqueue hv: don't schedule new works in vmbus_onoffer()/vmbus_onoffer_rescind() hv: run non-blocking message handlers in the dispatch tasklet coresight: moving to new "hwtracing" directory coresight-tmc: Adding a status interface to sysfs coresight: remove the unnecessary configuration coresight-default-sink ...	2015-04-21 09:42:58 -07:00
Linus Torvalds	41d5e08ea8	TTY/Serial patches for 4.1-rc1 Here's the big tty/serial driver update for 4.1-rc1. It was delayed for a bit due to some questions surrounding some of the console command line parsing changes that are in here. There's still one tiny regression for people who were previously putting multiple console command lines and expecting them all to be ignored for some odd reason, but Peter is working on fixing that. If not, I'll send a revert for the offending patch, but I have faith that Peter can address it. Other than the console work here, there's the usual serial driver updates and changes, and a buch of 8250 reworks to try to make that driver easier to maintain over time, and have it support more devices in the future. All of these have been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlU2IcUACgkQMUfUDdst+ylFqACcC8LPhFEZg9aHn0hNUoqGK3rE 5dUAnR4b8r/NYqjVoE9FJZgZfB/TqVi1 =lyN/ -----END PGP SIGNATURE----- Merge tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here's the big tty/serial driver update for 4.1-rc1. It was delayed for a bit due to some questions surrounding some of the console command line parsing changes that are in here. There's still one tiny regression for people who were previously putting multiple console command lines and expecting them all to be ignored for some odd reason, but Peter is working on fixing that. If not, I'll send a revert for the offending patch, but I have faith that Peter can address it. Other than the console work here, there's the usual serial driver updates and changes, and a buch of 8250 reworks to try to make that driver easier to maintain over time, and have it support more devices in the future. All of these have been in linux-next for a while" * tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits) n_gsm: Drop unneeded cast on netdev_priv sc16is7xx: expose RTS inversion in RS-485 mode serial: 8250_pci: port failed after wakeup from S3 earlycon: 8250: Document kernel command line options earlycon: 8250: Fix command line regression earlycon: Fix __earlycon_table stride tty: clean up the tty time logic a bit serial: 8250_dw: only get the clock rate in one place serial: 8250_dw: remove useless ACPI ID check dmaengine: hsu: move memory allocation to GFP_NOWAIT dmaengine: hsu: remove redundant pieces of code serial: 8250_pci: add Intel Tangier support dmaengine: hsu: add Intel Tangier PCI ID serial: 8250_pci: replace switch-case by formula for Intel MID serial: 8250_pci: replace switch-case by formula tty: cpm_uart: replace CONFIG_8xx by CONFIG_CPM1 serial: jsm: some off by one bugs serial: xuartps: Fix check in console_setup(). serial: xuartps: Get rid of register access macros. serial: xuartps: Fix iobase use. ...	2015-04-21 09:33:10 -07:00
Linus Torvalds	09d51602cf	Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat update from Len Brown: "Updates to the turbostat utility. Just one kernel dependency in this batch -- added a #define to msr-index.h" * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: correct dumped pkg-cstate-limit value tools/power turbostat: calculate TSC frequency from CPUID(0x15) on SKL tools/power turbostat: correct DRAM RAPL units on recent Xeon processors tools/power turbostat: Initial Skylake support tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile tools/power turbostat: modprobe msr, if needed tools/power turbostat: dump MSR_TURBO_RATIO_LIMIT2 tools/power turbostat: use new MSR_TURBO_RATIO_LIMIT names x86 msr-index: define MSR_TURBO_RATIO_LIMIT,1,2 tools/power turbostat: label base frequency tools/power turbostat: update PERF_LIMIT_REASONS decoding tools/power turbostat: simplify default output	2015-04-19 14:31:41 -07:00
Len Brown	0b2bb6925e	tools/power turbostat: Initial Skylake support Skylake adds some additional residency counters. Skylake supports a different mix of RAPL registers from any previous product. In most other ways, Skylake is like Broadwell. Signed-off-by: Len Brown <len.brown@intel.com>	2015-04-18 14:20:51 -04:00
Linus Torvalds	34a984f7b0	Merge branch 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull PMEM driver from Ingo Molnar: "This is the initial support for the pmem block device driver: persistent non-volatile memory space mapped into the system's physical memory space as large physical memory regions. The driver is based on Intel code, written by Ross Zwisler, with fixes by Boaz Harrosh, integrated with x86 e820 memory resource management and tidied up by Christoph Hellwig. Note that there were two other separate pmem driver submissions to lkml: but apparently all parties (Ross Zwisler, Boaz Harrosh) are reasonably happy with this initial version. This version enables minimal support that enables persistent memory devices out in the wild to work as block devices, identified through a magic (non-standard) e820 flag and auto-discovered if CONFIG_X86_PMEM_LEGACY=y, or added explicitly through manipulating the memory maps via the "memmap=..." boot option with the new, special '!' modifier character. Limitations: this is a regular block device, and since the pmem areas are not struct page backed, they are invisible to the rest of the system (other than the block IO device), so direct IO to/from pmem areas, direct mmap() or XIP is not possible yet. The page cache will also shadow and double buffer pmem contents, etc. Initial support is for x86" * 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: drivers/block/pmem: Fix 32-bit build warning in pmem_alloc() drivers/block/pmem: Add a driver for persistent memory x86/mm: Add support for the non-standard protected e820 type	2015-04-18 11:42:49 -04:00
Kees Cook	8eb68bf75e	x86: switch to using asm-generic for seccomp.h Switch to using the newly created asm-generic/seccomp.h for the seccomp strict mode syscall definitions. The obsolete sigreturn syscall override is retained in 32-bit mode, and the ia32 syscall overrides are used in the compat case. Remaining definitions were identical. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-17 09:04:10 -04:00
Chris Wilson	6a907738ab	x86/asm: Enable fast 32-bit put_user_64() for copy_to_user() For fixed sized copies, copy_to_user() will utilize __put_user_size() fastpaths. However, it is missing the translation for 64-bit copies on x86/32. Testing on a Pinetrail Atom, the 64 bit put_user() fastpath is substantially faster than the generic copy_to_user() fallback. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Link: http://lkml.kernel.org/r/1429091486-11443-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-16 12:08:21 +02:00
Linus Torvalds	fa2e5c073a	Merge branch 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc Pull exec domain removal from Richard Weinberger: "This series removes execution domain support from Linux. The idea behind exec domains was to support different ABIs. The feature was never complete nor stable. Let's rip it out and make the kernel signal handling code less complicated" * 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (27 commits) arm64: Removed unused variable sparc: Fix execution domain removal Remove rest of exec domains. arch: Remove exec_domain from remaining archs arc: Remove signal translation and exec_domain xtensa: Remove signal translation and exec_domain xtensa: Autogenerate offsets in struct thread_info x86: Remove signal translation and exec_domain unicore32: Remove signal translation and exec_domain um: Remove signal translation and exec_domain tile: Remove signal translation and exec_domain sparc: Remove signal translation and exec_domain sh: Remove signal translation and exec_domain s390: Remove signal translation and exec_domain mn10300: Remove signal translation and exec_domain microblaze: Remove signal translation and exec_domain m68k: Remove signal translation and exec_domain m32r: Remove signal translation and exec_domain m32r: Autogenerate offsets in struct thread_info frv: Remove signal translation and exec_domain ...	2015-04-15 13:53:55 -07:00
Linus Torvalds	2481bc7528	Power management and ACPI updates for v4.1-rc1 - Generic PM domains support update including new PM domain callbacks to handle device initialization better (Russell King, Rafael J Wysocki, Kevin Hilman). - Unified device properties API update including a new mechanism for accessing data provided by platform initialization code (Rafael J Wysocki, Adrian Hunter). - ARM cpuidle update including ARM32/ARM64 handling consolidation (Daniel Lezcano). - intel_idle update including support for the Silvermont Core in the Baytrail SOC and for the Airmont Core in the Cherrytrail and Braswell SOCs (Len Brown, Mathias Krause). - New cpufreq driver for Hisilicon ACPU (Leo Yan). - intel_pstate update including support for the Knights Landing chip (Dasaratharaman Chandramouli, Kristen Carlson Accardi). - QorIQ cpufreq driver update (Tang Yuantian, Arnd Bergmann). - powernv cpufreq driver update (Shilpasri G Bhat). - devfreq update including Tegra support changes (Tomeu Vizoso, MyungJoo Ham, Chanwoo Choi). - powercap RAPL (Running-Average Power Limit) driver update including support for Intel Broadwell server chips (Jacob Pan, Mathias Krause). - ACPI device enumeration update related to the handling of the special PRP0001 device ID allowing DT-style 'compatible' property to be used for ACPI device identification (Rafael J Wysocki). - ACPI EC driver update including limited _DEP support (Lan Tianyu, Lv Zheng). - ACPI backlight driver update including a new mechanism to allow native backlight handling to be forced on non-Windows 8 systems and a new quirk for Lenovo Ideapad Z570 (Aaron Lu, Hans de Goede). - New Windows Vista compatibility quirk for Sony VGN-SR19XN (Chen Yu). - Assorted ACPI fixes and cleanups (Aaron Lu, Martin Kepplinger, Masanari Iida, Mika Westerberg, Nan Li, Rafael J Wysocki). - Fixes related to suspend-to-idle for the iTCO watchdog driver and the ACPI core system suspend/resume code (Rafael J Wysocki, Chen Yu). - PM tracing support for the suspend phase of system suspend/resume transitions (Zhonghui Fu). - Configurable delay for the system suspend/resume testing facility (Brian Norris). - PNP subsystem cleanups (Peter Huewe, Rafael J Wysocki). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJVLbO+AAoJEILEb/54YlRx5N4QAJXsmEW1FL2l6mMAyTQkEsVj nbqjF9I6aJgYM9+i8GKaZJxpN17SAZ7Ii7aCAXjPwX8AvjT70+gcZr+KDWtPir61 B75VNVEcUYOR4vOF5Z6rQcQMlhGPkfMOJYXFMahpOG6DdPbVh1x2/tuawfc6IC0V a6S/fln6WqHrXQ+8swDSv1KuZsav6+8AQaTlNUQkkuXdY9b3k/3xiy5C2K26APP8 x1B39iAF810qX6ipnK0gEOC3Vs29dl7hvNmgOVmmkBGVS7+pqTuy5n1/9M12cDRz 78IQ7DXB0NcSwr5tdrmGVUyH0Q6H9lnD3vO7MJkYwKDh5a/2MiBr2GZc4KHDKDWn E1sS27f1Pdn9qnpWLzTcY+yYNV3EEyre56L2fc+sh+Xq9sNOjUah+Y/eAej/IxYD XYRf+GAj768yCJgNP+Y3PJES/PRh+0IZ/dn5k0Qq2iYvc8mcObyG6zdQIvCucv/i 70uV1Z2GWEb31cI9TUV8o5GrMW3D0KI9EsCEEpiFFUnhjNog3AWcerGgFQMHxu7X ZnNSzudvek+XJ3NtpbPgTiJAmnMz8bDvBQm3G1LUO2TQdjYTU6YMUHsfzXs8DL6c aIMWO4stkVuDtWrlT/hfzIXepliccyXmSP6sbH+zNNCepulXe5C4M2SftaDi4l/B uIctXWznvHoGys+EFL+v =erd3 -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "These are mostly fixes and cleanups all over, although there are a few items that sort of fall into the new feature category. First off, we have new callbacks for PM domains that should help us to handle some issues related to device initialization in a better way. There also is some consolidation in the unified device properties API area allowing us to use that inferface for accessing data coming from platform initialization code in addition to firmware-provided data. We have some new device/CPU IDs in a few drivers, support for new chips and a new cpufreq driver too. Specifics: - Generic PM domains support update including new PM domain callbacks to handle device initialization better (Russell King, Rafael J Wysocki, Kevin Hilman) - Unified device properties API update including a new mechanism for accessing data provided by platform initialization code (Rafael J Wysocki, Adrian Hunter) - ARM cpuidle update including ARM32/ARM64 handling consolidation (Daniel Lezcano) - intel_idle update including support for the Silvermont Core in the Baytrail SOC and for the Airmont Core in the Cherrytrail and Braswell SOCs (Len Brown, Mathias Krause) - New cpufreq driver for Hisilicon ACPU (Leo Yan) - intel_pstate update including support for the Knights Landing chip (Dasaratharaman Chandramouli, Kristen Carlson Accardi) - QorIQ cpufreq driver update (Tang Yuantian, Arnd Bergmann) - powernv cpufreq driver update (Shilpasri G Bhat) - devfreq update including Tegra support changes (Tomeu Vizoso, MyungJoo Ham, Chanwoo Choi) - powercap RAPL (Running-Average Power Limit) driver update including support for Intel Broadwell server chips (Jacob Pan, Mathias Krause) - ACPI device enumeration update related to the handling of the special PRP0001 device ID allowing DT-style 'compatible' property to be used for ACPI device identification (Rafael J Wysocki) - ACPI EC driver update including limited _DEP support (Lan Tianyu, Lv Zheng) - ACPI backlight driver update including a new mechanism to allow native backlight handling to be forced on non-Windows 8 systems and a new quirk for Lenovo Ideapad Z570 (Aaron Lu, Hans de Goede) - New Windows Vista compatibility quirk for Sony VGN-SR19XN (Chen Yu) - Assorted ACPI fixes and cleanups (Aaron Lu, Martin Kepplinger, Masanari Iida, Mika Westerberg, Nan Li, Rafael J Wysocki) - Fixes related to suspend-to-idle for the iTCO watchdog driver and the ACPI core system suspend/resume code (Rafael J Wysocki, Chen Yu) - PM tracing support for the suspend phase of system suspend/resume transitions (Zhonghui Fu) - Configurable delay for the system suspend/resume testing facility (Brian Norris) - PNP subsystem cleanups (Peter Huewe, Rafael J Wysocki)" * tag 'pm+acpi-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits) ACPI / scan: Fix NULL pointer dereference in acpi_companion_match() ACPI / scan: Rework modalias creation when "compatible" is present intel_idle: mark cpu id array as __initconst powercap / RAPL: mark rapl_ids array as __initconst powercap / RAPL: add ID for Broadwell server intel_pstate: Knights Landing support intel_pstate: remove MSR test cpufreq: fix qoriq uniprocessor build ACPI / scan: Take the PRP0001 position in the list of IDs into account ACPI / scan: Simplify acpi_match_device() ACPI / scan: Generalize of_compatible matching device property: Introduce firmware node type for platform data device property: Make it possible to use secondary firmware nodes PM / watchdog: iTCO: stop watchdog during system suspend cpufreq: hisilicon: add acpu driver ACPI / EC: Call acpi_walk_dep_device_list() after installing EC opregion handler cpufreq: powernv: Report cpu frequency throttling intel_idle: Add support for the Airmont Core in the Cherrytrail and Braswell SOCs intel_idle: Update support for Silvermont Core in Baytrail SOC PM / devfreq: tegra: Register governor on module init ...	2015-04-14 20:21:54 -07:00
Linus Torvalds	1dcf58d6e6	Merge branch 'akpm' (patches from Andrew) Merge first patchbomb from Andrew Morton: - arch/sh updates - ocfs2 updates - kernel/watchdog feature - about half of mm/ * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (122 commits) Documentation: update arch list in the 'memtest' entry Kconfig: memtest: update number of test patterns up to 17 arm: add support for memtest arm64: add support for memtest memtest: use phys_addr_t for physical addresses mm: move memtest under mm mm, hugetlb: abort __get_user_pages if current has been oom killed mm, mempool: do not allow atomic resizing memcg: print cgroup information when system panics due to panic_on_oom mm: numa: remove migrate_ratelimited mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE mm: split ET_DYN ASLR from mmap ASLR s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE mm: expose arch_mmap_rnd when available s390: standardize mmap_rnd() usage powerpc: standardize mmap_rnd() usage mips: extract logic for mmap_rnd() arm64: standardize mmap_rnd() usage x86: standardize mmap_rnd() usage arm: factor out mmap ASLR into mmap_rnd ...	2015-04-14 16:49:17 -07:00
Vladimir Murzin	4a20799d11	mm: move memtest under mm Memtest is a simple feature which fills the memory with a given set of patterns and validates memory contents, if bad memory regions is detected it reserves them via memblock API. Since memblock API is widely used by other architectures this feature can be enabled outside of x86 world. This patch set promotes memtest to live under generic mm umbrella and enables memtest feature for arm/arm64. It was reported that this patch set was useful for tracking down an issue with some errant DMA on an arm64 platform. This patch (of 6): There is nothing platform dependent in the core memtest code, so other platforms might benefit from this feature too. [linux@roeck-us.net: MEMTEST depends on MEMBLOCK] Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 16:49:06 -07:00
Kees Cook	204db6ed17	mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE The arch_randomize_brk() function is used on several architectures, even those that don't support ET_DYN ASLR. To avoid bulky extern/#define tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for the architectures that support it, while still handling CONFIG_COMPAT_BRK. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Russell King <linux@arm.linux.org.uk> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "David A. Long" <dave.long@linaro.org> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Arun Chandran <achandran@mvista.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Min-Hua Chen <orca.chen@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Alex Smith <alex@alex-smith.me.uk> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Vineeth Vijayan <vvijayan@mvista.com> Cc: Jeff Bailey <jeffbailey@google.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Behan Webster <behanw@converseincode.com> Cc: Ismael Ripoll <iripoll@upv.es> Cc: Jan-Simon Mller <dl9pf@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 16:49:05 -07:00
Toshi Kani	5d72b4fba4	x86, mm: support huge I/O mapping capability I/F Implement huge I/O mapping capability interfaces for ioremap() on x86. IOREMAP_MAX_ORDER is defined to PUD_SHIFT on x86/64 and PMD_SHIFT on x86/32, which overrides the default value defined in <linux/vmalloc.h>. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Robert Elliott <Elliott@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 16:49:04 -07:00
Kirill A. Shutemov	9823336833	x86: expose number of page table levels on Kconfig level We would want to use number of page table level to define mm_struct. Let's expose it as CONFIG_PGTABLE_LEVELS. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 16:49:02 -07:00
Linus Torvalds	6c8a53c9e6	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...	2015-04-14 14:37:47 -07:00
Linus Torvalds	078838d565	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molnar: "The main changes in this cycle were: - changes permitting use of call_rcu() and friends very early in boot, for example, before rcu_init() is invoked. - add in-kernel API to enable and disable expediting of normal RCU grace periods. - improve RCU's handling of (hotplug-) outgoing CPUs. - NO_HZ_FULL_SYSIDLE fixes. - tiny-RCU updates to make it more tiny. - documentation updates. - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well cpu: Defer smpboot kthread unparking until CPU known to scheduler rcu: Associate quiescent-state reports with grace period rcu: Yet another fix for preemption and CPU hotplug rcu: Add diagnostics to grace-period cleanup rcutorture: Default to grace-period-initialization delays rcu: Handle outgoing CPUs on exit from idle loop cpu: Make CPU-offline idle-loop transition point more precise rcu: Eliminate ->onoff_mutex from rcu_node structure rcu: Process offlining and onlining only at grace-period start rcu: Move rcu_report_unblock_qs_rnp() to common code rcu: Rework preemptible expedited bitmask handling rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs rcutorture: Enable slow grace-period initializations rcu: Provide diagnostic option to slow down grace-period initialization rcu: Detect stalls caused by failure to propagate up rcu_node tree rcu: Eliminate empty HOTPLUG_CPU ifdef rcu: Simplify sync_rcu_preempt_exp_init() rcu: Put all orphan-callback-related code under same comment rcu: Consolidate offline-CPU callback initialization ...	2015-04-14 13:36:04 -07:00
Linus Torvalds	9497d7380b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: "These are mostly smaller things that got accumulated during the development cycle. The unified solution is still being worked on and is not mature enough for 4.1 yet. - s390 livepatching support, from Jiri Slaby (has Ack from s390 maintainers) - error handling simplification, from Josh Poimboeuf - two minor code cleanups from Josh Poimboeuf and Miroslav Benes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add support on s390 livepatch: remove unnecessary call to klp_find_object_module() livepatch: simplify disable error path livepatch: remove extern specifier from header files	2015-04-14 10:15:34 -07:00
Len Brown	c4d30668da	x86 msr-index: define MSR_TURBO_RATIO_LIMIT,1,2 MSR_TURBO_RATIO_LIMIT has grown into a set of three registers. Add the documented names for them, in preparation for deleting the previous ad-hoc names: +#define MSR_TURBO_RATIO_LIMIT 0x000001ad +#define MSR_TURBO_RATIO_LIMIT1 0x000001ae +#define MSR_TURBO_RATIO_LIMIT2 0x000001af Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2015-04-13 16:41:49 -04:00
Linus Torvalds	07f2d8c63f	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: "The main changes in this cycle were: - Simplify the CMCI storm logic on Intel CPUs after yet another report about a race in the code (Borislav Petkov) - Enable the MCE threshold irq on AMD CPUs by default (Aravind Gopalakrishnan) - Add AMD-specific MCE-severity grading function. Further error recovery actions will be based on its output (Aravind Gopalakrishnan) - Documentation updates (Borislav Petkov) - ... assorted fixes and cleanups" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/severity: Fix warning about indented braces x86/mce: Define mce_severity function pointer x86/mce: Add an AMD severities-grading function x86/mce: Reindent __mcheck_cpu_apply_quirks() properly x86/mce: Use safe MSR accesses for AMD quirk x86/MCE/AMD: Enable thresholding interrupts by default if supported x86/MCE: Make mce_panic() fatal machine check msg in the same pattern x86/MCE/intel: Cleanup CMCI storm logic Documentation/acpi/einj: Correct and streamline text x86/MCE/AMD: Drop bogus const modifier from AMD's bank4_names()	2015-04-13 13:33:20 -07:00
Linus Torvalds	6cf78d4b37	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "The main changes in this cycle were: - reduce the x86/32 PAE per task PGD allocation overhead from 4K to 0.032k (Fenghua Yu) - early_ioremap/memunmap() usage cleanups (Juergen Gross) - gbpages support cleanups (Luis R Rodriguez) - improve AMD Bulldozer (family 0x15) ASLR I$ aliasing workaround to increase randomization by 3 bits (per bootup) (Hector Marco-Gisbert) - misc fixlets" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Improve AMD Bulldozer ASLR workaround x86/mm/pat: Initialize __cachemode2pte_tbl[] and __pte2cachemode_tbl[] in a bit more readable fashion init.h: Clean up the __setup()/early_param() macros x86/mm: Simplify probe_page_size_mask() x86/mm: Further simplify 1 GB kernel linear mappings handling x86/mm: Use early_param_on_off() for direct_gbpages init.h: Add early_param_on_off() x86/mm: Simplify enabling direct_gbpages x86/mm: Use IS_ENABLED() for direct_gbpages x86/mm: Unexport set_memory_ro() and set_memory_rw() x86/mm, efi: Use early_ioremap() in arch/x86/platform/efi/efi-bgrt.c x86/mm: Use early_memunmap() instead of early_iounmap() x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT cases x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes	2015-04-13 13:31:32 -07:00
Linus Torvalds	0ad5c6b3c2	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode changes from Ingo Molnar: "Microcode driver updates: mostly cleanups but also some fixes (Borislav Petkov)" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/amd: Drop the pci_ids.h dependency x86/microcode/intel: Fix printing of microcode blobs in show_saved_mc() x86/microcode/intel: Check scan_microcode()'s retval x86/microcode/intel: Sanitize microcode_pointer() x86/microcode/intel: Move mc arg last in get_matching_{microcode\|sig} x86/microcode/intel: Simplify generic_load_microcode_early() x86/microcode: Consolidate family,model, ... code x86/microcode/intel: Rename update_match_revision() x86/microcode/intel: Sanitize _save_mc() x86/microcode/intel: Make _save_mc() return the updated saved count x86/microcode/intel: Simplify load_ucode_intel_bsp() x86/microcode/intel: Get rid of last arg to load_ucode_intel_bsp() x86/microcode/intel: Do the mc_saved_src NULL check first x86/microcode/intel: Check if microcode was found before applying x86/microcode/intel: Fix out of bounds memory access to the extended header	2015-04-13 13:25:33 -07:00
Linus Torvalds	421ec9017f	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu changes from Ingo Molnar: "Various x86 FPU handling cleanups, refactorings and fixes (Borislav Petkov, Oleg Nesterov, Rik van Riel)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/fpu: Kill eager_fpu_init_bp() x86/fpu: Don't allocate fpu->state for swapper/0 x86/fpu: Rename drop_init_fpu() to fpu_reset_state() x86/fpu: Fold __drop_fpu() into its sole user x86/fpu: Don't abuse drop_init_fpu() in flush_thread() x86/fpu: Use restore_init_xstate() instead of math_state_restore() on kthread exec x86/fpu: Introduce restore_init_xstate() x86/fpu: Document user_fpu_begin() x86/fpu: Factor out memset(xstate, 0) in fpu_finit() paths x86/fpu: Change xstateregs_get()/set() to use ->xsave.i387 rather than ->fxsave x86/fpu: Don't abuse FPU in kernel threads if use_eager_fpu() x86/fpu: Always allow FPU in interrupt if use_eager_fpu() x86/fpu: __kernel_fpu_begin() should clear fpu_owner_task even if use_eager_fpu() x86/fpu: Also check fpu_lazy_restore() when use_eager_fpu() x86/fpu: Use task_disable_lazy_fpu_restore() helper x86/fpu: Use an explicit if/else in switch_fpu_prepare() x86/fpu: Introduce task_disable_lazy_fpu_restore() helper x86/fpu: Move lazy restore functions up a few lines x86/fpu: Change math_error() to use unlazy_fpu(), kill (now) unused save_init_fpu() x86/fpu: Don't do __thread_fpu_end() if use_eager_fpu() ...	2015-04-13 13:24:23 -07:00
Linus Torvalds	9f3252f1ad	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Various cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/iommu: Fix header comments regarding standard and _FINISH macros x86/earlyprintk: Put CONFIG_PCI-only functions under the #ifdef x86: Fix up obsolete __cpu_set() function usage	2015-04-13 13:20:54 -07:00
Linus Torvalds	60f898eeaa	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "There were lots of changes in this development cycle: - over 100 separate cleanups, restructuring changes, speedups and fixes in the x86 system call, irq, trap and other entry code, part of a heroic effort to deobfuscate a decade old spaghetti asm code and its C code dependencies (Denys Vlasenko, Andy Lutomirski) - alternatives code fixes and enhancements (Borislav Petkov) - simplifications and cleanups to the compat code (Brian Gerst) - signal handling fixes and new x86 testcases (Andy Lutomirski) - various other fixes and cleanups By their nature many of these changes are risky - we tried to test them well on many different x86 systems (there are no known regressions), and they are split up finely to help bisection - but there's still a fair bit of residual risk left so caveat emptor" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (148 commits) perf/x86/64: Report regs_user->ax too in get_regs_user() perf/x86/64: Simplify regs_user->abi setting code in get_regs_user() perf/x86/64: Do report user_regs->cx while we are in syscall, in get_regs_user() perf/x86/64: Do not guess user_regs->cs, ss, sp in get_regs_user() x86/asm/entry/32: Tidy up JNZ instructions after TESTs x86/asm/entry/64: Reduce padding in execve stubs x86/asm/entry/64: Remove GET_THREAD_INFO() in ret_from_fork x86/asm/entry/64: Simplify jumps in ret_from_fork x86/asm/entry/64: Remove a redundant jump x86/asm/entry/64: Optimize [v]fork/clone stubs x86/asm/entry: Zero EXTRA_REGS for stub32_execve() too x86/asm/entry/64: Move stub_x32_execvecloser() to stub_execveat() x86/asm/entry/64: Use common code for rt_sigreturn() epilogue x86/asm/entry/64: Add forgotten CFI annotation x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout x86/asm/entry/64: Move opportunistic sysret code to syscall code path x86, selftests: Add sigreturn selftest x86/alternatives: Guard NOPs optimization x86/asm/entry: Clear EXTRA_REGS for all executable formats x86/signal: Remove pax argument from restore_sigcontext ...	2015-04-13 13:16:36 -07:00
Linus Torvalds	977e1ba508	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic changes from Ingo Molnar: "Changes: - SGI UV APIC driver updates - dead code removal" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/uv: Update the UV APIC HUB check x86/apic/uv: Update the UV APIC driver check x86/apic/uv: Update the APIC UV OEM check x86/apic: Remove verify_local_APIC()	2015-04-13 13:15:09 -07:00
Linus Torvalds	49d2953c72	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Major changes: - Reworked CPU capacity code, for better SMP load balancing on systems with assymetric CPUs. (Vincent Guittot, Morten Rasmussen) - Reworked RT task SMP balancing to be push based instead of pull based, to reduce latencies on large CPU count systems. (Steven Rostedt) - SCHED_DEADLINE support updates and fixes. (Juri Lelli) - SCHED_DEADLINE task migration support during CPU hotplug. (Wanpeng Li) - x86 mwait-idle optimizations and fixes. (Mike Galbraith, Len Brown) - sched/numa improvements. (Rik van Riel) - various cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) sched/core: Drop debugging leftover trace_printk call sched/deadline: Support DL task migration during CPU hotplug sched/core: Check for available DL bandwidth in cpuset_cpu_inactive() sched/deadline: Always enqueue on previous rq when dl_task_timer() fires sched/core: Remove unused argument from init_[rt\|dl]_rq() sched/deadline: Fix rt runtime corruption when dl fails its global constraints sched/deadline: Avoid a superfluous check sched: Improve load balancing in the presence of idle CPUs sched: Optimize freq invariant accounting sched: Move CFS tasks to CPUs with higher capacity sched: Add SD_PREFER_SIBLING for SMT level sched: Remove unused struct sched_group_capacity::capacity_orig sched: Replace capacity_factor by usage sched: Calculate CPU's usage statistic and put it into struct sg_lb_stats::group_usage sched: Add struct rq::cpu_capacity_orig sched: Make scale_rt invariant with frequency sched: Make sched entity usage tracking scale-invariant sched: Remove frequency scaling from cpu_capacity sched: Track group sched_entity usage contributions sched: Add sched_avg::utilization_avg_contrib ...	2015-04-13 10:47:34 -07:00
Linus Torvalds	cc76ee75a9	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "Main changes: - jump label asm preparatory work for PowerPC (Anton Blanchard) - rwsem optimizations and cleanups (Davidlohr Bueso) - mutex optimizations and cleanups (Jason Low) - futex fix (Oleg Nesterov) - remove broken atomicity checks from {READ,WRITE}_ONCE() (Peter Zijlstra)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: powerpc, jump_label: Include linux/jump_label.h to get HAVE_JUMP_LABEL define jump_label: Allow jump labels to be used in assembly jump_label: Allow asm/jump_label.h to be included in assembly locking/mutex: Further simplify mutex_spin_on_owner() locking: Remove atomicy checks from {READ,WRITE}_ONCE locking/rtmutex: Rename argument in the rt_mutex_adjust_prio_chain() documentation as well locking/rwsem: Fix lock optimistic spinning when owner is not running locking: Remove ACCESS_ONCE() usage locking/rwsem: Check for active lock before bailing on spinning locking/rwsem: Avoid deceiving lock spinners locking/rwsem: Set lock ownership ASAP locking/rwsem: Document barrier need when waking tasks locking/futex: Check PF_KTHREAD rather than !p->mm to filter out kthreads locking/mutex: Refactor mutex_spin_on_owner() locking/mutex: In mutex_spin_on_owner(), return true when owner changes	2015-04-13 10:27:28 -07:00
Linus Torvalds	9c65e12a55	Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI update from Ingo Molnar: "This tree includes various fixes, cleanups, a new efi=debug boot option and EFI boot stub memory allocation optimizations" * 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Retrieve FDT size when loaded from UEFI config table efi: Clean up the efi_call_phys_[prolog\|epilog]() save/restore interaction efi: Disable interrupts around EFI calls, not in the epilog/prolog calls x86/efi: Add a "debug" option to the efi= cmdline firmware: dmi_scan: Use direct access to static vars firmware: dmi_scan: Use full dmi version for SMBIOS3	2015-04-13 10:22:30 -07:00
Linus Torvalds	9003601310	The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJVJ9vmAAoJEL/70l94x66DoMEH/R3rh8IMf4jTiWRkcqohOMPX k1+NaSY/lCKayaSgggJ2hcQenMbQoXEOdslvaA/H0oC+VfJGK+lmU6E63eMyyhjQ Y+Px6L85NENIzDzaVu/TIWWuhil5PvIRr3VO8cvntExRoCjuekTUmNdOgCvN2ObW wswN2qRdPIeEj2kkulbnye+9IV4G0Ne9bvsmUdOdfSSdi6ZcV43JcvrpOZT++mKj RrKB+3gTMZYGJXMMLBwMkdl8mK1ozriD+q0mbomT04LUyGlPwYLl4pVRDBqyksD7 KsSSybaK2E4i5R80WEljgDMkNqrCgNfg6VZe4n9Y+CfAAOToNnkMJaFEi+yuqbs= =yu2b -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.1 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. Summary: ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) KVM: use slowpath for cross page cached accesses kvm: mmu: lazy collapse small sptes into large sptes KVM: x86: Clear CR2 on VCPU reset KVM: x86: DR0-DR3 are not clear on reset KVM: x86: BSP in MSR_IA32_APICBASE is writable KVM: x86: simplify kvm_apic_map KVM: x86: avoid logical_map when it is invalid KVM: x86: fix mixed APIC mode broadcast KVM: x86: use MDA for interrupt matching kvm/ppc/mpic: drop unused IRQ_testbit KVM: nVMX: remove unnecessary double caching of MAXPHYADDR KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: vmx: pass error code with internal error #2 x86: vdso: fix pvclock races with task migration KVM: remove kvm_read_hva and kvm_read_hva_atomic KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: x86: extract blocking logic from __vcpu_run kvm: x86: fix x86 eflags fixed bit KVM: s390: migrate vcpu interrupt state ...	2015-04-13 09:47:01 -07:00
Richard Weinberger	3050a35fba	x86: Remove signal translation and exec_domain As execution domain support is gone we can remove signal translation from the signal code and remove exec_domain from thread_info. Signed-off-by: Richard Weinberger <richard@nod.at>	2015-04-12 21:03:28 +02:00
Rafael J. Wysocki	be77002101	Merge back earlier suspend/hibernate material for v4.1.	2015-04-10 12:01:59 +02:00
Aravind Gopalakrishnan	b44915927c	x86/iommu: Fix header comments regarding standard and _FINISH macros The comment line regarding IOMMU_INIT and IOMMU_INIT_FINISH macros is incorrect: "The standard vs the _FINISH differs in that the _FINISH variant will continue detecting other IOMMUs in the call list..." It should be "..the standard variant will continue detecting..." Fix that. Also, make it readable while at it. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: konrad.wilk@oracle.com Fixes: `6e96366933` ("x86, iommu: Update header comments with appropriate naming") Link: http://lkml.kernel.org/r/1428508017-5316-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:56:31 +02:00
Anton Blanchard	55dd0df781	jump_label: Allow asm/jump_label.h to be included in assembly Wrap asm/jump_label.h for all archs with #ifndef __ASSEMBLY__. Since these are kernel only headers, we don't need #ifdef __KERNEL__ so can simplify things a bit. If an architecture wants to use jump labels in assembly, it will still need to define a macro to create the __jump_table entries (see ARCH_STATIC_BRANCH in the powerpc asm/jump_label.h for an example). Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: catalin.marinas@arm.com Cc: davem@davemloft.net Cc: heiko.carstens@de.ibm.com Cc: jbaron@akamai.com Cc: linux@arm.linux.org.uk Cc: linuxppc-dev@lists.ozlabs.org Cc: liuj97@gmail.com Cc: mgorman@suse.de Cc: mmarek@suse.cz Cc: mpe@ellerman.id.au Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rostedt@goodmis.org Cc: schwidefsky@de.ibm.com Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1428551492-21977-1-git-send-email-anton@samba.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 09:40:23 +02:00
Linus Torvalds	cae2a173fe	x86: clean up/fix 'copy_in_user()' tail zeroing The rule for 'copy_from_user()' is that it zeroes the remaining kernel buffer even when the copy fails halfway, just to make sure that we don't leave uninitialized kernel memory around. Because even if we check for errors, some kernel buffers stay around after thge copy (think page cache). However, the x86-64 logic for user copies uses a copy_user_generic() function for all the cases, that set the "zerorest" flag for any fault on the source buffer. Which meant that it didn't just try to clear the kernel buffer after a failure in copy_from_user(), it also tried to clear the destination user buffer for the "copy_in_user()" case. Not only is that pointless, it also means that the clearing code has to worry about the tail clearing taking page faults for the user buffer case. Which is just stupid, since that case shouldn't happen in the first place. Get rid of the whole "zerorest" thing entirely, and instead just check if the destination is in kernel space or not. And then just use memset() to clear the tail of the kernel buffer if necessary. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-08 14:28:45 -07:00
Wanpeng Li	3ea3b7fa9a	kvm: mmu: lazy collapse small sptes into large sptes Dirty logging tracks sptes in 4k granularity, meaning that large sptes have to be split. If live migration is successful, the guest in the source machine will be destroyed and large sptes will be created in the destination. However, the guest continues to run in the source machine (for example if live migration fails), small sptes will remain around and cause bad performance. This patch introduce lazy collapsing of small sptes into large sptes. The rmap will be scanned in ioctl context when dirty logging is stopped, dropping those sptes which can be collapsed into a single large-page spte. Later page faults will create the large-page sptes. Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Message-Id: <1428046825-6905-1-git-send-email-wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:04 +02:00
Nadav Amit	ae561edeb4	KVM: x86: DR0-DR3 are not clear on reset DR0-DR3 are not cleared as they should during reset and when they are set from userspace. It appears to be caused by `c77fb5fe6f` ("KVM: x86: Allow the guest to run with dirty debug registers"). Force their reload on these situations. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-4-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:03 +02:00
Radim Krčmář	3b5a5ffa92	KVM: x86: simplify kvm_apic_map recalculate_apic_map() uses two passes over all VCPUs. This is a relic from time when we selected a global mode in the first pass and set up the optimized table in the second pass (to have a consistent mode). Recent changes made mixed mode unoptimized and we can do it in one pass. Format of logical MDA is a function of the mode, so we encode it in apic_logical_id() and drop obsoleted variables from the struct. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-5-git-send-email-rkrcmar@redhat.com> [Add lid_bits temporary in apic_logical_id. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:01 +02:00
Radim Krčmář	3548a259f6	KVM: x86: avoid logical_map when it is invalid We want to support mixed modes and the easiest solution is to avoid optimizing those weird and unlikely scenarios. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-4-git-send-email-rkrcmar@redhat.com> [Add comment above KVM_APIC_MODE_* defines. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:01 +02:00
Radim Krčmář	9ea369b032	KVM: x86: fix mixed APIC mode broadcast Broadcast allowed only one global APIC mode, but mixed modes are theoretically possible. x2APIC IPI doesn't mean 0xff as broadcast, the rest does. x2APIC broadcasts are accepted by xAPIC. If we take SDM to be logical, even addreses beginning with 0xff should be accepted, but real hardware disagrees. This patch aims for simple code by considering most of real behavior as undefined. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-3-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:00 +02:00
Eugene Korenevsky	5a4f55cde8	KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu cpuid_maxphyaddr(), which performs lot of memory accesses is called extensively across KVM, especially in nVMX code. This patch adds a cached value of maxphyaddr to vcpu.arch to reduce the pressure onto CPU cache and simplify the code of cpuid_maxphyaddr() callers. The cached value is initialized in kvm_arch_vcpu_init() and reloaded every time CPUID is updated by usermode. It is obvious that these reloads occur infrequently. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205612.GA1223@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:56 +02:00
Denys Vlasenko	3304c9c37b	x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout Interrupt entry points are handled with the following code, each 32-byte code block contains seven entry points: ... [push][jump 22] // 4 bytes [push][jump 18] // 4 bytes [push][jump 14] // 4 bytes [push][jump 10] // 4 bytes [push][jump 6] // 4 bytes [push][jump 2] // 4 bytes [push][jump common_interrupt][padding] // 8 bytes [push][jump] [push][jump] [push][jump] [push][jump] [push][jump] [push][jump] [push][jump common_interrupt][padding] [padding_2] common_interrupt: And there is a table which holds pointers to every entry point, IOW: to every push. In cold cache, two jumps are still costlier than one, even though we get the benefit of them residing in the same cacheline. This change replaces short jumps with near ones to 'common_interrupt', and pads every push+jump pair to 8 bytes. This way, each interrupt takes only one jump. This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before dispatch table with ".align 8" - we do not need anything stronger than that. The table of entry addresses (the interrupt[] array) is no longer necessary, the address of entries can be easily calculated as (irq_entries_start + i*8). text data bss dec hex filename 12546 0 0 12546 3102 entry_64.o.before 11626 0 0 11626 2d6a entry_64.o The size decrease is because 1656 bytes of .init.rodata are gone. That's initdata, though. The resident size does go up a bit. Run-tested (32 and 64 bits). Acked-and-Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-08 09:02:13 +02:00
Denys Vlasenko	fc3e958a2b	x86/asm/entry: Clear EXTRA_REGS for all executable formats On failure, sys_execve() does not clobber EXTRA_REGS, so we can just return to userpsace without saving/restoring them. On success, ELF_PLAT_INIT() in sys_execve() clears all these registers. On other executable formats: - binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone else except sh) doesn't define it. - binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most others) doesn't define it. - There are no such hooks in binfmt_aout.c et al. We inherit EXTRA_REGS from the prior executable. This inconsistency was not intended. This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve, removes register clearing in ELF_PLAT_INIT(), and instead simply clears them on success in stub_execve. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428173719-7637-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-06 09:24:08 +02:00
Brian Gerst	6a3713f001	x86/signal: Remove pax argument from restore_sigcontext The 'pax' argument is unnecesary. Instead, store the RAX value directly in regs. This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext() was changed to return an error code instead of EAX directly: https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174 In 2007 sigaltstack syscall support was added, where the return value of restore_sigcontext() was changed to carry the memory-copying failure code. But instead of putting 'ax' into regs->ax directly, it was carried in via a pointer and then returned, where the generic syscall return code copied it to regs->ax. So there was never any deeper reason for this suboptimal pattern, it was simply never noticed after being introduced. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1428152303-17154-1-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-06 09:06:39 +02:00
Borislav Petkov	dbe4058a6a	x86/alternatives: Fix ALTERNATIVE_2 padding generation properly Quentin caught a corner case with the generation of instruction padding in the ALTERNATIVE_2 macro: if len(orig_insn) < len(alt1) < len(alt2), then not enough padding gets added and that is not good(tm) as we could overwrite the beginning of the next instruction. Luckily, at the time of this writing, we don't have ALTERNATIVE_2() invocations which have that problem and even if we did, a simple fix would be to prepend the instructions with enough prefixes so that that corner case doesn't happen. However, best it would be if we fixed it properly. See below for a simple, abstracted example of what we're doing. So what we ended up doing is, we compute the max(len(alt1), len(alt2)) - len(orig_insn) and feed that value to the .skip gas directive. The max() cannot have conditionals due to gas limitations, thus the fancy integer math. With this patch, all ALTERNATIVE_2 sites get padded correctly; generating obscure test cases pass too: #define alt_max_short(a, b) ((a) ^ (((a) ^ (b)) & -(-((a) < (b))))) #define gen_skip(orig, alt1, alt2, marker) \ .skip -((alt_max_short(alt1, alt2) - (orig)) > 0) * \ (alt_max_short(alt1, alt2) - (orig)),marker .pushsection .text, "ax" .globl main main: gen_skip(1, 2, 4, 0x09) gen_skip(4, 1, 2, 0x10) ... .popsection Thanks to Quentin for catching it and double-checking the fix! Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150404133443.GE21152@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-04 15:58:23 +02:00
Borislav Petkov	6b51311c97	x86/asm/entry/64: Use a define for an invalid segment selector ... instead of a naked number, for better readability. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 15:29:13 +02:00
Borislav Petkov	78cac48c04	x86/mm/KASLR: Propagate KASLR status to kernel proper Commit: `e2b32e6785` ("x86, kaslr: randomize module base load address") made module base address randomization unconditional and didn't regard disabled KKASLR due to CONFIG_HIBERNATION and command line option "nokaslr". For more info see (now reverted) commit: `f47233c2d3` ("x86/mm/ASLR: Propagate base load address calculation") In order to propagate KASLR status to kernel proper, we need a single bit in boot_params.hdr.loadflags and we've chosen bit 1 thus leaving the top-down allocated bits for bits supposed to be used by the bootloader. Originally-From: Jiri Kosina <jkosina@suse.cz> Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 15:26:15 +02:00
Borislav Petkov	47091e3c5b	x86/asm/entry: Drop now unused ENABLE_INTERRUPTS_SYSEXIT32 Commit: `4214a16b02` ("x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER") removed the last user of ENABLE_INTERRUPTS_SYSEXIT32. Kill the macro now too. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1428049714-829-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 10:34:19 +02:00
Andy Lutomirski	cf9328cc99	x86/asm/entry/32: Stop caching MSR_IA32_SYSENTER_ESP in tss.sp1 We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once, and we unnecessarily cache the value in tss.sp1. We never read the cached value. Remove all of the caching. It serves no purpose. Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 08:30:44 +02:00
Ross Zwisler	d9dc64f30a	x86/asm: Add support for the CLWB instruction Add support for the new CLWB (cache line write back) instruction. This instruction was announced in the document "Intel Architecture Instruction Set Extensions Programming Reference" with reference number 319433-022. https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf The CLWB instruction is used to write back the contents of dirtied cache lines to memory without evicting the cache lines from the processor's cache hierarchy. This should be used in favor of clflushopt or clflush in cases where you require the cache line to be written to memory but plan to access the data again in the near future. One of the main use cases for this is with persistent memory where CLWB can be used with PCOMMIT to ensure that data has been accepted to memory and is durable on the DIMM. This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT with appropriate fencing: void flush_and_commit_buffer(void vaddr, unsigned int size) { void vend = vaddr + size - 1; for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size) clwb(vaddr); /* Flush any possible final partial cacheline / clwb(vend); / * Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes. * (MFENCE via mb() also works) / wmb(); / PCOMMIT and the required SFENCE for ordering */ pcommit_sfence(); } After this function completes the data pointed to by vaddr is has been accepted to memory and will be durable if the vaddr points to persistent memory. Regarding the details of how the alternatives assembly is set up, we need one additional byte at the beginning of the CLFLUSH so that we can flip it into a CLFLUSHOPT by changing that byte into a 0x66 prefix. Two options are to either insert a 1 byte ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX. Both have no functional effect with the plain CLFLUSH, but I've been told that executing a CLFLUSH + prefix should be faster than executing a CLFLUSH + NOP. We had to hard code the assembly for CLWB because, lacking the ability to assemble the CLWB instruction itself, the next closest thing is to have an xsaveopt instruction with a 0x66 prefix. Unfortunately XSAVEOPT itself is also relatively new, and isn't included by all the GCC versions that the kernel needs to support. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 06:56:38 +02:00
Alexander Shishkin	52ca9ced3f	perf/x86/intel/pt: Add Intel PT PMU driver Add support for Intel Processor Trace (PT) to kernel's perf events. PT is an extension of Intel Architecture that collects information about software execuction such as control flow, execution modes and timings and formats it into highly compressed binary packets. Even being compressed, these packets are generated at hundreds of megabytes per second per core, which makes it impractical to decode them on the fly in the kernel. This driver exports trace data by through AUX space in the perf ring buffer, which is zero-copy mapped into userspace for faster data retrieval. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614392-114498-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:14:20 +02:00
Alexander Shishkin	ed69628b3b	x86: Add Intel Processor Trace (INTEL_PT) cpu feature detection Intel Processor Trace is an architecture extension that allows for program flow tracing. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1421237903-181015-11-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:14:18 +02:00
Boris Ostrovsky	3f85483bd8	x86/cpu: Factor out common CPU initialization code, fix 32-bit Xen PV guests Some of x86 bare-metal and Xen CPU initialization code is common between the two and therefore can be factored out to avoid code duplication. As a side effect, doing so will also extend the fix provided by commit `a7fcf28d43` ("x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() to x86_32") to 32-bit Xen PV guests. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: konrad.wilk@oracle.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1427897534-5086-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 12:06:41 +02:00
Christoph Hellwig	ec776ef6bb	x86/mm: Add support for the non-standard protected e820 type Various recent BIOSes support NVDIMMs or ADR using a non-standard e820 memory type, and Intel supplied reference Linux code using this type to various vendors. Wire this e820 table type up to export platform devices for the pmem driver so that we can use it in Linux. Based on earlier work from: Dave Jiang <dave.jiang@intel.com> Dan Williams <dan.j.williams@intel.com> Includes fixes for NUMA regions from Boaz Harrosh. Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <keith.busch@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-nvdimm@ml01.01.org Link: http://lkml.kernel.org/r/1427872339-6688-2-git-send-email-hch@lst.de [ Minor cleanups. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-01 17:02:43 +02:00
Ingo Molnar	84a87c628a	* Fixes and cleanups for SMBIOS 3.0 DMI code - Ivan Khoronzhuk * A new efi=debug command line option that enables debug output in the EFI boot stub and results in less verbose EFI memory map output by default - Borislav Petkov * Disable interrupts around EFI calls and use a more standard page table saving and restoring idiom when making EFI calls - Ingo Molnar * Reduce the number of memory allocations performed when allocating the FDT in EFI boot stub by retrieving size from the FDT header in the EFI config table - Ard Biesheuvel -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVG9qYAAoJEC84WcCNIz1V8+sP/iKFXQIIXRdlLVSrHHqUPn4K f32qYdfFfJvG5RMF3Y9B4+lUYi5Svr9SHgg9ZxkVW+GcuI5GUdjU9LjaVtDL9kZ0 YepHp7hdrV+mqX/zDC+NaKqOjbF4jR+5JK8cYnzMDt22jCLBV96aREbH75rN43v1 55VJUplDd6JM4h4XuF/LxyKXJf+LOIFLS4p8c0XPVd3ict7ACAi+JgxPl25fRbe4 bGx9D+LvTvQ0am5C1s8dDcpEd53jbIdKiMM+vhVGmjcvtfA2L01i1aA9pw1zVhyn FKZXSKOwWjxDzWa/oTLAUzawcLPS3i/0FsDH5TVBLM57OI7bSP1kqzdgFOfR/X5L KQmuY1TeiYZCeS/JtNHqV1/vap8jucGJEYXcQe/neaD9VvJYGYFEBXFvi9c/68Lk yLJu4NAYmAp5GnkM+AxXO0aKOVvfNJ6YeGvH+Js7jBPlSdCwa93DzUJgGQxIQD3n mGfjNgu8dyK3fHIrXFEH7mzokfNHE3cE/FI+1hGS7TGLGxvfXsatZEX813Wjc9+Q 9cL2jAnWf1kZLkbDSSJ/6XJ2a121MgmaXqLrzmLznpIkUgEuhWmbL7/gyZy4Q4/+ ZKA/PNykRWolnz/DZNbF5XnUnRdGTB/kJ4pVVuZc9U/3QjmIb0BiqPa7ZiOSVLmS P24V+d3LjK2BU8JX3CnE =eapi -----END PGP SIGNATURE----- Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi Pull EFI updates from Matt Fleming: - Fixes and cleanups for SMBIOS 3.0 DMI code. (Ivan Khoronzhuk) - A new efi=debug command line option that enables debug output in the EFI boot stub and results in less verbose EFI memory map output by default. (Borislav Petkov) - Disable interrupts around EFI calls and use a more standard page table saving and restoring idiom when making EFI calls. (Ingo Molnar) - Reduce the number of memory allocations performed when allocating the FDT in EFI boot stub by retrieving size from the FDT header in the EFI config table. (Ard Biesheuvel) Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-01 15:10:25 +02:00
Ingo Molnar	744937b0b1	efi: Clean up the efi_call_phys_[prolog\|epilog]() save/restore interaction Currently x86-64 efi_call_phys_prolog() saves into a global variable (save_pgd), and efi_call_phys_epilog() restores the kernel pagetables from that global variable. Change this to a cleaner save/restore pattern where the saving function returns the saved object and the restore function restores that. Apply the same concept to the 32-bit code as well. Plus this approach, as an added bonus, allows us to express the !efi_enabled(EFI_OLD_MEMMAP) situation in a clean fashion as well, via a 'NULL' return value. Cc: Tapasweni Pathak <tapaswenipathak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2015-04-01 12:46:22 +01:00
Bandan Das	4399c03c67	x86/apic: Remove verify_local_APIC() __verify_local_APIC() is detritus from the early APIC days. Its return value isn't used anywhere and the information it prints when debug is enabled is already part of APIC initialization messages printed to syslog. Off with it! Signed-off-by: Bandan Das <bsd@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/jpgy4mcsxsq.fsf@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-01 10:47:57 +02:00
Joe Perches	1d804d079a	x86: Use bool function return values of true/false not 1/0 Use the normal return values for bool functions Signed-off-by: Joe Perches <joe@perches.com> Message-Id: <9f593eb2f43b456851cd73f7ed09654ca58fb570.1427759009.git.joe@perches.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-03-31 18:05:09 +02:00
Ingo Molnar	55474c48b4	x86/asm/entry: Remove user_mode_ignore_vm86() user_mode_ignore_vm86() can be used instead of user_mode(), in places where we have already done a v8086_mode() security check of ptregs. But doing this check in the wrong place would be a bug that could result in security problems, and also the naming still isn't very clear. Furthermore, it only affects 32-bit kernels, while most development happens on 64-bit kernels. If we replace them with user_mode() checks then the cost is only a very minor increase in various slowpaths: text data bss dec hex filename 10573391 703562 1753042 13029995 c6d26b vmlinux.o.before 10573423 703562 1753042 13030027 c6d28b vmlinux.o.after So lets get rid of this distinction once and for all. Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150329090233.GA1963@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-31 11:45:19 +02:00
Hector Marco-Gisbert	4e26d11f52	x86/mm: Improve AMD Bulldozer ASLR workaround The ASLR implementation needs to special-case AMD F15h processors by clearing out bits [14:12] of the virtual address in order to avoid I$ cross invalidations and thus performance penalty for certain workloads. For details, see: `dfb09f9b7a` ("x86, amd: Avoid cache aliasing penalties on AMD family 15h") This special case reduces the mmapped file's entropy by 3 bits. The following output is the run on an AMD Opteron 62xx class CPU processor under x86_64 Linux 4.0.0: $ for i in `seq 1 10`; do cat /proc/self/maps \| grep "r-xp.*libc" ; done b7588000-b7736000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b7570000-b771e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b75d0000-b777e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b75b0000-b775e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b7578000-b7726000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 ... Bits [12:14] are always 0, i.e. the address always ends in 0x8000 or 0x0000. 32-bit systems, as in the example above, are especially sensitive to this issue because 32-bit randomness for VA space is 8 bits (see mmap_rnd()). With the Bulldozer special case, this diminishes to only 32 different slots of mmap virtual addresses. This patch randomizes per boot the three affected bits rather than setting them to zero. Since all the shared pages have the same value at bits [12..14], there is no cache aliasing problems. This value gets generated during system boot and it is thus not known to a potential remote attacker. Therefore, the impact from the Bulldozer workaround gets diminished and ASLR randomness increased. More details at: http://hmarco.org/bugs/AMD-Bulldozer-linux-ASLR-weakness-reducing-mmaped-files-by-eight.html Original white paper by AMD dealing with the issue: http://developer.amd.com/wordpress/media/2012/10/SharedL1InstructionCacheonAMD15hCPU.pdf Mentored-by: Ismael Ripoll <iripoll@disca.upv.es> Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan-Simon <dl9pf@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-fsdevel@vger.kernel.org Link: http://lkml.kernel.org/r/1427456301-3764-1-git-send-email-hecmargi@upv.es Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-31 10:01:17 +02:00
Nadav Amit	b32a991800	KVM: x86: Remove redundant definitions Some constants are redfined in emulate.c. Avoid it. s/SELECTOR_RPL_MASK/SEGMENT_RPL_MASK s/SELECTOR_TI_MASK/SEGMENT_TI_MASK No functional change. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427635984-8113-3-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-03-30 16:46:42 +02:00
Nadav Amit	0efb04406d	KVM: x86: removing redundant eflags bits definitions The eflags are redefined (using other defines) in emulate.c. Use the definition from processor-flags.h as some mess already started. No functional change. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427635984-8113-2-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-03-30 16:46:37 +02:00
Ingo Molnar	4bfe186dbe	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Documentation updates. - Changes permitting use of call_rcu() and friends very early in boot, for example, before rcu_init() is invoked. - Miscellaneous fixes. - Add in-kernel API to enable and disable expediting of normal RCU grace periods. - Improve RCU's handling of (hotplug-) outgoing CPUs. Note: ARM support is lagging a bit here, and these improved diagnostics might generate (harmless) splats. - NO_HZ_FULL_SYSIDLE fixes. - Tiny RCU updates to make it more tiny. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 10:04:06 +01:00
Denys Vlasenko	aa6d9a128b	x86/irq/tracing: Do not save callee-preserved registers around lockdep_sys_exit_thunk Internally, lockdep_sys_exit_thunk saves callee-clobbered registers, and calls a C function, lockdep_sys_exit. Thus, callee-preserved registers won't be mangled, there is no need to save them. Patch was run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427314468-12763-4-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 10:01:49 +01:00
Denys Vlasenko	7dc7cc0780	x86/irq/tracing: Fold ARCH_LOCKDEP_SYS_EXIT defines into their users There is no need to have an extra level of macro indirection here. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427314468-12763-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 10:01:49 +01:00
Denys Vlasenko	40e2ec657d	x86/irq/tracing: Move ARCH_LOCKDEP_SYS_EXIT defines closer to their users This change simply moves defines around (even if it's not obvious in a patch form). Nothing is changed. This is a preparation for folding ARCH_LOCKDEP_SYS_EXIT defines into their users. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427314468-12763-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 10:01:48 +01:00
Ingo Molnar	936c663aed	Merge branch 'perf/x86' into perf/core, because it's ready Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:46:19 +01:00
Jan Kiszka	b3a2a9076d	KVM: nVMX: Add support for rdtscp If the guest CPU is supposed to support rdtscp and the host has rdtscp enabled in the secondary execution controls, we can also expose this feature to L1. Just extend nested_vmx_exit_handled to properly route EXIT_REASON_RDTSCP. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-26 22:33:48 -03:00
Greg Kroah-Hartman	ff85f707ac	Merge 4.0-rc5 into char-misc-next We want those fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-25 10:51:53 +01:00
Ingo Molnar	72d64cc769	x86/asm: Further improve segment.h readability - extend/clarify explanations where necessary - move comments from macro values to before the macro, to make them more consistent, and to reduce preprocessor overhead - sort GDT index and selector values likewise by number - use consistent, modern kernel coding style across the file - capitalize consistently - use consistent vertical spacing - remove the unused get_limit() method (noticed by Andy Lutomirski) No change in code (verified with objdump -d): 64-bit defconfig+kvmconfig: 815a129bc1f80de6445c1d8ca5b97cad vmlinux.o.before.asm 815a129bc1f80de6445c1d8ca5b97cad vmlinux.o.after.asm 32-bit defconfig+kvmconfig: e659ef045159ddf41a0771b33a34aae5 vmlinux.o.before.asm e659ef045159ddf41a0771b33a34aae5 vmlinux.o.after.asm Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-24 21:13:38 +01:00
Ingo Molnar	dca5b52ad7	x86/asm/entry/64: Rename THREAD_INFO() to ASM_THREAD_INFO() The THREAD_INFO() macro has a somewhat confusingly generic name, defined in a generic .h C header file. It also does not make it clear that it constructs a memory operand for use in assembly code. Rename it to ASM_THREAD_INFO() to make it all glaringly obvious on first glance. Acked-by: Borislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-24 20:57:31 +01:00
Ingo Molnar	f9d71854b4	x86/asm/entry/64: Merge the field offset into the THREAD_INFO() macro Before: TI_sysenter_return+THREAD_INFO(%rsp,38),%r10d After: movl THREAD_INFO(TI_sysenter_return, %rsp, 38), %r10d to turn it into a clear thread_info accessor. No code changed: md5: fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.before.asm fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.after.asm e39f2958a5d1300158e276e4f7663263 entry_64.o.before.asm e39f2958a5d1300158e276e4f7663263 entry_64.o.after.asm Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-24 20:57:31 +01:00
Ingo Molnar	1ddc6f3c60	x86/asm/entry/64: Improve the THREAD_INFO() macro explanation Explain the background, and add a real example. Acked-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/20150324184311.GA14760@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-24 20:57:30 +01:00
Denys Vlasenko	84f5378845	x86/asm: Deobfuscate segment.h This file just defines a number of constants, and a few macros and inline functions. It is particularly badly written. For example, it is not trivial to see how descriptors are numbered (you'd expect that should be easy, right?). This change deobfuscates it via the following changes: Group all GDT_ENTRY_foo together (move intervening stuff away). Number them explicitly: use a number, not PREV_DEFINE+1, +2, +3: I want to immediately see that GDT_ENTRY_PNPBIOS_CS32 is 18. Seeing (GDT_ENTRY_KERNEL_BASE+6) instead is not useful. The above change allows to remove GDT_ENTRY_KERNEL_BASE and GDT_ENTRY_PNPBIOS_BASE, which weren't used anywhere else. After a group of GDT_ENTRY_foo, define all selector values. Remove or improve some comments. In particular: Comment deleted as stating the obvious: /* * The GDT has 32 entries / #define GDT_ENTRIES 32 "The segment offset needs to contain a RPL. Grr. -AK" changed to "Selectors need to also have a correct RPL (+3 thingy)" "GDT layout to get 64bit syscall right (sysret hardcodes gdt offsets)" expanded into a description how exactly* sysret hardcodes them. Patch was tested to compile and not change vmlinux.o on 32-bit and 64-bit builds (verified with objdump). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-24 20:47:07 +01:00
Denys Vlasenko	ef593260f0	x86/asm/entry: Get rid of KERNEL_STACK_OFFSET PER_CPU_VAR(kernel_stack) was set up in a way where it points five stack slots below the top of stack. Presumably, it was done to avoid one "sub $5*8,%rsp" in syscall/sysenter code paths, where iret frame needs to be created by hand. Ironically, none of them benefits from this optimization, since all of them need to allocate additional data on stack (struct pt_regs), so they still have to perform subtraction. This patch eliminates KERNEL_STACK_OFFSET. PER_CPU_VAR(kernel_stack) now points directly to top of stack. pt_regs allocations are adjusted to allocate iret frame as well. Hopefully we can merge it later with 32-bit specific PER_CPU_VAR(cpu_current_top_of_stack) variable... Net result in generated code is that constants in several insns are changed. This change is necessary for changing struct pt_regs creation in SYSCALL64 code path from MOV to PUSH instructions. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-24 19:42:38 +01:00
Denys Vlasenko	b3fe8ba320	x86/asm/entry/64: Change the THREAD_INFO() definition to not depend on KERNEL_STACK_OFFSET This changes the THREAD_INFO() definition and all its callsites so that they do not count stack position from (top of stack - KERNEL_STACK_OFFSET), but from top of stack. Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??" are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS) - "calculate thread_info's address using information that rsp is SIZEOF_PTREGS bytes below top of stack". While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent "((off)-THREAD_SIZE)(reg)". The form without parentheses falsely looks like we invoke THREAD_SIZE() macro. Improve comment atop THREAD_INFO macro definition. This patch does not change generated code (verified by objdump). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-24 19:42:37 +01:00
Aravind Gopalakrishnan	43eaa2a1ad	x86/mce: Define mce_severity function pointer Rename mce_severity() to mce_severity_intel() and assign the mce_severity function pointer to mce_severity_amd() during init on AMD. This way, we can avoid a test to call mce_severity_amd every time we get into mce_severity(). And it's cleaner to do it this way. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Suggested-by: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1427125373-2918-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-24 12:14:15 +01:00
Aravind Gopalakrishnan	bf80bbd7dc	x86/mce: Add an AMD severities-grading function Add a severities function that caters to AMD processors. This allows us to do some vendor-specific work within the function if necessary. Also, introduce a vendor flag bitfield for vendor-specific settings. The severities code uses this to define error scope based on the prescence of the flags field. This is based off of work by Boris Petkov. Testing details: Fam10h, Model 9h (Greyhound) Fam15h: Models 0h-0fh (Orochi), 30h-3fh (Kaveri) and 60h-6fh (Carrizo), Fam16h Model 00h-0fh (Kabini) Boris: Intel SNB AMD K8 (JH-E0) Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/1427125373-2918-2-git-send-email-Aravind.Gopalakrishnan@amd.com [ Fixup build, clean up comments. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-24 12:13:34 +01:00
Rusty Russell	2f921b5bb0	lguest: suppress interrupts for single insn, not range. The last patch reduced our interrupt-suppression region to one address, so simplify the code somewhat. Also, remove the obsolete undefined instruction ranges and the comment which refers to lguest_guest.S instead of head_32.S. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-03-24 11:52:08 +10:30
Marcelo Tosatti	0a4e6be9ca	x86: kvm: Revert "remove sched notifier for cross-cpu migrations" The following point: 2. per-CPU pvclock time info is updated if the underlying CPU changes. Is not true anymore since "KVM: x86: update pvclock area conditionally, on cpu migration". Add task migration notification back. Problem noticed by Andy Lutomirski. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> CC: stable@kernel.org # 3.11+	2015-03-23 20:22:48 -03:00
Greg Kroah-Hartman	caa445d808	Merge 4.0-rc5 into tty-next We want the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-23 21:45:24 +01:00
Andy Lutomirski	7a2806741e	x86/asm/entry: Remove user_mode_vm() It has no callers anymore. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a594afd6a0bddb1311bd7c92a15201c87fbb8681.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 11:14:33 +01:00
Andy Lutomirski	efa7045103	x86/asm/entry: Make user_mode() work correctly if regs came from VM86 mode user_mode() is now identical to user_mode_vm(). Subsequent patches will change all callers of user_mode_vm() to user_mode() and then delete user_mode_vm(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0dd03eacb5f0a2b5ba0240de25347a31b493c289.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 11:13:51 +01:00
Andy Lutomirski	a67e7277d0	x86/asm/entry: Add user_mode_ignore_vm86() user_mode() is dangerous and user_mode_vm() has a confusing name. Add user_mode_ignore_vm86() (equivalent to current user_mode()). We'll change the small number of legitimate users of user_mode() to user_mode_ignore_vm86(). Inspired by grsec, although this works rather differently. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/202c56ca63823c338af8e2e54948dbe222da6343.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 11:13:36 +01:00
Ingo Molnar	e4518ab90f	Linux 4.0-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVD1VGAAoJEHm+PkMAQRiG7yoH/juKOQ1zbxi5M+mleDEEJtA0 RxQSojqEMWIKrWi8PNZxjENn1OZB6XOLIXOhlyAZBrmgsjO34p1DyXlZMznr/R8W kQ2Xxs061hRtB3OuruMIqOApUrjuqsaCwgbgUS1qWmqZcoyZN4oELyZMP6OOlqv5 UUBZm8MfyXGyxrCcg39mjct3VEOhiuEcvL6SUxOC380CdSVAnyqHFPcz0JVqMUn9 9RUBs0T9cMdhb0mZ2bfXzt6AKArj63G2nXOum+VzFcvspSm2U+MPIDCuoE+ZbTPS jqIAgG0rj1ezRyb5oeJrvlU0Yy3u/cXoMPs9+kORvpladooYNLti8ovh6qllm0I= =d/ye -----END PGP SIGNATURE----- Merge tag 'v4.0-rc5' into x86/asm, to resolve conflicts Conflicts: arch/x86/kernel/entry_64.S Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 11:13:15 +01:00
Ingo Molnar	e1b63dec2d	Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new patches Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:50:29 +01:00
Borislav Petkov	b85e67d148	x86/fpu: Rename drop_init_fpu() to fpu_reset_state() Call it what it does and in accordance with the context where it is used: we reset the FPU state either because we were unable to restore it from the one saved in the task or because we simply want to reset it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:13:59 +01:00
Borislav Petkov	d2d0ac9a46	x86/fpu: Fold __drop_fpu() into its sole user Fold it into drop_fpu(). Phew, one less FPU function to pay attention to. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:13:59 +01:00
Oleg Nesterov	8f4d81863b	x86/fpu: Introduce restore_init_xstate() Extract the "use_eager_fpu()" code from drop_init_fpu() into a new, simple helper restore_init_xstate(). The next patch adds another user. - It is not clear why we do not check use_fxsr() like fpu_restore_checking() does. eager_fpu_init_bp() calls setup_init_fpu_buf() too, and we have the "eagerfpu=on" kernel option. - Ignoring the fact that init_xstate_buf is "struct xsave_struct ", not "union thread_xstate ", it is not clear why we can not simply use fpu_restore_checking() and avoid the code duplication. - It is not clear why we can't call setup_init_fpu_buf() unconditionally to always create init_xstate_buf(). Then do_device_not_available() path (at least) could use restore_init_xstate() too. It doesn't need to init fpu->state, its content doesn't matter until unlazy_fpu()/__switch_to()/etc which overwrites this memory anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150311173429.GD5032@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:13:58 +01:00
Oleg Nesterov	fb14b4eadf	x86/fpu: Document user_fpu_begin() Currently, user_fpu_begin() has a single caller and it is not clear why do we actually need it and why we should not worry about preemption right after preempt_enable(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150311173409.GC5032@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:13:58 +01:00
Ingo Molnar	eda2360ad1	Linux 4.0-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVD1VGAAoJEHm+PkMAQRiG7yoH/juKOQ1zbxi5M+mleDEEJtA0 RxQSojqEMWIKrWi8PNZxjENn1OZB6XOLIXOhlyAZBrmgsjO34p1DyXlZMznr/R8W kQ2Xxs061hRtB3OuruMIqOApUrjuqsaCwgbgUS1qWmqZcoyZN4oELyZMP6OOlqv5 UUBZm8MfyXGyxrCcg39mjct3VEOhiuEcvL6SUxOC380CdSVAnyqHFPcz0JVqMUn9 9RUBs0T9cMdhb0mZ2bfXzt6AKArj63G2nXOum+VzFcvspSm2U+MPIDCuoE+ZbTPS jqIAgG0rj1ezRyb5oeJrvlU0Yy3u/cXoMPs9+kORvpladooYNLti8ovh6qllm0I= =d/ye -----END PGP SIGNATURE----- Merge tag 'v4.0-rc5' into x86/fpu, to prevent conflicts Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:13:36 +01:00
Brian Gerst	1daeaa3151	x86/asm/entry: Fix execve() and sigreturn() syscalls to always return via IRET Both the execve() and sigreturn() family of syscalls have the ability to change registers in ways that may not be compatabile with the syscall path they were called from. In particular, SYSRET and SYSEXIT can't handle non-default %cs and %ss, and some bits in eflags. These syscalls have stubs that are hardcoded to jump to the IRET path, and not return to the original syscall path. The following commit: `76f5df43ca` ("Always allocate a complete "struct pt_regs" on the kernel stack") recently changed this for some 32-bit compat syscalls, but introduced a bug where execve from a 32-bit program to a 64-bit program would fail because it still returned via SYSRETL. This caused Wine to fail when built for both 32-bit and 64-bit. This patch sets TIF_NOTIFY_RESUME for execve() and sigreturn() so that the IRET path is always taken on exit to userspace. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1426978461-32089-1-git-send-email-brgerst@gmail.com [ Improved the changelog and comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 08:52:46 +01:00
Linus Torvalds	3d7a6db537	Power management and ACPI fixes for v4.0-rc5 - Revert a recent PCI commit related to IRQ resources management that introduced a regression for drivers attempting to bind to devices whose previous drivers did not balance pci_enable_device() and pci_disable_device() as expected (Rafael J Wysocki). - Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a recent commit related to wakeup interrupt handling (Dan Carpenter). - Allow the power capping RAPL (Running-Average Power Limit) driver to use different energy units for domains within one CPU package which is necessary to handle Intel Haswell EP processors correctly (Jacob Pan). - Improve the cpuidle mvebu driver's handling of Armada XP SoCs by updating the target residency and exit latency numbers for those chips (Sebastien Rannou). - Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice in a row before cpu_pm_exit() is called on the same CPU which breaks the core's assumptions regarding the usage of those functions (Gregory Clement). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJVDLwIAAoJEILEb/54YlRxSrMQAKn/DwfNMqVWP5vf/YWauSfL S51+E/emGvy+3fwBsa46KkddRQ0ysxc1wKIHcWXc1UPtA+lKNS3MoCUdD+isUFt7 PUrMsblgjh/e6LiXOBqElAiuugVoH7JCVMKvlv5Tsn3qxY3AJEoxGwV7p4XyP6lJ PumqAvWFtaIFKThJFdKPGC511tYTQWoZ/3u843aEsHtpvmiytgUrvxpuCXlSSKT1 vbOdHAJXi0QyQYWIZ0VNN+MZ2WvaU9t1QCpBJUnzZMi2kuG3HP9rzY40GOnoMn6/ jXaxegeT7UX5JY5NWU9VrrVwKzppIpyKW6yckIRcKD+ovwKdGbMrfMco2iyK1xgV Q6B5h5guYTTynjBoi9XO3d7AWN3gM+8OYCPJgcRG2BMQEunlS0D+i3cRDqeHzW0M W+OaENK9MnxG9KVEq0PIrWomGZL1SlOtHfHm9xu8hpqGx4h1iTSgiAEFQQ+Zmmzh +g1OLgddHkWjkPoZ/Y8d1NpdnTf+kbkm8Wqm9Uyie1/HnUJMnHYNbzZTyF4ZjlV2 MAl2P0zBqWhLEDb4STHWHdnBZVhvGCpg1J2pFaSRjDEn+EP0YBH+LscWi//xONNr 5acBoVzid92co+JwrYn3/MYHctV8bBLdXqeGUiuKD6tk9u+aLke24RTBwm5frPBE SjHr1sLhmzubzXtzQIrp =4Oz9 -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These are fixes for recent regressions (PCI/ACPI resources and at91 RTC locking), a stable-candidate powercap RAPL driver fix and two ARM cpuidle fixes (one stable-candidate too). Specifics: - Revert a recent PCI commit related to IRQ resources management that introduced a regression for drivers attempting to bind to devices whose previous drivers did not balance pci_enable_device() and pci_disable_device() as expected (Rafael J Wysocki). - Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a recent commit related to wakeup interrupt handling (Dan Carpenter). - Allow the power capping RAPL (Running-Average Power Limit) driver to use different energy units for domains within one CPU package which is necessary to handle Intel Haswell EP processors correctly (Jacob Pan). - Improve the cpuidle mvebu driver's handling of Armada XP SoCs by updating the target residency and exit latency numbers for those chips (Sebastien Rannou). - Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice in a row before cpu_pm_exit() is called on the same CPU which breaks the core's assumptions regarding the usage of those functions (Gregory Clement)" * tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "x86/PCI: Refine the way to release PCI IRQ resources" rtc: at91rm9200: double locking bug in at91_rtc_interrupt() powercap / RAPL: handle domains with different energy units cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs cpuidle: mvebu: Fix the CPU PM notifier usage	2015-03-21 12:51:36 -07:00
Rafael J. Wysocki	9e8ce4b96b	Revert "x86/PCI: Refine the way to release PCI IRQ resources" Commit `b4b55cda58` (Refine the way to release PCI IRQ resources) introduced a regression in the PCI IRQ resource management by causing the IRQ resource of a device, established when pci_enabled_device() is called on a fully disabled device, to be released when the driver is unbound from the device, regardless of the enable_cnt. This leads to the situation that an ill-behaved driver can now make a device unusable to subsequent drivers by an imbalance in their use of pci_enable/disable_device(). That is a serious problem for secondary drivers like vfio-pci, which are innocent of the transgressions of the previous driver. Since the solution of this problem is not immediate and requires further discussion, revert commit `b4b55cda58` and the issue it was supposed to address (a bug related to xen-pciback) will be taken care of in a different way going forward. Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-03-20 14:56:19 +01:00
Zhonghui Fu	431d452af1	PM / sleep: add pm-trace support for suspending phase Occasionally, the system can't come back up after suspend/resume due to problems of device suspending phase. This patch make PM_TRACE infrastructure cover device suspending phase of suspend/resume process, and the information in RTC can tell developers which device suspending function make system hang. Signed-off-by: Zhonghui Fu <zhonghui.fu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-03-18 15:54:27 +01:00
Ingo Molnar	ac9af4983e	x86/asm/entry/64: Remove thread_struct::usersp Nothing uses thread_struct::usersp anymore, so remove it. Originally-from: Denys Vlasenko <dvlasenk@redhat.com> Tested-by: Borislav Petkov <bp@alien8.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 16:01:41 +01:00
Ingo Molnar	9854dd74c3	x86/asm/entry/64: Simplify 'old_rsp' usage Remove all manipulations of PER_CPU(old_rsp) in C code: - it is not used on SYSRET return anymore, and system entries are atomic, so updating it from the fork and context switch paths is pointless. - Tweak a few related comments as well: we no longer have a "partial stack frame" on entry, ever. Based on (split out of) patch from Denys Vlasenko. Originally-from: Denys Vlasenko <dvlasenk@redhat.com> Tested-by: Borislav Petkov <bp@alien8.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426599779-8010-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 16:01:41 +01:00
Denys Vlasenko	d828c71fba	x86/asm/entry/32: Document the 32-bit SYSENTER "emergency stack" better Before the patch, the 'tss_struct::stack' field was not referenced anywhere. It was used only to set SYSENTER's stack to point after the last byte of tss_struct, thus the trailing field, stack[64], was used. But grep would not know it. You can comment it out, compile, and kernel will even run until an unlucky NMI corrupts io_bitmap[] (which is also not easily detectable). This patch changes code so that the purpose and usage of this field is not mysterious anymore, and can be easily grepped for. This does change generated code, for a subtle reason: since tss_struct is ____cacheline_aligned, there happens to be 5 longs of padding at the end. Old code was using the padding too; new code will strictly use it only for SYSENTER_stack[]. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1425912738-559-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 09:25:29 +01:00
Denys Vlasenko	5c39403e00	x86/asm/entry: Simplify task_pt_regs() macro definition Before this change, task_pt_regs() was using KSTK_TOP(), and it was the only use of that macro. In turn, KSTK_TOP used THREAD_SIZE_LONGS, and it was the only use of that macro too. Fold these macros into task_pt_regs(). Tweak comment about "- 8" - we now use a symbolic constant, not literal 8. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426255743-5394-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 09:25:28 +01:00
Andy Lutomirski	76e4c4908a	x86/asm/entry/32: Document our abuse of x86_hw_tss::ss1 and x86_hw_tss::sp1 This has confused me for a while. Now that I figured it out, document it. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/b7efc1b7364039824776f68e9ddee9ec1500e894.1426009661.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 09:25:27 +01:00
Andy Lutomirski	d9e05cc5a5	x86/asm/entry: Unify and fix initial thread_struct::sp0 values x86_32 and x86_64 need slightly different thread_struct::sp0 values, and x86_32's was incorrect for init. This never mattered -- the init thread never runs user code, so we never used thread_struct::sp0 for anything. Fix it and mostly unify them. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1b810c1d2e797e27bb4a7708c426101161edd1f6.1426009661.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 09:25:27 +01:00
Andy Lutomirski	3ee4298f44	x86/asm/entry: Create and use a 'TOP_OF_KERNEL_STACK_PADDING' macro x86_32, unlike x86_64, pads the top of the kernel stack, because the hardware stack frame formats are variable in size. Document this padding and give it a name. This should make no change whatsoever to the compiled kernel image. It also doesn't fix any of the current bugs in this area. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/02bf2f54b8dcb76a62a142b6dfe07d4ef7fc582e.1426009661.git.luto@amacapital.net [ Fixed small details, such as a missed magic constant in entry_32.S pointed out by Denys Vlasenko. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 09:25:26 +01:00
Andy Lutomirski	9a036b93a3	x86/signal/64: Remove 'fs' and 'gs' from sigcontext As far as I can tell, these fields have been set to zero on save and ignored on restore since Linux was imported into git. Rename them '__pad1' and '__pad2' to avoid confusion. This may also allow us to recycle them some day. This also adds a comment clarifying the history of those fields. I'm intentionally avoiding calling either of them '__pad0': the field formerly known as '__pad0' is now 'ss'. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/844f8490e938780c03355be4c9b69eb4c494bf4e.1426193719.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 09:25:26 +01:00
Andy Lutomirski	c6f2062935	x86/signal/64: Fix SS handling for signals delivered to 64-bit programs The comment in the signal code says that apps can save/restore other segments on their own. It's true that apps can save SS on their own, but there's no way for apps to restore it: SYSCALL effectively resets SS to __USER_DS, so any value that user code tries to load into SS gets lost on entry to sigreturn. This recycles two padding bytes in the segment selector area for SS. While we're at it, we need a second change to make this useful. If the signal we're delivering is caused by a bad SS value, saving that value isn't enough. We need to remove that bad value from the regs before we try to deliver the signal. Oddly, the i386 code already got this right. I suspect that 64-bit programs that try to run 16-bit code and use signals will have a lot of trouble without this. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/405594361340a2ec32f8e2b115c142df0e180d8e.1426193719.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 09:25:25 +01:00
Borislav Petkov	69797dafe3	Revert "x86/mm/ASLR: Propagate base load address calculation" This reverts commit: `f47233c2d3` ("x86/mm/ASLR: Propagate base load address calculation") The main reason for the revert is that the new boot flag does not work at all currently, and in order to make this work, we need non-trivial changes to the x86 boot code which we didn't manage to get done in time for merging. And even if we did, they would've been too risky so instead of rushing things and break booting 4.1 on boxes left and right, we will be very strict and conservative and will take our time with this to fix and test it properly. Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-16 11:18:21 +01:00
Len Brown	b253149b84	sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance In Linux-3.9 we removed the mwait_idle() loop: `69fb3676df` ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param") The reasoning was that modern machines should be sufficiently happy during the boot process using the default_idle() HALT loop, until cpuidle loads and either acpi_idle or intel_idle invoke the newer MWAIT-with-hints idle loop. But two machines reported problems: 1. Certain Core2-era machines support MWAIT-C1 and HALT only. MWAIT-C1 is preferred for optimal power and performance. But if they support just C1, cpuidle never loads and so they use the boot-time default idle loop forever. 2. Some laptops will boot-hang if HALT is used, but will boot successfully if MWAIT is used. This appears to be a hidden assumption in BIOS SMI, that is presumably valid on the proprietary OS where the BIOS was validated. https://bugzilla.kernel.org/show_bug.cgi?id=60770 So here we effectively revert the patch above, restoring the mwait_idle() loop. However, we don't bother restoring the idle=mwait cmdline parameter, since it appears to add no value. Maintainer notes: For 3.9, simply revert `69fb3676df` for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context For 3.11, 3.12, 3.13, this patch applies cleanly Tested-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Mike Galbraith <bitbucket@online.de> Cc: <stable@vger.kernel.org> # 3.9+ Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Malone <ibmalone@gmail.com> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com [ Ported to recent kernels. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-16 11:14:21 +01:00
Oleg Nesterov	f4c3686386	x86/fpu: Drop_fpu() should not assume that tsk equals current drop_fpu() does clear_used_math() and usually this is correct because tsk == current. However switch_fpu_finish()->restore_fpu_checking() is called before __switch_to() updates the "current_task" variable. If it fails, we will wrongly clear the PF_USED_MATH flag of the previous task. So use clear_stopped_child_used_math() instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-13 12:44:29 +01:00
Paul E. McKenney	2a442c9c64	x86: Use common outgoing-CPU-notification code This commit removes the open-coded CPU-offline notification with new common code. Among other things, this change avoids calling scheduler code using RCU from an offline CPU that RCU is ignoring. It also allows Xen to notice at online time that the CPU did not go offline correctly. Note that Xen has the surviving CPU carry out some cleanup operations, so if the surviving CPU times out, these cleanup operations might have been carried out while the outgoing CPU was still running. It might therefore be unwise to bring this CPU back online, and this commit avoids doing so. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <x86@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: <xen-devel@lists.xenproject.org>	2015-03-11 13:22:35 -07:00
Andy Shevchenko	079119a2c7	serial, x86: use UPF_* constants for flags This patch fixes the following sparse warnings: drivers/tty/serial/8250/8250_core.c:3231:32: warning: incorrect type in assignment (different base types) drivers/tty/serial/8250/8250_core.c:3231:32: expected restricted upf_t [usertype] flags drivers/tty/serial/8250/8250_core.c:3231:32: got unsigned int const [unsigned] flags Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-11 14:01:17 +01:00
Joel Schopp	5cb56059c9	kvm: x86: make kvm_emulate_* consistant Currently kvm_emulate() skips the instruction but kvm_emulate_* sometimes don't. The end reult is the caller ends up doing the skip themselves. Let's make them consistant. Signed-off-by: Joel Schopp <joel.schopp@amd.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 20:29:15 -03:00
Denys Vlasenko	263042e463	x86/asm/entry/64: Save user RSP in pt_regs->sp on SYSCALL64 fastpath Prepare for the removal of 'usersp', by simplifying PER_CPU(old_rsp) usage: - use it only as temp storage - store the userspace stack pointer immediately in pt_regs->sp on syscall entry, instead of using it later, on syscall exit. - change C code to use pt_regs->sp only, instead of PER_CPU(old_rsp) and task->thread.usersp. FIXUP/RESTORE_TOP_OF_STACK are simplified as well. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1425926364-9526-4-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-10 13:56:10 +01:00
Denys Vlasenko	29722cd4ef	x86/asm/entry/64: Save R11 into pt_regs->flags on SYSCALL64 fastpath Before this patch, R11 was saved in pt_regs->r11. Which looks natural, but requires messy shuffling to/from iret frame whenever ptrace or e.g. sys_iopl() wants to modify flags - because that's how this register is used by SYSCALL/SYSRET. This patch saves R11 in pt_regs->flags, and uses that value for the SYSRET64 instruction. Shuffling is eliminated. FIXUP/RESTORE_TOP_OF_STACK are simplified. stub_iopl is no longer needed: pt_regs->flags needs no fixing up. Testing shows that syscall fast path is ~54.3 ns before and after the patch (on 2.7 GHz Sandy Bridge CPU). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1425926364-9526-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-10 13:56:10 +01:00
Oleg Nesterov	1d23c4518b	x86/fpu: Factor out memset(xstate, 0) in fpu_finit() paths fx_finit() has two users but only fpu_finit() needs to clear xstate, alloc_bootmem_align() in setup_init_fpu_buf() returns zero-filled memory. And note that both memset()'s look confusing. Yes, offsetof() is 0 for ->fxsave or ->fsave, but it would be cleaner to turn them into a single memset() which zeroes fpu->state. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tavis Ormandy <taviso@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425967585-4725-2-git-send-email-bp@alien8.de Link: http://lkml.kernel.org/r/20150302183257.GC23085@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-10 07:14:31 +01:00
Greg Kroah-Hartman	e94f16a4fd	Merge 4.0-rc3 into char-misc-next We want the mei fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-09 08:44:23 +01:00
Greg Kroah-Hartman	becba85f0e	Merge 4.0-rc3 into tty-testing This resolves a merge issue in drivers/tty/serial/8250/8250_pci.c Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-09 07:08:37 +01:00
Andy Lutomirski	a7fcf28d43	x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it on x86_32 I broke 32-bit kernels. The implementation of sp0 was correct as far as I can tell, but sp0 was much weirder on x86_32 than I realized. It has the following issues: - Init's sp0 is inconsistent with everything else's: non-init tasks are offset by 8 bytes. (I have no idea why, and the comment is unhelpful.) - vm86 does crazy things to sp0. Fix it up by replacing this_cpu_sp0() with current_top_of_stack() and using a new percpu variable to track the top of the stack on x86_32. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `75182b1632` ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()") Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-07 09:34:03 +01:00
Andy Shevchenko	1bd187de53	x86, intel-mid: remove Intel MID specific serial support Since we have a native 8250 driver carrying the Intel MID serial devices the specific support is not needed anymore. This patch removes it for Intel MID. Note that the console device name is changed from ttyMFDx to ttySx. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-07 03:25:18 +01:00
Andy Lutomirski	d0a0de21f8	x86/asm/entry: Remove INIT_TSS and fold the definitions into 'cpu_tss' The INIT_TSS is unnecessary. Just define the initial TSS where 'cpu_tss' is defined. While we're at it, merge the 32-bit and 64-bit definitions. The only syntactic change is that 32-bit kernels were computing sp0 as long, but now they compute it as unsigned long. Verified by objdump: the contents and relocations of .data..percpu..shared_aligned are unchanged on 32-bit and 64-bit kernels. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8fc39fa3f6c5d635e93afbdd1a0fe0678a6d7913.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-06 08:32:58 +01:00
Andy Lutomirski	24933b82c0	x86/asm/entry: Rename 'init_tss' to 'cpu_tss' It has nothing to do with init -- there's only one TSS per cpu. Other names considered include: - current_tss: Confusing because we never switch the tss. - singleton_tss: Too long. This patch was generated with 's/init_tss/cpu_tss/g'. Followup patches will fix INIT_TSS and INIT_TSS_IST by hand. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-06 08:32:58 +01:00
Andy Lutomirski	75182b1632	x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0() This will make modifying the semantics of kernel_stack easier. The change to ist_begin_non_atomic() is necessary because sp0 no longer points to the same THREAD_SIZE-aligned region as RSP; it's one byte too high for that. At Denys' suggestion, rather than offsetting it, just check explicitly that we're in the correct range ending at sp0. This has the added benefit that we no longer assume that the thread stack is aligned to THREAD_SIZE. Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ef8254ad414cbb8034c9a56396eeb24f5dd5b0de.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-06 08:32:57 +01:00
Andy Lutomirski	8ef46a672a	x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu We currently store references to the top of the kernel stack in multiple places: kernel_stack (with an offset) and init_tss.x86_tss.sp0 (no offset). The latter is defined by hardware and is a clean canonical way to find the top of the stack. Add an accessor so we can start using it. This needs minor paravirt tweaks. On native, sp0 defines the top of the kernel stack and is therefore always correct. On Xen and lguest, the hypervisor tracks the top of the stack, but we want to start reading sp0 in the kernel. Fixing this is simple: just update our local copy of sp0 as well as the hypervisor's copy on task switches. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-06 08:32:57 +01:00
Quentin Casasnovas	06c8173eb9	x86/fpu/xsaves: Fix improper uses of __ex_table Commit: `f31a9f7c71` ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area") introduced alternative instructions for XSAVES/XRSTORS and commit: `adb9d526e9` ("x86/xsaves: Add xsaves and xrstors support for booting time") added support for the XSAVES/XRSTORS instructions at boot time. Unfortunately both failed to properly protect them against faulting: The 'xstate_fault' macro will use the closest label named '1' backward and that ends up in the .altinstr_replacement section rather than in .text. This means that the kernel will never find in the __ex_table the .text address where this instruction might fault, leading to serious problems if userspace manages to trigger the fault. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Jamie Iles <jamie.iles@oracle.com> [ Improved the changelog, fixed some whitespace noise. ] Acked-by: Borislav Petkov <bp@alien8.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Cc: Allan Xavier <mr.a.xavier@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `adb9d526e9` ("x86/xsaves: Add xsaves and xrstors support for booting time") Fixes: `f31a9f7c71` ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 18:20:36 +01:00
Wang Nan	5eca7453d6	x86/traps: Separate set_intr_gate() and clean up early_trap_init() As early_trap_init() doesn't use IST, replace set_intr_gate_ist() and set_system_intr_gate_ist() with their standard counterparts. set_intr_gate() requires a trace_debug symbol which we don't have and won't use. This patch separates set_intr_gate() into two parts, and uses base version in early_trap_init(). Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: <dave.hansen@linux.intel.com> Cc: <lizefan@huawei.com> Cc: <masami.hiramatsu.pt@hitachi.com> Cc: <oleg@redhat.com> Cc: <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425010789-13714-1-git-send-email-wangnan0@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 00:47:29 +01:00
Denys Vlasenko	d441c1f2b7	x86/asm/entry/64: Simplify optimistic SYSRET Avoid redundant load of %r11 (it is already loaded a few instructions before). Also simplify %rsp restoration, instead of two steps: add $0x80, %rsp mov 0x18(%rsp), %rsp we can do a simplified single step to restore user-space RSP: mov 0x98(%rsp), %rsp and get the same result. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> [ Clarified the changelog. ] Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1aef69b346a6db0d99cdfb0f5ba83e8c985e27d7.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:52 +01:00
Denys Vlasenko	911d2bb5cc	x86/asm/entry/64: Use more readable constants Constants such as SS+8 or SS+8-RIP are mysterious. In most cases, SS+8 is just meant to be SIZEOF_PTREGS, SS+8-RIP is RIP's offset in the iret frame. This patch changes some of these constants to be less mysterious. No code changes (verified with objdump). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1d20491384773bd606e23a382fac23ddb49b5178.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:52 +01:00
Denys Vlasenko	f2db9382c1	x86/asm/entry: Do mass removal of 'ARGOFFSET' ARGOFFSET is zero now, removing it changes no code. A few macros lost "offset" parameter, since it is always zero now too. No code changes - verified with objdump. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/8689f937622d9d2db0ab8be82331fa15e4ed4713.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:50 +01:00
Denys Vlasenko	e90e147cbc	x86/asm/entry/64: Fix comments - Misleading and slightly incorrect comments in "struct pt_regs" are fixed (four instances). - Fix incorrect comment atop EMPTY_FRAME macro. - Explain in more detail what we do with stack layout during hw interrupt. - Correct comments about "partial stack frame" which are no longer true. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1423778052-21038-3-git-send-email-dvlasenk@redhat.com Link: http://lkml.kernel.org/r/e1f4429c491fe6ceeddb879dea2786e0f8920f9c.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:49 +01:00
Denys Vlasenko	76f5df43ca	x86/asm/entry/64: Always allocate a complete "struct pt_regs" on the kernel stack The 64-bit entry code was using six stack slots less by not saving/restoring registers which are callee-preserved according to the C ABI, and was not allocating space for them. Only when syscalls needed a complete "struct pt_regs" was the complete area allocated and filled in. As an additional twist, on interrupt entry a "slightly less truncated pt_regs" trick is used, to make nested interrupt stacks easier to unwind. This proved to be a source of significant obfuscation and subtle bugs. For example, 'stub_fork' had to pop the return address, extend the struct, save registers, and push return address back. Ugly. 'ia32_ptregs_common' pops return address and "returns" via jmp insn, throwing a wrench into CPU return stack cache. This patch changes the code to always allocate a complete "struct pt_regs" on the kernel stack. The saving of registers is still done lazily. "Partial pt_regs" trick on interrupt stack is retained. Macros which manipulate "struct pt_regs" on stack are reworked: - ALLOC_PT_GPREGS_ON_STACK allocates the structure. - SAVE_C_REGS saves to it those registers which are clobbered by C code. - SAVE_EXTRA_REGS saves to it all other registers. - Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros reverse it. 'ia32_ptregs_common', 'stub_fork' and friends lost their ugly dance with the return pointer. LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets instead of magic numbers. 'error_entry' and 'save_paranoid' now use SAVE_C_REGS + SAVE_EXTRA_REGS instead of having it open-coded yet again. Patch was run-tested: 64-bit executables, 32-bit executables, strace works. Timing tests did not show measurable difference in 32-bit and 64-bit syscalls. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1423778052-21038-2-git-send-email-dvlasenk@redhat.com Link: http://lkml.kernel.org/r/b89763d354aa23e670b9bdf3a40ae320320a7c2e.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:49 +01:00
Denys Vlasenko	49db46a67b	x86/asm: Introduce push/pop macros which generate CFI_REL_OFFSET and CFI_RESTORE Sequences: pushl_cfi %reg CFI_REL_OFFSET reg, 0 and: popl_cfi %reg CFI_RESTORE reg happen quite often. This patch adds macros which generate them. No assembly changes (verified with objdump -dr vmlinux.o). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1421017655-25561-1-git-send-email-dvlasenk@redhat.com Link: http://lkml.kernel.org/r/2202eb90f175cf45d1b2d1c64dbb5676a8ad07ad.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:49 +01:00
Ingo Molnar	f8e92fb4b0	A more involved rework of the alternatives framework to be able to pad instructions and thus make using the alternatives macros more straightforward and without having to figure out old and new instruction sizes but have the toolchain figure that out for us. Furthermore, it optimizes JMPs used so that fetch and decode can be relieved with smaller versions of the JMPs, where possible. Some stats: x86_64 defconfig: Alternatives sites total: 2478 Total padding added (in Bytes): 6051 The padding is currently done for: X86_FEATURE_ALWAYS X86_FEATURE_ERMS X86_FEATURE_LFENCE_RDTSC X86_FEATURE_MFENCE_RDTSC X86_FEATURE_SMAP This is with the latest version of the patchset. Of course, on each machine the alternatives sites actually being patched are a proper subset of the total number. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU9ekpAAoJEBLB8Bhh3lVKyjYP/AiHEiHkkjnpwTt49kUtUMI6 GIlGfJVNjp5LLnSRD/fkL/wdkBgQtMzr9O1g8Qi/lbFqxsOFteU9f1OtLx34ZwZw MhtdiHcrKGMsaIxTJh4FaqPHBT5ussm2yn1jlAX+LgILd3dpqe3oytsO8JihcK9j t2u9V/Lq92TV7zXxGgWJsPc86WhhgdldlU3X96S++Di18bnDaKbGkzthU6WzZG/H qtFZ5bfK8TlVHYduft+D9ZPzFYGp1WCOa03qU4+Djaxw02HDB6Ltysend9zg0lB1 RT/BP0PwHD3mOL11qpgtV1ChCbR8FJMN/z5+YdSNJgzDQA0H5Sf0UueTweosfAz+ /iC5t/wkegdYtqtA0nKVypYOJCS+UdfMZXenYgtSUJl6drB6I5BCW4mVft3AuWo+ EilPGpblvmjWRx1HiF4/Q/5zrSWHzmKQDyXuyxI9m0OUxAGAM0+8CY6wOqRA5pX+ /f5MjZ1hXELQGhl5Qdj4nqJacICGevJ8WYdZ53B+uYVxz7fbXk9hSYcZKT94UshD qSdaV4XJSuC7pDKqiWoNWXp5N1g+D2BgfwoQEr/RnodFZRlfc+cmOv/visak0OLr E/pp1vJvCi3+T3ImX1MCDiXmflQtFctiL3hNgMXYK2IGhJb2RDC2bFeZkksOHuAE BGgrn+usQDjVlikEnfI3 =0KXp -----END PGP SIGNATURE----- Merge tag 'alternatives_padding' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/asm Pull alternative instructions framework improvements from Borislav Petkov: "A more involved rework of the alternatives framework to be able to pad instructions and thus make using the alternatives macros more straightforward and without having to figure out old and new instruction sizes but have the toolchain figure that out for us. Furthermore, it optimizes JMPs used so that fetch and decode can be relieved with smaller versions of the JMPs, where possible. Some stats: x86_64 defconfig: Alternatives sites total: 2478 Total padding added (in Bytes): 6051 The padding is currently done for: X86_FEATURE_ALWAYS X86_FEATURE_ERMS X86_FEATURE_LFENCE_RDTSC X86_FEATURE_MFENCE_RDTSC X86_FEATURE_SMAP This is with the latest version of the patchset. Of course, on each machine the alternatives sites actually being patched are a proper subset of the total number." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 06:36:15 +01:00
Ingo Molnar	d2c032e3dc	Linux 4.0-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU9enEAAoJEHm+PkMAQRiG/ewIAJ4MW4tcAhaVj6ndCF3+uL/b RaVm1apUjsTloe5Fl0TT9J5CO3zdOetmMNToy2sf0W4MJDIyHf21o83l7eniV/6q al/c3fQ6HVtNjiSUNghTtzVlL+gUD1F60b9BGYi1V5h2Mp8u0NG1alTGLQfCB8sE ArB+v2aWEdSPn7mZDA0Yuc1In+8bkpht3oy+OLD/8JNkqqLnml9YOyPjM1cuRpBr NxKCLcPzSHH9/nR3T6XtkxXYV5xD3+CDm9roJhfHukoFmfT/G3C65Zcp2KEed/Cw QQpu+ox7fpUs10F/Fbfm8AE+tRB4o2sGh97sprXrO5oaFdx6FPIBo4WN8i/Vy68= =qpY+ -----END PGP SIGNATURE----- Merge tag 'v4.0-rc2' into x86/asm, to refresh the tree Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 06:35:43 +01:00
Borislav Petkov	e3d8f67476	x86/microcode/intel: Move mc arg last in get_matching_{microcode\|sig} ... arguments list so that it comes more natural for those functions to have the signature, processor flags and revision together, before the rest of the args. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:13 +01:00
Borislav Petkov	58ce8d6d3a	x86/microcode: Consolidate family,model, ... code ... to the header. Split the family acquiring function into a main one, doing CPUID and a helper which computes the extended family and is used in multiple places. Get rid of the locally-grown get_x86_{family,model}(). While at it, rename local variables to something more descriptive and vertically align assignments for better readability. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:07 +01:00
Borislav Petkov	4f5e5f2b57	x86/microcode/intel: Rename update_match_revision() ... to revision_is_newer() and push it up into the header and make it an inline function. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:03 +01:00
Dexuan Cui	89f9f6796d	hv: vmbus_post_msg: retry the hypercall on some transient errors I got HV_STATUS_INVALID_CONNECTION_ID on Hyper-V 2008 R2 when keeping running "rmmod hv_netvsc; modprobe hv_netvsc; rmmod hv_utils; modprobe hv_utils" in a Linux guest. Looks the host has some kind of throttling mechanism if some kinds of hypercalls are sent too frequently. Without the patch, the driver can occasionally fail to load. Also let's retry HV_STATUS_INSUFFICIENT_MEMORY, though we didn't get it before. Removed 'case -ENOMEM', since the hypervisor doesn't return this. CC: "K. Y. Srinivasan" <kys@microsoft.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-01 19:30:07 -08:00
Peter P Waskiewicz Jr	cbc82b1726	x86: Add support for Intel Cache QoS Monitoring (CQM) detection This patch adds support for the new Cache QoS Monitoring (CQM) feature found in future Intel Xeon processors. It includes the new values to track CQM resources to the cpuinfo_x86 structure, plus the CPUID detection routines for CQM. CQM allows a process, or set of processes, to be tracked by the CPU to determine the cache usage of that task group. Using this data from the CPU, software can be written to extract this data and report cache usage and occupancy for a particular process, or group of processes. More information about Cache QoS Monitoring can be found in the Intel (R) x86 Architecture Software Developer Manual, section 17.14. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Webb <chris@arachsys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Honeyman <stevenhoneyman@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1422038748-21397-5-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-25 13:53:31 +01:00
Borislav Petkov	a930dc4543	x86/asm: Cleanup prefetch primitives This is based on a patch originally by hpa. With the current improvements to the alternatives, we can simply use %P1 as a mem8 operand constraint and rely on the toolchain to generate the proper instruction sizes. For example, on 32-bit, where we use an empty old instruction we get: apply_alternatives: feat: 632+8, old: (c104648b, len: 4), repl: (c195566c, len: 4) c104648b: alt_insn: 90 90 90 90 c195566c: rpl_insn: 0f 0d 4b 5c ... apply_alternatives: feat: 632+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3) c18e09b4: alt_insn: 90 90 90 c1955948: rpl_insn: 0f 0d 08 ... apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7) c1190cf9: alt_insn: 90 90 90 90 90 90 90 c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1 all with the proper padding done depending on the size of the replacement instruction the compiler generates. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com>	2015-02-23 13:44:17 +01:00
Borislav Petkov	c70e1b475f	x86/asm: Use alternative_2() in rdtsc_barrier() ... now that we have it. Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:17 +01:00
Borislav Petkov	669f8a9001	x86/smap: Use ALTERNATIVE macro ... and drop unfolded version. No need for ASM_NOP3 anymore either as the alternatives do the proper padding at build time and insert proper NOPs at boot time. There should be no apparent operational change from this patch. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:14 +01:00
Borislav Petkov	48c7a2509f	x86/alternatives: Make JMPs more robust Up until now we had to pay attention to relative JMPs in alternatives about how their relative offset gets computed so that the jump target is still correct. Or, as it is the case for near CALLs (opcode e8), we still have to go and readjust the offset at patching time. What is more, the static_cpu_has_safe() facility had to forcefully generate 5-byte JMPs since we couldn't rely on the compiler to generate properly sized ones so we had to force the longest ones. Worse than that, sometimes it would generate a replacement JMP which is longer than the original one, thus overwriting the beginning of the next instruction at patching time. So, in order to alleviate all that and make using JMPs more straight-forward we go and pad the original instruction in an alternative block with NOPs at build time, should the replacement(s) be longer. This way, alternatives users shouldn't pay special attention so that original and replacement instruction sizes are fine but the assembler would simply add padding where needed and not do anything otherwise. As a second aspect, we go and recompute JMPs at patching time so that we can try to make 5-byte JMPs into two-byte ones if possible. If not, we still have to recompute the offsets as the replacement JMP gets put far away in the .altinstr_replacement section leading to a wrong offset if copied verbatim. For example, on a locally generated kernel image old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2 __switch_to: ffffffff810014bd: eb 21 jmp ffffffff810014e0 repl insn: size: 5 ffffffff81d0b23c: e9 b1 62 2f ff jmpq ffffffff810014f2 gets corrected to a 2-byte JMP: apply_alternatives: feat: 332+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5) alt_insn: e9 b1 62 2f ff recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2 converted to: eb 33 90 90 90 and a 5-byte JMP: old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2 __switch_to: ffffffff81001516: eb 30 jmp ffffffff81001548 repl insn: size: 5 ffffffff81d0b241: e9 10 63 2f ff jmpq ffffffff81001556 gets shortened into a two-byte one: apply_alternatives: feat: 332+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5) alt_insn: e9 10 63 2f ff recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2 converted to: eb 3e 90 90 90 ... and so on. This leads to a net win of around 40ish replacements * 3 bytes savings =~ 120 bytes of I$ on an AMD guest which means some savings of precious instruction cache bandwidth. The padding to the shorter 2-byte JMPs are single-byte NOPs which on smart microarchitectures means discarding NOPs at decode time and thus freeing up execution bandwidth. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:11 +01:00
Borislav Petkov	4332195c56	x86/alternatives: Add instruction padding Up until now we have always paid attention to make sure the length of the new instruction replacing the old one is at least less or equal to the length of the old instruction. If the new instruction is longer, at the time it replaces the old instruction it will overwrite the beginning of the next instruction in the kernel image and cause your pants to catch fire. So instead of having to pay attention, teach the alternatives framework to pad shorter old instructions with NOPs at buildtime - but only in the case when len(old instruction(s)) < len(new instruction(s)) and add nothing in the >= case. (In that case we do add_nops() when patching). This way the alternatives user shouldn't have to care about instruction sizes and simply use the macros. Add asm ALTERNATIVE* flavor macros too, while at it. Also, we need to save the pad length in a separate struct alt_instr member for NOP optimization and the way to do that reliably is to carry the pad length instead of trying to detect whether we're looking at single-byte NOPs or at pathological instruction offsets like e9 90 90 90 90, for example, which is a valid instruction. Thanks to Michael Matz for the great help with toolchain questions. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:00 +01:00
Linus Torvalds	f9677375b0	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull Intel Quark SoC support from Ingo Molnar: "This adds support for Intel Quark X1000 SoC boards, used in the low power 32-bit x86 Intel Galileo microcontroller board intended for the Arduino space. There's been some preparatory core x86 patches for Quark CPU quirks merged already, but this rounds it all up and adds Kconfig enablement. It's a clean hardware enablement addition tree at this point" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel/quark: Fix simple_return.cocci warnings x86/intel/quark: Fix ptr_ret.cocci warnings x86/intel/quark: Add Intel Quark platform support x86/intel/quark: Add Isolated Memory Regions for Quark X1000	2015-02-21 11:12:07 -08:00
Linus Torvalds	10436cf881	Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two fixes: the paravirt spin_unlock() corruption/crash fix, and an rtmutex NULL dereference crash fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/spinlocks/paravirt: Fix memory corruption on unlock locking/rtmutex: Avoid a NULL pointer dereference on deadlock	2015-02-21 10:45:03 -08:00
Linus Torvalds	5fbe4c224c	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: "This contains: - EFI fixes - a boot printout fix - ASLR/kASLR fixes - intel microcode driver fixes - other misc fixes Most of the linecount comes from an EFI revert" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch x86/microcode/intel: Handle truncated microcode images more robustly x86/microcode/intel: Guard against stack overflow in the loader x86, mm/ASLR: Fix stack randomization on 64-bit systems x86/mm/init: Fix incorrect page size in init_memory_mapping() printks x86/mm/ASLR: Propagate base load address calculation Documentation/x86: Fix path in zero-page.txt x86/apic: Fix the devicetree build in certain configs Revert "efi/libstub: Call get_memory_map() to obtain map and desc sizes" x86/efi: Avoid triple faults during EFI mixed mode calls	2015-02-21 10:41:29 -08:00
Jiri Kosina	570e1aa84c	x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch Commit `f47233c2d3` ("x86/mm/ASLR: Propagate base load address calculation") causes PAGE_SIZE redefinition warnings for UML subarch builds. This is caused by added includes that were leftovers from previous patch versions are are not actually needed (especially page_types.h inlcude in module.c). Drop those stray includes. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1502201017240.28769@pobox.suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-20 10:55:32 +01:00
Ross Zwisler	719d359dc7	x86/asm: Add support for the pcommit instruction Add support for the new pcommit (persistent commit) instruction. This instruction was announced in the document "Intel Architecture Instruction Set Extensions Programming Reference" with reference number 319433-022: https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf The pcommit instruction ensures that data that has been flushed from the processor's cache hierarchy with clwb, clflushopt or clflush is accepted to memory and is durable on the DIMM. The primary use case for this is persistent memory. This function shows how to properly use clwb/clflushopt/clflush and pcommit with appropriate fencing: void flush_and_commit_buffer(void vaddr, unsigned int size) { void vend = vaddr + size - 1; for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size) clwb(vaddr); /* Flush any possible final partial cacheline / clwb(vend); / * sfence to order clwb/clflushopt/clflush cache flushes * mfence via mb() also works / wmb(); / pcommit and the required sfence for ordering / pcommit_sfence(); } After this function completes the data pointed to by vaddr is has been accepted to memory and will be durable if the vaddr points to persistent memory. Pcommit must always be ordered by an mfence or sfence, so to help simplify things we include both the pcommit and the required sfence in the alternatives generated by pcommit_sfence(). The other option is to keep them separated, but on platforms that don't support pcommit this would then turn into: void flush_and_commit_buffer(void vaddr, unsigned int size) { void vend = vaddr + size - 1; for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size) clwb(vaddr); / Flush any possible final partial cacheline / clwb(vend); / * sfence to order clwb/clflushopt/clflush cache flushes * mfence via mb() also works / wmb(); nop(); / from pcommit(), via alternatives / / * sfence to order pcommit * mfence via mb() also works */ wmb(); } This is still correct, but now you've got two fences separated by only a nop. With the commit and the fence together in pcommit_sfence() you avoid the final unneeded fence. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1424367448-24254-1-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-20 09:43:36 +01:00
David Vrabel	e3a1f6cac1	x86: pte_protnone() and pmd_protnone() must check entry is not present Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if _PAGE_PRESENT is clear. Make pte_protnone() and pmd_protnone() check for this. This fixes a 64-bit Xen PV guest regression introduced by `8a0516ed8b` ("mm: convert p[te\|md]_numa users to p[te\|md]_protnone_numa"). Any userspace process would endlessly fault. In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set by the hypervisor. This meant that any fault on a present userspace entry (e.g., a write to a read-only mapping) would be misinterpreted as a NUMA hinting fault and the fault would not be correctly handled, resulting in the access endlessly faulting. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-19 15:04:49 -08:00
Ingo Molnar	fa45a45ca3	Merge tag 'ras_for_3.21' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull RAS updates from Borislav Petkov: "- Enable AMD thresholding IRQ by default if supported. (Aravind Gopalakrishnan) - Unify mce_panic() message pattern. (Derek Che) - A bit more involved simplification of the CMCI logic after yet another report about race condition with the adaptive logic. (Borislav Petkov) - ACPI APEI EINJ fleshing out of the user documentation. (Borislav Petkov) - Minor cleanup. (Jan Beulich.)" Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 13:31:33 +01:00
Borislav Petkov	3f2f0680d1	x86/MCE/intel: Cleanup CMCI storm logic Initially, this started with the yet another report about a race condition in the CMCI storm adaptive period length thing. Yes, we have to admit, it is fragile and error prone. So let's simplify it. The simpler logic is: now, after we enter storm mode, we go straight to polling with CMCI_STORM_INTERVAL, i.e. once a second. We remain in storm mode as long as we see errors being logged while polling. Theoretically, if we see an uninterrupted error stream, we will remain in storm mode indefinitely and keep polling the MSRs. However, when the storm is actually a burst of errors, once we have logged them all, we back out of it after ~5 mins of polling and no more errors logged. If we encounter an error during those 5 minutes, we reset the polling interval to 5 mins. Making machine_check_poll() return a bool and denoting whether it has seen an error or not lets us simplify a bunch of code and move the storm handling private to mce_intel.c. Some minor cleanups while at it. Reported-by: Calvin Owens <calvinowens@fb.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1417746575-23299-1-git-send-email-calvinowens@fb.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 13:24:25 +01:00
Ingo Molnar	a267b0a349	Merge branch 'tip-x86-kaslr' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Pull ASLR and kASLR fixes from Borislav Petkov: - Add a global flag announcing KASLR state so that relevant code can do informed decisions based on its setting. (Jiri Kosina) - Fix a stack randomization entropy decrease bug. (Hector Marco-Gisbert) Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 12:31:34 +01:00
Jiri Kosina	f47233c2d3	x86/mm/ASLR: Propagate base load address calculation Commit: `e2b32e6785` ("x86, kaslr: randomize module base load address") makes the base address for module to be unconditionally randomized in case when CONFIG_RANDOMIZE_BASE is defined and "nokaslr" option isn't present on the commandline. This is not consistent with how choose_kernel_location() decides whether it will randomize kernel load base. Namely, CONFIG_HIBERNATION disables kASLR (unless "kaslr" option is explicitly specified on kernel commandline), which makes the state space larger than what module loader is looking at. IOW CONFIG_HIBERNATION && CONFIG_RANDOMIZE_BASE is a valid config option, kASLR wouldn't be applied by default in that case, but module loader is not aware of that. Instead of fixing the logic in module.c, this patch takes more generic aproach. It introduces a new bootparam setup data_type SETUP_KASLR and uses that to pass the information whether kaslr has been applied during kernel decompression, and sets a global 'kaslr_enabled' variable accordingly, so that any kernel code (module loading, livepatching, ...) can make decisions based on its value. x86 module loader is converted to make use of this flag. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Kees Cook <keescook@chromium.org> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Link: https://lkml.kernel.org/r/alpine.LNX.2.00.1502101411280.10719@pobox.suse.cz [ Always dump correct kaslr status when panicking ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:38:54 +01:00
Ingo Molnar	f353e61230	Merge branch 'tip-x86-fpu' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/fpu Pull FPU updates from Borislav Petkov: "A round of updates to the FPU maze from Oleg and Rik. It should make the code a bit more understandable/readable/streamlined and a preparation for more cleanups and improvements in that area." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 11:19:05 +01:00
Rik van Riel	728e53fef4	x86/fpu: Also check fpu_lazy_restore() when use_eager_fpu() With Oleg's patch: `33a3ebdc07` ("x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end()") kernel threads no longer have an FPU state even on systems with use_eager_fpu(). That in turn means that a task may still have its FPU state loaded in the FPU registers, if the task only got interrupted by kernel threads from when it went to sleep, to when it woke up again. In that case, there is no need to restore the FPU state for this task, since it is still in the registers. The kernel can simply use the same logic to determine this as is used for !use_eager_fpu() systems. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1423252925-14451-9-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:55 +01:00
Rik van Riel	6a5fe8952b	x86/fpu: Use task_disable_lazy_fpu_restore() helper Replace magic assignments of fpu.last_cpu = ~0 with more explicit task_disable_lazy_fpu_restore() calls. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1423252925-14451-8-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:55 +01:00
Rik van Riel	1361ef29c7	x86/fpu: Use an explicit if/else in switch_fpu_prepare() Use an explicit if/else branch after __save_init_fpu(old) in switch_fpu_prepare(). This makes substituting the assignment with a call in task_disable_lazy_fpu_restore() in the next patch easier to review. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1423252925-14451-7-git-send-email-riel@redhat.com [ Space out stuff for more readability. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:54 +01:00
Rik van Riel	33e03dedd7	x86/fpu: Introduce task_disable_lazy_fpu_restore() helper Currently there are a few magic assignments sprinkled through the code that disable lazy FPU state restoring, some more effective than others, and all equally mystifying. It would be easier to have a helper to explicitly disable lazy FPU state restoring for a task. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1423252925-14451-6-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:53 +01:00
Rik van Riel	1c927eea4c	x86/fpu: Move lazy restore functions up a few lines We need another lazy restore related function, that will be called from a function that is above where the lazy restore functions are now. It would be nice to keep all three functions grouped together. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1423252925-14451-5-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:53 +01:00
Oleg Nesterov	08a744c6bf	x86/fpu: Change math_error() to use unlazy_fpu(), kill (now) unused save_init_fpu() math_error() calls save_init_fpu() after conditional_sti(), this means that the caller can be preempted. If !use_eager_fpu() we can hit the WARN_ON_ONCE(!__thread_has_fpu(tsk)) and/or save the wrong FPU state. Change math_error() to use unlazy_fpu() and kill save_init_fpu(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1423252925-14451-4-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:03 +01:00
Andy Lutomirski	91e5ed49fc	x86/asm/decoder: Fix and enforce max instruction size in the insn decoder x86 instructions cannot exceed 15 bytes, and the instruction decoder should enforce that. Prior to `6ba48ff46f`, the instruction length limit was implicitly set to 16, which was an approximation of 15, but there is currently no limit at all. Fix MAX_INSN_SIZE (it should be 15, not 16), and fix the decoder to reject instructions that exceed MAX_INSN_SIZE. Other than potentially confusing some of the decoder sanity checks, I'm not aware of any actual problems that omitting this check would cause, nor am I aware of any practical problems caused by the MAX_INSN_SIZE error. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Fixes: `6ba48ff46f` ("x86: Remove arbitrary instruction size limit ... Link: http://lkml.kernel.org/r/f8f0bc9b8c58cfd6830f7d88400bf1396cbdcd0f.1422403511.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 00:01:24 +01:00
Bryan O'Donoghue	28a375df16	x86/intel/quark: Add Isolated Memory Regions for Quark X1000 Intel's Quark X1000 SoC contains a set of registers called Isolated Memory Regions. IMRs are accessed over the IOSF mailbox interface. IMRs are areas carved out of memory that define read/write access rights to the various system agents within the Quark system. For a given agent in the system it is possible to specify if that agent may read or write an area of memory defined by an IMR with a granularity of 1 KiB. Quark_SecureBootPRM_330234_001.pdf section 4.5 details the concept of IMRs quark-x1000-datasheet.pdf section 12.7.4 details the implementation of IMRs in silicon. eSRAM flush, CPU Snoop write-only, CPU SMM Mode, CPU non-SMM mode, RMU and PCIe Virtual Channels (VC0 and VC1) can have individual read/write access masks applied to them for a given memory region in Quark X1000. This enables IMRs to treat each memory transaction type listed above on an individual basis and to filter appropriately based on the IMR access mask for the memory region. Quark supports eight IMRs. Since all of the DMA capable SoC components in the X1000 are mapped to VC0 it is possible to define sections of memory as invalid for DMA write operations originating from Ethernet, USB, SD and any other DMA capable south-cluster component on VC0. Similarly it is possible to mark kernel memory as non-SMM mode read/write only or to mark BIOS runtime memory as SMM mode accessible only depending on the particular memory footprint on a given system. On an IMR violation Quark SoC X1000 systems are configured to reset the system, so ensuring that the IMR memory map is consistent with the EFI provided memory map is critical to ensure no IMR violations reset the system. The API for accessing IMRs is based on MTRR code but doesn't provide a /proc or /sys interface to manipulate IMRs. Defining the size and extent of IMRs is exclusively the domain of in-kernel code. Quark firmware sets up a series of locked IMRs around pieces of memory that firmware owns such as ACPI runtime data. During boot a series of unlocked IMRs are placed around items in memory to guarantee no DMA modification of those items can take place. Grub also places an unlocked IMR around the kernel boot params data structure and compressed kernel image. It is necessary for the kernel to tear down all unlocked IMRs in order to ensure that the kernel's view of memory passed via the EFI memory map is consistent with the IMR memory map. Without tearing down all unlocked IMRs on boot transitory IMRs such as those used to protect the compressed kernel image will cause IMR violations and system reboots. The IMR init code tears down all unlocked IMRs and sets a protective IMR around the kernel .text and .rodata as one contiguous block. This sanitizes the IMR memory map with respect to the EFI memory map and protects the read-only portions of the kernel from unwarranted DMA access. Tested-by: Ong, Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Reviewed-by: Andy Shevchenko <andy.schevchenko@gmail.com> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: andy.shevchenko@gmail.com Cc: dvhart@infradead.org Link: http://lkml.kernel.org/r/1422635379-12476-2-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 23:22:47 +01:00
Ricardo Ribalda Delgado	b273c2c2f2	x86/apic: Fix the devicetree build in certain configs Without this patch: LD init/built-in.o arch/x86/built-in.o: In function `dtb_lapic_setup': kernel/devicetree.c:155: undefined reference to `apic_force_enable' Makefile:923: recipe for target 'vmlinux' failed make: *** [vmlinux] Error 1 Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: David Rientjes <rientjes@google.com> Cc: Jan Beulich <JBeulich@suse.com> Link: http://lkml.kernel.org/r/1422905231-16067-1-git-send-email-ricardo.ribalda@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 22:30:19 +01:00
Miroslav Benes	4421f8f0fa	livepatch: remove extern specifier from header files Storage-class specifier 'extern' is redundant in front of the function declaration. According to the C specification it has the same meaning as if not present at all. So remove it. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-02-18 20:50:05 +01:00
Linus Torvalds	eaa0eda562	asm-generic: uaccess.h cleanup Like in 3.19, I once more have a multi-stage cleanup for one asm-generic header file, this time the work was done by Michael Tsirkin and cleans up the uaccess.h file in asm-generic, as well as all architectures for which the respective maintainers did not pick up his patches directly. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAVONFpmCrR//JCVInAQIoYRAA1T3ID1bQLqdi8TU1X+vzutXzGFRhRFii u18GYeN6sGTcfqQD0GsNSaH7G8XehF3cgJ9eo4h9YkRPIG/0T0FO+dqdB0uRh8iy GKcUqVhgvCFpOBDUJC6FgMvgWWyVrgSUBqG6qSXck/PDcMSsUa/m/GcLhR/sHWGn EGEAzYNvJgdOaJ1z0vfPFK6mPwFwmYzIss5XFuoBAKKN856fBlxofkQqdpKjGDFH n0UziaJ5tbCdlZ9M9Y5JN9RU8yBCcOmGHnHUAQHz3BXOt9sD7o5jDuzsUbj+vUGJ gzNc8kee9Pyy8ZA1F959gspaxe5Oumq7NLgs3HDjK6ZDRKpJvZb6iXi56f15chlZ dItTbFSxCHOFs0d8XJKNbmPt44pJ/qKO+03lMIGttMkIm7hXfvyMWSPZV9G0Pu1y zbWEDgW2Mdrdt0saNSD46IEp+c7E5P3D9JSctQRdQjReoCbOHwqrSHi1Zeg97XL4 I1E0KwDqFUw3P1dXr5ahXmR50ZigBGjN5Fz3N7GmJt2x4PRSS2Sw92hyCrL0YM8J 56FdRA7UJ0V/SzmAko3F5wWmhabc6L+qrVA42R6U3SNSjU8hwppOkYKDINNhPZfL SGy1oQS6Jj10WxLOVp66NC7XxXzBybDcQnatz4XtNN8P5sfekUGSGBeMyMsHl7IJ 9MT3xym+DWU= =LROx -----END PGP SIGNATURE----- Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic uaccess.h cleanup from Arnd Bergmann: "Like in 3.19, I once more have a multi-stage cleanup for one asm-generic header file, this time the work was done by Michael Tsirkin and cleans up the uaccess.h file in asm-generic, as well as all architectures for which the respective maintainers did not pick up his patches directly" * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (37 commits) sparc32: nocheck uaccess coding style tweaks sparc64: nocheck uaccess coding style tweaks xtensa: macro whitespace fixes sh: macro whitespace fixes parisc: macro whitespace fixes m68k: macro whitespace fixes m32r: macro whitespace fixes frv: macro whitespace fixes cris: macro whitespace fixes avr32: macro whitespace fixes arm64: macro whitespace fixes arm: macro whitespace fixes alpha: macro whitespace fixes blackfin: macro whitespace fixes sparc64: uaccess_64 macro whitespace fixes sparc32: uaccess_32 macro whitespace fixes avr32: whitespace fix sh: fix put_user sparse errors metag: fix put_user sparse errors ia64: fix put_user sparse errors ...	2015-02-18 10:02:24 -08:00
Linus Torvalds	53861af9a1	OK, this has the big virtio 1.0 implementation, as specified by OASIS. On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to double-check the implementation. Then comes the inevitable fixes and cleanups from that work. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU5B9cAAoJENkgDmzRrbjxPacP/jajliXX353JJ/g/hkZ6oDN5 o7FhELBKiUMr7enVZYwj2BBYk5OM36nB9pQkiqHMSbjJGoS5IK70enxb4YRxSHBn YCLblZMNqutGS0kclZ9DDysztjAhxH7CvLM6pMZ7eHP0f3+FM/QhbxHfbG9DTBUH 2U/nybvd3M/+YBe7ptwQdrH8aOCAD6RTIsXellfm99dNMK6K/5lqnWQ98WSXmNXq vyvdaAQsqqUkmxtajjcBumaCH4/SehOJJjUqojCMsR3aBkgOBWDZJURMek+KA5Dt X996fBsTAlvTtCUKRrmLTb2ScDH7fu+jwbWRqMYDk8zpEr3XqiLTTPV4/TiHGmi7 Wiw3g1wIY1YbETlZyongB5MIoVyUfmDAd+bT8nBsj3KIITD84gOUQFDMl6d63c0I z6A9Pu/UzpJGsXZT3WoFLi6TO67QyhOseqZnhS4wBgLabjxffNM7yov9RVKUVH/n JHunnpUk2iTtSgscBarOBz5867dstuurnaUIspZthVBo6y6N0z+GrU+agJ8Y4DXx mvwzeYLhQH2208PjxPFiah/kA/gHNm1m678TbpS+CUsgmpQiJ4gTwtazDSi4TwZY Hs9T9GulkzpZIzEyKL3qG2TsfyDhW5Avn+GvKInAT9+Fkig4BnP3DUONBxcwGZ78 eI3FDUWsE36NqE5ECWmz =ivCe -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "OK, this has the big virtio 1.0 implementation, as specified by OASIS. On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to double-check the implementation. Then comes the inevitable fixes and cleanups from that work" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits) virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice. virtio_net: unconditionally define struct virtio_net_hdr_v1. tools/lguest: don't use legacy definitions for net device in example launcher. virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined. tools/lguest: use common error macros in the example launcher. tools/lguest: give virtqueues names for better error messages tools/lguest: more documentation and checking of virtio 1.0 compliance. lguest: don't look in console features to find emerg_wr. tools/lguest: don't start devices until DRIVER_OK status set. tools/lguest: handle indirect partway through chain. tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: rename virtio_pci_cfg_cap field to match spec. tools/lguest: fix features_accepted logic in example launcher. tools/lguest: handle device reset correctly in example launcher. virtual: Documentation: simplify and generalize paravirt_ops.txt lguest: remove NOTIFY call and eventfd facility. lguest: remove NOTIFY facility from demonstration launcher. lguest: use the PCI console device's emerg_wr for early boot messages. lguest: always put console in PCI slot #1. ...	2015-02-18 09:24:01 -08:00
Raghavendra K T	d6abfdb202	x86/spinlocks/paravirt: Fix memory corruption on unlock Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); / add_smp() is a full mb() / if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x 0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: Hinted missing TICKET_LOCK_INC for kick] [Oleg: Moved slowpath flag to head, ticket_equals idea] [PeterZ: Added detailed changelog] Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Jones <drjones@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: a.ryabinin@samsung.com Cc: dave@stgolabs.net Cc: hpa@zytor.com Cc: jasowang@redhat.com Cc: jeremy@goop.org Cc: paul.gortmaker@windriver.com Cc: riel@redhat.com Cc: tglx@linutronix.de Cc: waiman.long@hp.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20150215173043.GA7471@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 14:53:49 +01:00
Linus Torvalds	37507717de	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf updates from Ingo Molnar: "This series tightens up RDPMC permissions: currently even highly sandboxed x86 execution environments (such as seccomp) have permission to execute RDPMC, which may leak various perf events / PMU state such as timing information and other CPU execution details. This 'all is allowed' RDPMC mode is still preserved as the (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is that RDPMC access is only allowed if a perf event is mmap-ed (which is needed to correctly interpret RDPMC counter values in any case). As a side effect of these changes CR4 handling is cleaned up in the x86 code and a shadow copy of the CR4 value is added. The extra CR4 manipulation adds ~ <50ns to the context switch cost between rdpmc-capable and rdpmc-non-capable mms" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks perf/x86: Only allow rdpmc if a perf_event is mapped perf: Pass the event to arch_perf_update_userpage() perf: Add pmu callbacks to track event mapping and unmapping x86: Add a comment clarifying LDT context switching x86: Store a per-cpu shadow copy of CR4 x86: Clean up cr4 manipulation	2015-02-16 14:58:12 -08:00
Linus Torvalds	a9724125ad	TTY/Serial driver patches for 3.20-rc1 Here's the big tty/serial driver update for 3.20-rc1. Nothing huge here, just lots of driver updates and some core tty layer fixes as well. All have been in linux-next with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlTgtgkACgkQMUfUDdst+ykXbACg14oFAmeYjO9RsdIHPXBvKseO 47QAn0foy91bpNQ5UFOxWS5L6Fzj2ZND =syx2 -----END PGP SIGNATURE----- Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver patches from Greg KH: "Here's the big tty/serial driver update for 3.20-rc1. Nothing huge here, just lots of driver updates and some core tty layer fixes as well. All have been in linux-next with no reported issues" * tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits) serial: 8250: Fix UART_BUG_TXEN workaround serial: driver for ETRAX FS UART tty: remove unused variable sprop serial: of-serial: fetch line number from DT serial: samsung: earlycon support depends on CONFIG_SERIAL_SAMSUNG_CONSOLE tty/serial: serial8250_set_divisor() can be static tty/serial: Add Spreadtrum sc9836-uart driver support Documentation: DT: Add bindings for Spreadtrum SoC Platform serial: samsung: remove redundant interrupt enabling tty: Remove external interface for tty_set_termios() serial: omap: Fix RTS handling serial: 8250_omap: Use UPSTAT_AUTORTS for RTS handling serial: core: Rework hw-assisted flow control support tty/serial: 8250_early: Add support for PXA UARTs tty/serial: of_serial: add support for PXA/MMP uarts tty/serial: of_serial: add DT alias ID handling serial: 8250: Prevent concurrent updates to shadow registers serial: 8250: Use canary to restart console after suspend serial: 8250: Refactor XR17V35X divisor calculation serial: 8250: Refactor divisor programming ...	2015-02-15 11:37:02 -08:00
Linus Torvalds	4ba63072b9	Char / Misc patches for 3.20-rc1 Here's the big char/misc driver update for 3.20-rc1. Lots of little things in here, all described in the changelog. Nothing major or unusual, except maybe the binder selinux stuff, which was all acked by the proper selinux people and they thought it best to come through this tree. All of this has been in linux-next with no reported issues for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlTgs80ACgkQMUfUDdst+yn86gCeMLbxANGExVLd+PR46GNsAUQb SJ4AmgIqrkIz+5LCwZWM02ldbYhPeBVf =lfmM -----END PGP SIGNATURE----- Merge tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc patches from Greg KH: "Here's the big char/misc driver update for 3.20-rc1. Lots of little things in here, all described in the changelog. Nothing major or unusual, except maybe the binder selinux stuff, which was all acked by the proper selinux people and they thought it best to come through this tree. All of this has been in linux-next with no reported issues for a while" * tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits) coresight: fix function etm_writel_cp14() parameter order coresight-etm: remove check for unknown Kconfig macro coresight: fixing CPU hwid lookup in device tree coresight: remove the unnecessary function coresight_is_bit_set() coresight: fix the debug AMBA bus name coresight: remove the extra spaces coresight: fix the link between orphan connection and newly added device coresight: remove the unnecessary replicator property coresight: fix the replicator subtype value pdfdocs: Fix 'make pdfdocs' failure for 'uio-howto.tmpl' mcb: Fix error path of mcb_pci_probe virtio/console: verify device has config space ti-st: clean up data types (fix harmless memory corruption) mei: me: release hw from reset only during the reset flow mei: mask interrupt set bit on clean reset bit extcon: max77693: Constify struct regmap_config extcon: adc-jack: Release IIO channel on driver remove extcon: Remove duplicated include from extcon-class.c Drivers: hv: vmbus: hv_process_timer_expiration() can be static Drivers: hv: vmbus: serialize Offer and Rescind offer ...	2015-02-15 10:48:44 -08:00
Linus Torvalds	c833e17e27	Tighten rules for ACCESS_ONCE This series tightens the rules for ACCESS_ONCE to only work on scalar types. It also contains the necessary fixups as indicated by build bots of linux-next. Now everything is in place to prevent new non-scalar users of ACCESS_ONCE and we can continue to convert code to READ_ONCE/WRITE_ONCE. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJU2H5MAAoJEBF7vIC1phx8Jm4QALPqKOMDSUBCrqJFWJeujtv2 ILxJKsnjrAlt3dxnlVI3q6e5wi896hSce75PcvZ/vs/K3GdgMxOjrakBJGTJ2Qjg 5njW9aGJDDr/SYFX33MLWfqy222TLtpxgSz379UgXjEzB0ymMWbJJ3FnGjVqQJdp RXDutpncRySc/rGHh9UPREIRR5GvimONsWE2zxgXjUzB8vIr2fCGvHTXfIb6RKbQ yaFoihzn0m+eisc5Gy4tQ1qhhnaYyWEGrINjHTjMFTQOWTlH80BZAyQeLdbyj2K5 qloBPS/VhBTr/5TxV5onM+nVhu0LiblVNrdMHVeb7jyST4LeFOCaWK98lB3axSB5 v/2D1YKNb3g1U1x3In/oNGQvs36zGiO1uEdMF1l8ZFXgCvHmATSFSTWBtqUhb5Ew JA3YyqMTG6dpRTMSnmu3/frr4wDqnxlB/ktQC1pf3tDp87mr1ZYEy/dQld+tltjh 9Z5GSdrw0nf91wNI3DJf+26ZDdz5B+EpDnPnOKG8anI1lc/mQneI21/K/xUteFXw UZ1XGPLV2vbv9/a13u44SdjenHvQs1egsGeebMxVPoj6WmDLVmcIqinyS6NawYzn IlDGy/b3bSnXWMBP0ZVBX94KWLxqDDc4a/ayxsmxsP1tPZ+jDXjVDa7E3zskcHxG Uj5ULCPyU087t8Sl76mv =Dj70 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux Pull ACCESS_ONCE() rule tightening from Christian Borntraeger: "Tighten rules for ACCESS_ONCE This series tightens the rules for ACCESS_ONCE to only work on scalar types. It also contains the necessary fixups as indicated by build bots of linux-next. Now everything is in place to prevent new non-scalar users of ACCESS_ONCE and we can continue to convert code to READ_ONCE/WRITE_ONCE" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux: kernel: Fix sparse warning for ACCESS_ONCE next: sh: Fix compile error kernel: tighten rules for ACCESS ONCE mm/gup: Replace ACCESS_ONCE with READ_ONCE x86/spinlock: Leftover conversion ACCESS_ONCE->READ_ONCE x86/xen/p2m: Replace ACCESS_ONCE with READ_ONCE ppc/hugetlbfs: Replace ACCESS_ONCE with READ_ONCE ppc/kvm: Replace ACCESS_ONCE with READ_ONCE	2015-02-14 10:54:28 -08:00
Andrey Ryabinin	c420f167db	kasan: enable stack instrumentation Stack instrumentation allows to detect out of bounds memory accesses for variables allocated on stack. Compiler adds redzones around every variable on stack and poisons redzones in function's prologue. Such approach significantly increases stack usage, so all in-kernel stacks size were doubled. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:41 -08:00
Andrey Ryabinin	393f203f5f	x86_64: kasan: add interceptors for memset/memmove/memcpy functions Recently instrumentation of builtin functions calls was removed from GCC 5.0. To check the memory accessed by such functions, userspace asan always uses interceptors for them. So now we should do this as well. This patch declares memset/memmove/memcpy as weak symbols. In mm/kasan/kasan.c we have our own implementation of those functions which checks memory before accessing it. Default memset/memmove/memcpy now now always have aliases with '__' prefix. For files that built without kasan instrumentation (e.g. mm/slub.c) original mem* replaced (via #define) with prefixed variants, cause we don't want to check memory accesses there. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:41 -08:00
Andrey Ryabinin	ef7f0d6a6c	x86_64: add KASan support This patch adds arch specific code for kernel address sanitizer. 16TB of virtual addressed used for shadow memory. It's located in range [ffffec0000000000 - fffffc0000000000] between vmemmap and %esp fixup stacks. At early stage we map whole shadow region with zero page. Latter, after pages mapped to direct mapping address range we unmap zero pages from corresponding shadow (see kasan_map_shadow()) and allocate and map a real shadow memory reusing vmemmap_populate() function. Also replace __pa with __pa_nodebug before shadow initialized. __pa with CONFIG_DEBUG_VIRTUAL=y make external function call (__phys_addr) __phys_addr is instrumented, so __asan_load could be called before shadow area initialized. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Jim Davis <jim.epost@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:41 -08:00
Linus Torvalds	b9085bcbf5	Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: the highlights are support for GICv3 emulation and dirty page tracking s390: several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. ARM has other conflicts where functions are added in the same place by 3.19-rc and 3.20 patches. These are not large though, and entirely within KVM. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5 DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9 LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak= =7gdx -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...	2015-02-13 09:55:09 -08:00
Andy Lutomirski	f56141e3e2	all arches, signal: move restart_block to struct task_struct If an attacker can cause a controlled kernel stack overflow, overwriting the restart block is a very juicy exploit target. This is because the restart_block is held in the same memory allocation as the kernel stack. Moving the restart block to struct task_struct prevents this exploit by making the restart_block harder to locate. Note that there are other fields in thread_info that are also easy targets, at least on some architectures. It's also a decent simplification, since the restart code is more or less identical on all architectures. [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: David Miller <davem@davemloft.net> Acked-by: Richard Weinberger <richard@nod.at> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Steven Miao <realmz6@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:12 -08:00
Mel Gorman	c819f37e7e	x86: mm: restore original pte_special check Commit `b38af4721f` ("x86,mm: fix pte_special versus pte_numa") adjusted the pte_special check to take into account that a special pte had SPECIAL and neither PRESENT nor PROTNONE. Now that NUMA hinting PTEs are no longer modifying _PAGE_PRESENT it should be safe to restore the original pte_special behaviour. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:08 -08:00
Mel Gorman	21d9ee3eda	mm: remove remaining references to NUMA hinting bits and helpers This patch removes the NUMA PTE bits and associated helpers. As a side-effect it increases the maximum possible swap space on x86-64. One potential source of problems is races between the marking of PTEs PROT_NONE, NUMA hinting faults and migration. It must be guaranteed that a PTE being protected is not faulted in parallel, seen as a pte_none and corrupting memory. The base case is safe but transhuge has problems in the past due to an different migration mechanism and a dependance on page lock to serialise migrations and warrants a closer look. task_work hinting update parallel fault ------------------------ -------------- change_pmd_range change_huge_pmd __pmd_trans_huge_lock pmdp_get_and_clear __handle_mm_fault pmd_none do_huge_pmd_anonymous_page read? pmd_lock blocks until hinting complete, fail !pmd_none test write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none pmd_modify set_pmd_at task_work hinting update parallel migration ------------------------ ------------------ change_pmd_range change_huge_pmd __pmd_trans_huge_lock pmdp_get_and_clear __handle_mm_fault do_huge_pmd_numa_page migrate_misplaced_transhuge_page pmd_lock waits for updates to complete, recheck pmd_same pmd_modify set_pmd_at Both of those are safe and the case where a transhuge page is inserted during a protection update is unchanged. The case where two processes try migrating at the same time is unchanged by this series so should still be ok. I could not find a case where we are accidentally depending on the PTE not being cleared and flushed. If one is missed, it'll manifest as corruption problems that start triggering shortly after this series is merged and only happen when NUMA balancing is enabled. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mark Brown <broonie@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:08 -08:00
Mel Gorman	e7bb4b6d16	mm: add p[te\|md] protnone helpers for use by NUMA balancing This is a preparatory patch that introduces protnone helpers for automatic NUMA balancing. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:08 -08:00
Kirill A. Shutemov	d016bf7ece	mm: make FIRST_USER_ADDRESS unsigned long on all archs LKP has triggered a compiler warning after my recent patch "mm: account pmd page tables to the process": mm/mmap.c: In function 'exit_mmap': >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default] The code: > 2857 WARN_ON(mm_nr_pmds(mm) > 2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT); In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has the same type -- int. PUD_SHIFT. I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned long. On every arch for consistency. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:03 -08:00
Rusty Russell	d9bab50aa4	lguest: remove NOTIFY call and eventfd facility. Disappointing, as this was kind of neat (especially getting to use RCU to manage the address -> eventfd mapping). But now the devices are PCI handled in userspace, we get rid of both the NOTIFY hypercall and the interface to connect an eventfd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-02-11 16:47:46 +10:30
Linus Torvalds	1d9c5d79e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull live patching infrastructure from Jiri Kosina: "Let me provide a bit of history first, before describing what is in this pile. Originally, there was kSplice as a standalone project that implemented stop_machine()-based patching for the linux kernel. This project got later acquired, and the current owner is providing live patching as a proprietary service, without any intentions to have their implementation merged. Then, due to rising user/customer demand, both Red Hat and SUSE started working on their own implementation (not knowing about each other), and announced first versions roughly at the same time [1] [2]. The principle difference between the two solutions is how they are making sure that the patching is performed in a consistent way when it comes to different execution threads with respect to the semantic nature of the change that is being introduced. In a nutshell, kPatch is issuing stop_machine(), then looking at stacks of all existing processess, and if it decides that the system is in a state that can be patched safely, it proceeds insterting code redirection machinery to the patched functions. On the other hand, kGraft provides a per-thread consistency during one single pass of a process through the kernel and performs a lazy contignuous migration of threads from "unpatched" universe to the "patched" one at safe checkpoints. If interested in a more detailed discussion about the consistency models and its possible combinations, please see the thread that evolved around [3]. It pretty quickly became obvious to the interested parties that it's absolutely impractical in this case to have several isolated solutions for one task to co-exist in the kernel. During a dedicated Live Kernel Patching track at LPC in Dusseldorf, all the interested parties sat together and came up with a joint aproach that would work for both distro vendors. Steven Rostedt took notes [4] from this meeting. And the foundation for that aproach is what's present in this pull request. It provides a basic infrastructure for function "live patching" (i.e. code redirection), including API for kernel modules containing the actual patches, and API/ABI for userspace to be able to operate on the patches (look up what patches are applied, enable/disable them, etc). It's relatively simple and minimalistic, as it's making use of existing kernel infrastructure (namely ftrace) as much as possible. It's also self-contained, in a sense that it doesn't hook itself in any other kernel subsystem (it doesn't even touch any other code). It's now implemented for x86 only as a reference architecture, but support for powerpc, s390 and arm is already in the works (adding arch-specific support basically boils down to teaching ftrace about regs-saving). Once this common infrastructure gets merged, both Red Hat and SUSE have agreed to immediately start porting their current solutions on top of this, abandoning their out-of-tree code. The plan basically is that each patch will be marked by flag(s) that would indicate which consistency model it is willing to use (again, the details have been sketched out already in the thread at [3]). Before this happens, the current codebase can be used to patch a large group of secruity/stability problems the patches for which are not too complex (in a sense that they don't introduce non-trivial change of function's return value semantics, they don't change layout of data structures, etc) -- this corresponds to LEAVE_FUNCTION && SWITCH_FUNCTION semantics described at [3]. This tree has been in linux-next since December. [1] https://lkml.org/lkml/2014/4/30/477 [2] https://lkml.org/lkml/2014/7/14/857 [3] https://lkml.org/lkml/2014/11/7/354 [4] http://linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_LivePatching.txt [ The core code is introduced by the three commits authored by Seth Jennings, which got a lot of changes incorporated during numerous respins and reviews of the initial implementation. All the followup commits have materialized only after public tree has been created, so they were not folded into initial three commits so that the public tree doesn't get rebased ]" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add missing newline to error message livepatch: rename config to CONFIG_LIVEPATCH livepatch: fix uninitialized return value livepatch: support for repatching a function livepatch: enforce patch stacking semantics livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING livepatch: fix deferred module patching order livepatch: handle ancient compilers with more grace livepatch: kconfig: use bool instead of boolean livepatch: samples: fix usage example comments livepatch: MAINTAINERS: add git tree location livepatch: use FTRACE_OPS_FL_IPMODIFY livepatch: move x86 specific ftrace handler code to arch/x86 livepatch: samples: add sample live patching module livepatch: kernel: add support for live patching livepatch: kernel: add TAINT_LIVEPATCH	2015-02-10 18:35:40 -08:00
Linus Torvalds	992de5a8ec	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "Bite-sized chunks this time, to avoid the MTA ratelimiting woes. - fs/notify updates - ocfs2 - some of MM" That laconic "some MM" is mainly the removal of remap_file_pages(), which is a big simplification of the VM, and which gets rid of a lot of random cruft and special cases because we no longer support the non-linear mappings that it used. From a user interface perspective, nothing has changed, because the remap_file_pages() syscall still exists, it's just done by emulating the old behavior by creating a lot of individual small mappings instead of one non-linear one. The emulation is slower than the old "native" non-linear mappings, but nobody really uses or cares about remap_file_pages(), and simplifying the VM is a big advantage. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits) memcg: zap memcg_slab_caches and memcg_slab_mutex memcg: zap memcg_name argument of memcg_create_kmem_cache memcg: zap __memcg_{charge,uncharge}_slab mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check mm: hugetlb: fix type of hugetlb_treat_as_movable variable mm, hugetlb: remove unnecessary lower bound on sysctl handlers"? mm: memory: merge shared-writable dirtying branches in do_wp_page() mm: memory: remove ->vm_file check on shared writable vmas xtensa: drop _PAGE_FILE and pte_file()-related helpers x86: drop _PAGE_FILE and pte_file()-related helpers unicore32: drop pte_file()-related helpers um: drop _PAGE_FILE and pte_file()-related helpers tile: drop pte_file()-related helpers sparc: drop pte_file()-related helpers sh: drop _PAGE_FILE and pte_file()-related helpers score: drop _PAGE_FILE and pte_file()-related helpers s390: drop pte_file()-related helpers parisc: drop _PAGE_FILE and pte_file()-related helpers openrisc: drop _PAGE_FILE and pte_file()-related helpers nios2: drop _PAGE_FILE and pte_file()-related helpers ...	2015-02-10 16:45:56 -08:00
Linus Torvalds	872912352c	ACPI and power management updates for v3.20-rc1 - Rework of the core ACPI resources parsing code to fix issues in it and make using resource offsets more convenient and consolidation of some resource-handing code in a couple of places that have grown analagous data structures and code to cover the the same gap in the core (Jiang Liu, Thomas Gleixner, Lv Zheng). - ACPI-based IOAPIC hotplug support on top of the resources handling rework (Jiang Liu, Yinghai Lu). - ACPICA update to upstream release 20150204 including an interrupt handling rework that allows drivers to install raw handlers for ACPI GPEs which then become entirely responsible for the given GPE and the ACPICA core code won't touch it (Lv Zheng, David E Box, Octavian Purdila). - ACPI EC driver rework to fix several concurrency issues and other problems related to events handling on top of the ACPICA's new support for raw GPE handlers (Lv Zheng). - New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power Subsystem) driver for Intel chips (Ken Xue). - Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko Nikula). - Two new blacklist entries for machines (Samsung 730U3E/740U3E and 510R) where the native backlight interface doesn't work correctly while the ACPI one does (Hans de Goede). - Rework of the ACPI processor driver's handling of idle states to make the code more straightforward and less bloated overall (Rafael J Wysocki). - Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht, Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei Bai). - PCI core power management modification to avoid resuming (some) runtime-suspended devices during system suspend if they are in the right states already (Rafael J Wysocki). - New SFI-based cpufreq driver for Intel platforms using SFI (Srinidhi Kasagar). - cpufreq core fixes, cleanups and simplifications (Viresh Kumar, Doug Anderson, Wolfram Sang). - SkyLake CPU support and other updates for the intel_pstate driver (Kristen Carlson Accardi, Srinivas Pandruvada). - cpufreq-dt driver cleanup (Markus Elfring). - Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla). - Generic power domains core code fixes and cleanups (Ulf Hansson). - Operating Performance Points (OPP) core code cleanups and kernel documentation update (Nishanth Menon). - New dabugfs interface to make the list of PM QoS constraints available to user space (Nishanth Menon). - New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso). - New devfreq class (devfreq_event) to provide raw utilization data to devfreq governors (Chanwoo Choi). - Assorted minor fixes and cleanups related to power management (Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel Machek, Todd E Brandt, Wonhong Kwon). - turbostat updates (Len Brown) and cpupower Makefile improvement (Sriram Raghunathan). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJU2neOAAoJEILEb/54YlRx51QP/jrv1Wb5eMaemzMksPIWI5Zn I8IbxzToxu7wDDsrTBRv+LuyllMPrnppFOHHvB35gUYu7Y6I066s3ErwuqeFlbmy +VicmyGMahv3yN74qg49MXzWtaJZa8hrFXn8ItujiUIcs08yELi0vBQFlZImIbTB PdQngO88VfiOVjDvmKkYUU//9Sc9LCU0ZcdUQXSnA1oNOxuUHjiARz98R03hhSqu BWR+7M0uaFbu6XeK+BExMXJTpKicIBZ1GAF6hWrS8V4aYg+hH1cwjf2neDAzZkcU UkXieJlLJrCq+ZBNcy7WEhkWQkqJNWei5WYiy6eoQeQpNoliY2V+2OtSMJaKqDye PIiMwXstyDc5rgyULN0d1UUzY6mbcUt2rOL0VN2bsFVIJ1HWCq8mr8qq689pQUYv tcH18VQ2/6r2zW28sTO/ByWLYomklD/Y6bw2onMhGx3Knl0D8xYJKapVnTGhr5eY d4k41ybHSWNKfXsZxdJc+RxndhPwj9rFLfvY/CZEhLcW+2pAiMarRDOPXDoUI7/l aJpmPzy/6mPXGBnTfr6jKDSY3gXNazRIvfPbAdiGayKcHcdRM4glbSbNH0/h1Iq6 HKa8v9Fx87k1X5r4ZbhiPdABWlxuKDiM7725rfGpvjlWC3GNFOq7YTVMOuuBA225 Mu9PRZbOsZsnyNkixBpX =zZER -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "We have a few new features this time, including a new SFI-based cpufreq driver, a new devfreq driver for Tegra Activity Monitor, a new devfreq class for providing its governors with raw utilization data and a new ACPI driver for AMD SoCs. Still, the majority of changes here are reworks of existing code to make it more straightforward or to prepare it for implementing new features on top of it. The primary example is the rework of ACPI resources handling from Jiang Liu, Thomas Gleixner and Lv Zheng with support for IOAPIC hotplug implemented on top of it, but there is quite a number of changes of this kind in the cpufreq core, ACPICA, ACPI EC driver, ACPI processor driver and the generic power domains core code too. The most active developer is Viresh Kumar with his cpufreq changes. Specifics: - Rework of the core ACPI resources parsing code to fix issues in it and make using resource offsets more convenient and consolidation of some resource-handing code in a couple of places that have grown analagous data structures and code to cover the the same gap in the core (Jiang Liu, Thomas Gleixner, Lv Zheng). - ACPI-based IOAPIC hotplug support on top of the resources handling rework (Jiang Liu, Yinghai Lu). - ACPICA update to upstream release 20150204 including an interrupt handling rework that allows drivers to install raw handlers for ACPI GPEs which then become entirely responsible for the given GPE and the ACPICA core code won't touch it (Lv Zheng, David E Box, Octavian Purdila). - ACPI EC driver rework to fix several concurrency issues and other problems related to events handling on top of the ACPICA's new support for raw GPE handlers (Lv Zheng). - New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power Subsystem) driver for Intel chips (Ken Xue). - Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko Nikula). - Two new blacklist entries for machines (Samsung 730U3E/740U3E and 510R) where the native backlight interface doesn't work correctly while the ACPI one does (Hans de Goede). - Rework of the ACPI processor driver's handling of idle states to make the code more straightforward and less bloated overall (Rafael J Wysocki). - Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht, Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei Bai). - PCI core power management modification to avoid resuming (some) runtime-suspended devices during system suspend if they are in the right states already (Rafael J Wysocki). - New SFI-based cpufreq driver for Intel platforms using SFI (Srinidhi Kasagar). - cpufreq core fixes, cleanups and simplifications (Viresh Kumar, Doug Anderson, Wolfram Sang). - SkyLake CPU support and other updates for the intel_pstate driver (Kristen Carlson Accardi, Srinivas Pandruvada). - cpufreq-dt driver cleanup (Markus Elfring). - Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla). - Generic power domains core code fixes and cleanups (Ulf Hansson). - Operating Performance Points (OPP) core code cleanups and kernel documentation update (Nishanth Menon). - New dabugfs interface to make the list of PM QoS constraints available to user space (Nishanth Menon). - New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso). - New devfreq class (devfreq_event) to provide raw utilization data to devfreq governors (Chanwoo Choi). - Assorted minor fixes and cleanups related to power management (Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel Machek, Todd E Brandt, Wonhong Kwon). - turbostat updates (Len Brown) and cpupower Makefile improvement (Sriram Raghunathan)" * tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (151 commits) tools/power turbostat: relax dependency on APERF_MSR tools/power turbostat: relax dependency on invariant TSC Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS tools/power turbostat: relax dependency on root permission ACPI / video: Add disable_native_backlight quirk for Samsung 510R ACPI / PM: Remove unneeded nested #ifdef USB / PM: Remove unneeded #ifdef and associated dead code intel_pstate: provide option to only use intel_pstate with HWP ACPI / EC: Add GPE reference counting debugging messages ACPI / EC: Add query flushing support ACPI / EC: Refine command storm prevention support ACPI / EC: Add command flushing support. ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag ACPI: add AMD ACPI2Platform device support for x86 system ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse() ACPI / EC: Update revision due to raw handler mode. ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp. ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode. ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model ...	2015-02-10 15:09:41 -08:00
Kirill A. Shutemov	0a19136205	x86: drop _PAGE_FILE and pte_file()-related helpers We've replaced remap_file_pages(2) implementation with emulation. Nobody creates non-linear mapping anymore. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-10 14:30:33 -08:00
Linus Torvalds	bdccc4edeb	xen: features and fixes for 3.20-rc0 - Reworked handling for foreign (grant mapped) pages to simplify the code, enable a number of additional use cases and fix a number of long-standing bugs. - Prefer the TSC over the Xen PV clock when dom0 (and the TSC is stable). - Assorted other cleanup and minor bug fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJU2JC+AAoJEFxbo/MsZsTRIvAH/1lgQ0EQlxaZtEFWY8cJBzxY dXaTMfyGQOddGYDCW0r42hhXJHeX7DWXSERSD3aW9DZOn/eYdneHq9gWRD4uPrGn hEFQ26J4jZWR5riGXaja0LqI2gJKLZ6BhHIQciLEbY+jw4ynkNBLNRPFehuwrCsZ WdBwJkyvXC3RErekncRl/aNhxdi4p1P6qeiaW/mo3UcSO/CFSKybOLwT65iePazg XuY9UiTn2+qcRkm/tjx8K9heHK8SBEGNWuoTcWYF1to8mwwUfKIAc4NO2UBDXJI+ rp7Z2lVFdII15JsQ08ATh3t7xDrMWLzCX/y4jCzmF3DBXLbSWdHCQMgI7TWt5pE= =PyJK -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen features and fixes from David Vrabel: - Reworked handling for foreign (grant mapped) pages to simplify the code, enable a number of additional use cases and fix a number of long-standing bugs. - Prefer the TSC over the Xen PV clock when dom0 (and the TSC is stable). - Assorted other cleanup and minor bug fixes. * tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (25 commits) xen/manage: Fix USB interaction issues when resuming xenbus: Add proper handling of XS_ERROR from Xenbus for transactions. xen/gntdev: provide find_special_page VMA operation xen/gntdev: mark userspace PTEs as special on x86 PV guests xen-blkback: safely unmap grants in case they are still in use xen/gntdev: safely unmap grants in case they are still in use xen/gntdev: convert priv->lock to a mutex xen/grant-table: add a mechanism to safely unmap pages that are in use xen-netback: use foreign page information from the pages themselves xen: mark grant mapped pages as foreign xen/grant-table: add helpers for allocating pages x86/xen: require ballooned pages for grant maps xen: remove scratch frames for ballooned pages and m2p override xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() mm: add 'foreign' alias for the 'pinned' page flag mm: provide a find_special_page vma operation x86/xen: cleanup arch/x86/xen/mmu.c x86/xen: add some __init annotations in arch/x86/xen/mmu.c x86/xen: add some __init and static annotations in arch/x86/xen/setup.c x86/xen: use correct types for addresses in arch/x86/xen/setup.c ...	2015-02-10 13:56:56 -08:00
Rafael J. Wysocki	b5e82233ca	Merge branch 'pm-tools' * pm-tools: tools/power turbostat: relax dependency on APERF_MSR tools/power turbostat: relax dependency on invariant TSC tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS tools/power turbostat: relax dependency on root permission cpupower Makefile change to help run the tool without 'make install'	2015-02-10 16:11:26 +01:00
Rafael J. Wysocki	7bc95d4ef1	Merge branch 'pm-cpufreq' * pm-cpufreq: (46 commits) intel_pstate: provide option to only use intel_pstate with HWP cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation cpufreq: Create for_each_governor() cpufreq: Create for_each_policy() cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get\|put}() cpufreq: Set cpufreq_cpu_data to NULL before putting kobject intel_pstate: honor user space min_perf_pct override on resume intel_pstate: respect cpufreq policy request intel_pstate: Add num_pstates to sysfs intel_pstate: expose turbo range to sysfs intel_pstate: Add support for SkyLake cpufreq: stats: drop unnecessary locking cpufreq: stats: don't update stats on false notifiers cpufreq: stats: don't update stats from show_trans_table() cpufreq: stats: time_in_state can't be NULL in cpufreq_stats_update() cpufreq: stats: create sysfs group once we are ready cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications cpufreq: stats: drop 'cpu' field of struct cpufreq_stats cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy cpufreq: stats: rename 'struct cpufreq_stats' objects as 'stats' ...	2015-02-10 16:10:44 +01:00
Linus Torvalds	a8f7684214	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SoC updates from Ingo Molnar: "Various Intel Atom SoC updates (mostly to enhance debuggability), plus an apb_timer cleanup" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: pmc_atom: Expose contents of PSS x86: pmc_atom: Clean up init function x86: pmc-atom: Remove unused macro x86: pmc_atom: don%27t check for NULL twice x86: pmc-atom: Assign debugfs node as soon as possible x86/platform: Remove unused function from apb_timer.c	2015-02-09 18:11:28 -08:00
Linus Torvalds	c93ecedab3	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Ingo Molnar: "Initial round of kernel_fpu_begin/end cleanups from Oleg Nesterov, plus a cleanup from Borislav Petkov" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu: Fix math_state_restore() race with kernel_fpu_begin() x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end() x86, fpu: Introduce per-cpu in_kernel_fpu state x86/fpu: Use a symbolic name for asm operand	2015-02-09 18:01:52 -08:00
Linus Torvalds	7453311d68	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "The main changes in this cycle were the x86/entry and sysret enhancements from Andy Lutomirski, see merge commits `772a9aca12` and `b57c0b5175` for details" [ Exectutive summary: IST exceptions that interrupt user space will run on the regular kernel stack instead of the IST stack. Which simplifies things particularly on return to user space. The sysret cleanup ends up simplifying the logic on when we can use sysret vs when we have to use iret. - Linus ] * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64, entry: Remove the syscall exit audit and schedule optimizations x86_64, entry: Use sysret to return to userspace when possible x86, traps: Fix ist_enter from userspace x86, vdso: teach 'make clean' remove vdso64 binaries x86_64 entry: Fix RCX for ptraced syscalls x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET x86: entry_64.S: delete unused code x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic x86: Clean up current_stack_pointer x86, traps: Track entry into and exit from IST context x86, entry: Switch stacks on a paranoid entry from userspace	2015-02-09 17:16:44 -08:00
Linus Torvalds	9d43bade34	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Ingo Molnar: "Continued fallout of the conversion of the x86 IRQ code to the hierarchical irqdomain framework: more cleanups, simplifications, memory allocation behavior enhancements, mainly in the interrupt remapping and APIC code" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) x86, init: Fix UP boot regression on x86_64 iommu/amd: Fix irq remapping detection logic x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC x86: Consolidate boot cpu timer setup x86/apic: Reuse apic_bsp_setup() for UP APIC setup x86/smpboot: Sanitize uniprocessor init x86/smpboot: Move apic init code to apic.c init: Get rid of x86isms x86/apic: Move apic_init_uniprocessor code x86/smpboot: Cleanup ioapic handling x86/apic: Sanitize ioapic handling x86/ioapic: Add proper checks to setp/enable_IO_APIC() x86/ioapic: Provide stub functions for IOAPIC%3Dn x86/smpboot: Move smpboot inlines to code x86/x2apic: Use state information for disable x86/x2apic: Split enable and setup function x86/x2apic: Disable x2apic from nox2apic setup x86/x2apic: Add proper state tracking x86/x2apic: Clarify remapping mode for x2apic enablement x86/x2apic: Move code in conditional region ...	2015-02-09 16:57:56 -08:00
Rafael J. Wysocki	c488ea4613	Merge branch 'sfi' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-cpufreq Pull SFI-based cpufreq driver for v3.20 from Len Brown. * 'sfi' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: cpufreq: Add SFI based cpufreq driver support SFI: fix compiler warnings	2015-02-09 23:43:53 +01:00
Len Brown	3a9a941d0b	tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS The Processor generation code-named Haswell added MSR_{CORE \| GFX \| RING}_PERF_LIMIT_REASONS to explain when and how the processor limits frequency. turbostat -v will now decode these bits. Each MSR has an "Active" set of bits which describe current conditions, and a "Logged" set of bits, which describe what has happened since last cleared. Turbostat currently doesn't clear the log bits. Signed-off-by: Len Brown <len.brown@intel.com>	2015-02-09 16:44:24 -05:00
Paolo Bonzini	f781951299	kvm: add halt_poll_ns module parameter This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-06 13:08:37 +01:00
Jiang Liu	b4b55cda58	x86/PCI: Refine the way to release PCI IRQ resources Some PCI device drivers assume that pci_dev->irq won't change after calling pci_disable_device() and pci_enable_device() during suspend and resume. Commit `c03b3b0738` ("x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled") frees PCI IRQ resources when pci_disable_device() is called and reallocate IRQ resources when pci_enable_device() is called again. This breaks above assumption. So commit `3eec595235` ("x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation") and `9eabc99a63` ("x86, irq, PCI: Keep IRQ assignment for runtime power management") fix the issue by avoiding freeing/reallocating IRQ resources during PCI device suspend/resume. They achieve this by checking dev.power.is_prepared and dev.power.runtime_status. PM maintainer, Rafael, then pointed out that it's really an ugly fix which leaking PM internal state information to IRQ subsystem. Recently David Vrabel <david.vrabel@citrix.com> also reports an regression in pciback driver caused by commit `cffe0a2b5a` ("x86, irq: Keep balance of IOAPIC pin reference count"). Please refer to: http://lkml.org/lkml/2015/1/14/546 So this patch refine the way to release PCI IRQ resources. Instead of releasing PCI IRQ resources in pci_disable_device()/ pcibios_disable_device(), we now release it at driver unbinding notification BUS_NOTIFY_UNBOUND_DRIVER. In other word, we only release PCI IRQ resources when there's no driver bound to the PCI device, and it keeps the assumption that pci_dev->irq won't through multiple invocation of pci_enable_device()/pci_disable_device(). Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-05 15:09:26 +01:00
Tiejun Chen	1c2b364b22	kvm: remove KVM_MMIO_SIZE After `f78146b0f9`, "KVM: Fix page-crossing MMIO", and `87da7e66a4`, "KVM: x86: fix vcpu->mmio_fragments overflow", actually KVM_MMIO_SIZE is gone. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-05 12:26:14 +01:00
Andy Lutomirski	a66734297f	perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks While perfmon2 is a sufficiently evil library (it pokes MSRs directly) that breaking it is fair game, it's still useful, so we might as well try to support it. This allows users to write 2 to /sys/devices/cpu/rdpmc to disable all rdpmc protection so that hack like perfmon2 can continue to work. At some point, if perf_event becomes fast enough to replace perfmon2, then this can go. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/caac3c1c707dcca48ecbc35f4def21495856f479.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:49 +01:00
Andy Lutomirski	7911d3f7af	perf/x86: Only allow rdpmc if a perf_event is mapped We currently allow any process to use rdpmc. This significantly weakens the protection offered by PR_TSC_DISABLED, and it could be helpful to users attempting to exploit timing attacks. Since we can't enable access to individual counters, use a very coarse heuristic to limit access to rdpmc: allow access only when a perf_event is mmapped. This protects seccomp sandboxes. There is plenty of room to further tighen these restrictions. For example, this allows rdpmc for any x86_pmu event, but it's only useful for self-monitoring tasks. As a side effect, cap_user_rdpmc will now be false for AMD uncore events. This isn't a real regression, since .event_idx is disabled for these events anyway for the time being. Whenever that gets re-added, the cap_user_rdpmc code can be adjusted or refactored accordingly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/a2bdb3cf3a1d70c26980d7c6dddfbaa69f3182bf.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:47 +01:00
Andy Lutomirski	22c4bd9fa9	x86: Add a comment clarifying LDT context switching The code is correct, but only for a rather subtle reason. This confused me for quite a while when I read switch_mm, so clarify the code to avoid confusing other people, too. TBH, I wouldn't be surprised if this code was only correct by accident. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/0db86397f968996fb772c443c251415b0b430ddd.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:43 +01:00
Andy Lutomirski	1e02ce4ccc	x86: Store a per-cpu shadow copy of CR4 Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:42 +01:00
Andy Lutomirski	375074cc73	x86: Clean up cr4 manipulation CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:41 +01:00
Josh Poimboeuf	12cf89b550	livepatch: rename config to CONFIG_LIVEPATCH Rename CONFIG_LIVE_PATCHING to CONFIG_LIVEPATCH to make the naming of the config and the code more consistent. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-02-04 11:25:51 +01:00
Ingo Molnar	0967160ad6	Merge branch 'x86/asm' into perf/x86, to avoid conflicts with upcoming patches Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 09:01:12 +01:00
Andy Shevchenko	874e52086f	x86, mrst: remove Moorestown specific serial drivers Intel Moorestown platform support was removed few years ago. This is a follow up which removes Moorestown specific code for the serial devices. It includes mrst_max3110 and earlyprintk bits. This was used on SFI (Medfield, Clovertrail) based platforms as well, though new ones use normal serial interface for the console service. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-02-02 10:11:24 -08:00
Marcelo Tosatti	2e6d015799	KVM: x86: revert "add method to test PIR bitmap vector" Revert `7c6a98dfa1`, given that testing PIR is not necessary anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-02 18:36:34 +01:00
Kai Huang	843e433057	KVM: VMX: Add PML support in VMX This patch adds PML support in VMX. A new module parameter 'enable_pml' is added to allow user to enable/disable it manually. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 09:39:54 +01:00
Kai Huang	88178fd4f7	KVM: x86: Add new dirty logging kvm_x86_ops for PML This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty logging for particular memory slot, and to flush potentially logged dirty GPAs before reporting slot->dirty_bitmap to userspace. kvm x86 common code calls these hooks when they are available so PML logic can be hidden to VMX specific. SVM won't be impacted as these hooks remain NULL there. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:41 +01:00
Kai Huang	1c91cad423	KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access This patch changes the second parameter of kvm_mmu_slot_remove_write_access from 'slot id' to 'struct kvm_memory_slot ' to align with kvm_x86_ops dirty logging hooks, which will be introduced in further patch. Better way is to change second parameter of kvm_arch_commit_memory_region from 'struct kvm_userspace_memory_region ' to 'struct kvm_memory_slot * new', but it requires changes on other non-x86 ARCH too, so avoid it now. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:37 +01:00
Kai Huang	f4b4b18086	KVM: MMU: Add mmu help functions to support PML This patch adds new mmu layer functions to clear/set D-bit for memory slot, and to write protect superpages for memory slot. In case of PML, CPU logs the dirty GPA automatically to PML buffer when CPU updates D-bit from 0 to 1, therefore we don't have to write protect 4K pages, instead, we only need to clear D-bit in order to log that GPA. For superpages, we still write protect it and let page fault code to handle dirty page logging, as we still need to split superpage to 4K pages in PML. As PML is always enabled during guest's lifetime, to eliminate unnecessary PML GPA logging, we set D-bit manually for the slot with dirty logging disabled. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:29 +01:00
Ingo Molnar	b3890e4704	Merge branch 'perf/hw_breakpoints' into perf/core The new hw_breakpoint bits are now ready for v3.20, merge them into the main branch, to avoid conflicts. Conflicts: tools/perf/Documentation/perf-record.txt Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 15:48:59 +01:00
Ingo Molnar	772a9aca12	This is my accumulated x86 entry work, part 1, for 3.20. The meat of this is an IST rework. When an IST exception interrupts user space, we will handle it on the per-thread kernel stack instead of on the IST stack. This sounds messy, but it actually simplifies the IST entry/exit code, because it eliminates some ugly games we used to play in order to handle rescheduling, signal delivery, etc on the way out of an IST exception. The IST rework introduces proper context tracking to IST exception handlers. I haven't seen any bug reports, but the old code could have incorrectly treated an IST exception handler as an RCU extended quiescent state. The memory failure change (included in this pull request with Borislav and Tony's permission) eliminates a bunch of code that is no longer needed now that user memory failure handlers are called in process context. Finally, this includes a few on Denys' uncontroversial and Obviously Correct (tm) cleanups. The IST and memory failure changes have been in -next for a while. LKML references: IST rework: http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net Memory failure change: http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com Denys' cleanups: http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUtvkFAAoJEK9N98ZeDfrkcfsIAJxZ0UBUCEDvulbqgk/iPGOa fIpKLMowS7CpKtw6Wdc/YvAIkeHXWm1vU44Hj0TrjSrXCgVF8yCngs/xlXtOjoa1 dosXQqgqVJJ+hyui7chAEWyalLW7bEO8raq/6snhiMrhiuEkVKpEr7Fer4FVVCZL 4VALmNQQsbV+Qq4pXIhuagZC0Nt/XKi/+/cKvhS4p//q1F/TbHTz0FpDUrh0jPMh 18WFy0jWgxdkMRnSp/wJhekvdXX6PwUy5BdES9fjw8LQJZxxFpqN3Fe1kgfyzV0k yuvEHw1hPt2aBGj3q69wQvDVyyn4OqMpRDBhk4S+GJYmVh7mFyFMN4BDMEy/EY8= =LXVl -----END PGP SIGNATURE----- Merge tag 'pr-20150114-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull x86/entry enhancements from Andy Lutomirski: " This is my accumulated x86 entry work, part 1, for 3.20. The meat of this is an IST rework. When an IST exception interrupts user space, we will handle it on the per-thread kernel stack instead of on the IST stack. This sounds messy, but it actually simplifies the IST entry/exit code, because it eliminates some ugly games we used to play in order to handle rescheduling, signal delivery, etc on the way out of an IST exception. The IST rework introduces proper context tracking to IST exception handlers. I haven't seen any bug reports, but the old code could have incorrectly treated an IST exception handler as an RCU extended quiescent state. The memory failure change (included in this pull request with Borislav and Tony's permission) eliminates a bunch of code that is no longer needed now that user memory failure handlers are called in process context. Finally, this includes a few on Denys' uncontroversial and Obviously Correct (tm) cleanups. The IST and memory failure changes have been in -next for a while. LKML references: IST rework: http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net Memory failure change: http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com Denys' cleanups: http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com " This tree semantically depends on and is based on the following RCU commit: `734d168013` ("rcu: Make rcu_nmi_enter() handle nesting") ... and for that reason won't be pushed upstream before the RCU bits hit Linus's tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 15:33:26 +01:00
David Vrabel	0bb599fd30	xen: remove scratch frames for ballooned pages and m2p override The scratch frame mappings for ballooned pages and the m2p override are broken. Remove them in preparation for replacing them with simpler mechanisms that works. The scratch pages did not ensure that the page was not in use. In particular, the foreign page could still be in use by hardware. If the guest reused the frame the hardware could read or write that frame. The m2p override did not handle the same frame being granted by two different grant references. Trying an M2P override lookup in this case is impossible. With the m2p override removed, the grant map/unmap for the kernel mappings (for x86 PV) can be easily batched in set_foreign_p2m_mapping() and clear_foreign_p2m_mapping(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2015-01-28 14:03:10 +00:00
David Vrabel	853d028934	xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() When unmapping grants, instead of converting the kernel map ops to unmap ops on the fly, pre-populate the set of unmap ops. This allows the grant unmap for the kernel mappings to be trivially batched in the future. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2015-01-28 14:03:10 +00:00
Nadav Amit	801806d956	KVM: x86: IRET emulation does not clear NMI masking The IRET instruction should clear NMI masking, but the current implementation does not do so. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:42 +01:00
K. Y. Srinivasan	4061ed9e2a	Drivers: hv: vmbus: Implement a clockevent device Implement a clockevent device based on the timer support available on Hyper-V. In this version of the patch I have addressed Jason's review comments. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-25 09:17:57 -08:00
Paolo Bonzini	1c6007d59a	KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added trace symbols, and adding an explicit VGIC init device control IOCTL. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUwhsKAAoJEEtpOizt6ddyuSEH/ia2uf07N0i+C1dPKYiqhKEd nFqBvgrhAMVztWLmy1Wq4SnO9YNd+CrPYATrfCiYsYQ9aKc09+qDq+uo06bVpZXz KsHjVGUsdyJ4qRqjDixkPvZviGIXa6C//+hcwg1XH2nit1uHmXVupzB9dDz3ZM2l GCwApdRdaaUVDt5Ud2ljqIWZa18Qf/5/HD8MdPXpmotDOKucL6pBr/1R1XWueCU/ ejRs/qy3EFyMWdEdfGFAMCa0ZvHbPmsJmvB/EgkyUnuJj77ptA0jNo1jtzSfEyis 53x4ffWnIsPl9yqhk0oKerIALVUvV4A7/me2ya6tsQ5fiBX7lJ3+qwggvCkWQzw= =fMS2 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added trace symbols, and adding an explicit VGIC init device control IOCTL. Conflicts: arch/arm64/include/asm/kvm_arm.h arch/arm64/kvm/handle_exit.c	2015-01-23 13:39:51 +01:00
Andy Lutomirski	3669ef9fa7	x86, tls: Interpret an all-zero struct user_desc as "no segment" The Witcher 2 did something like this to allocate a TLS segment index: struct user_desc u_info; bzero(&u_info, sizeof(u_info)); u_info.entry_number = (uint32_t)-1; syscall(SYS_set_thread_area, &u_info); Strictly speaking, this code was never correct. It should have set read_exec_only and seg_not_present to 1 to indicate that it wanted to find a free slot without putting anything there, or it should have put something sensible in the TLS slot if it wanted to allocate a TLS entry for real. The actual effect of this code was to allocate a bogus segment that could be used to exploit espfix. The set_thread_area hardening patches changed the behavior, causing set_thread_area to return -EINVAL and crashing the game. This changes set_thread_area to interpret this as a request to find a free slot and to leave it empty, which isn't quite what the game expects but should be close enough to keep it working. In particular, using the code above to allocate two segments will allocate the same segment both times. According to FrostbittenKing on Github, this fixes The Witcher 2. If this somehow still causes problems, we could instead allocate a limit==0 32-bit data segment, but that seems rather ugly to me. Fixes: `41bdc78544` x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:45:07 +01:00
Andy Lutomirski	e30ab185c4	x86, tls, ldt: Stop checking lm in LDT_empty 32-bit programs don't have an lm bit in their ABI, so they can't reliably cause LDT_empty to return true without resorting to memset. They shouldn't need to do this. This should fix a longstanding, if minor, issue in all 64-bit kernels as well as a potential regression in the TLS hardening code. Fixes: `41bdc78544` x86/tls: Validate TLS entries to protect espfix Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:11:06 +01:00
Dave Hansen	c922228efe	x86, mpx: Fix potential performance issue on unmaps The 3.19 merge window saw some TLB modifications merged which caused a performance regression. They were fixed in commit 045bbb9fa. Once that fix was applied, I also noticed that there was a small but intermittent regression still present. It was not present consistently enough to bisect reliably, but I'm fairly confident that it came from (my own) MPX patches. The source was reading a relatively unused field in the mm_struct via arch_unmap. I also noted that this code was in the main instruction flow of do_munmap() and probably had more icache impact than we want. This patch does two things: 1. Adds a static (via Kconfig) and dynamic (via cpuid) check for MPX with cpu_feature_enabled(). This keeps us from reading that cacheline in the mm and trades it for a check of the global CPUID variables at least on CPUs without MPX. 2. Adds an unlikely() to ensure that the MPX call ends up out of the main instruction flow in do_munmap(). I've added a detailed comment about why this was done and why we want it even on systems where MPX is present. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: luto@amacapital.net Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:11:06 +01:00
Thomas Gleixner	374aab339f	x86/apic: Reuse apic_bsp_setup() for UP APIC setup Extend apic_bsp_setup() so the same code flow can be used for APIC_init_uniprocessor(). Folded Jiangs fix to provide proper ordering of the UP setup. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211704.084765674@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	05f7e46d2a	x86/smpboot: Move apic init code to apic.c We better provide proper functions which implement the required code flow in the apic code rather than letting the smpboot code open code it. That allows to make more functions static and confines the APIC functionality to apic.c where it belongs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.907616730@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	8686608336	x86/ioapic: Provide stub functions for IOAPIC%3Dn To avoid lots of ifdeffery provide proper stubs for setup_IO_APIC(), enable_IO_APIC() and setup_ioapic_dest(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.397170414@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	f77aa308e5	x86/smpboot: Move smpboot inlines to code No point for a separate header file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.304126687@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	659006bf3a	x86/x2apic: Split enable and setup function enable_x2apic() is a convoluted unreadable mess because it is used for both enablement in early boot and for setup in cpu_init(). Split the code into x2apic_enable() for enablement and x2apic_setup() for setup of (secondary cpus). Make use of the new state tracking to simplify the logic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.129287153@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	44e25ff9e6	x86/x2apic: Disable x2apic from nox2apic setup There is no point in postponing the hardware disablement of x2apic. It can be disabled right away in the nox2apic setup function. Disable it right away and set the state to DISABLED . This allows to remove all the nox2apic conditionals all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.051214090@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	55eae7de72	x86/x2apic: Move code in conditional region No point in having try_to_enable_x2apic() outside of the CONFIG_X86_X2APIC section and having inline functions and more ifdefs to deal with it. Move the code into the existing ifdef section and remove the inline cruft. Fixup the printk about not enabling interrupt remapping as suggested by Boris. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.795388613@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	d524165cb8	x86/apic: Check x2apic early No point in delaying the x2apic detection for the CONFIG_X86_X2APIC=n case to enable_IR_x2apic(). We rather detect that before we try to setup anything there. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.702479404@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	2ca5b40479	x86/ioapic: Check x2apic really The x2apic_preenabled flag is just a horrible hack and if X2APIC support is disabled it does not reflect the actual hardware state. Check the hardware instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.541280622@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	81a46dd824	x86/apic: Make x2apic_mode depend on CONFIG_X86_X2APIC No point in having a static variable around which is always 0. Let the compiler optimize code out if disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.363274310@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	8d80696060	x86/apic: Avoid open coded x2apic detection enable_IR_x2apic() grew a open coded x2apic detection. Implement a proper helper function which shares the code with the already existing x2apic_enabled(). Made it use rdmsrl_safe as suggested by Boris. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.285038186@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Borislav Petkov	cfaa790a3f	kvm: Fix CR3_PCID_INVD type on 32-bit arch/x86/kvm/emulate.c: In function ‘check_cr_write’: arch/x86/kvm/emulate.c:3552:4: warning: left shift count >= width of type rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD; happens because sizeof(UL) on 32-bit is 4 bytes but we shift it 63 bits to the left. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-21 15:59:09 +01:00
Marcelo Tosatti	54750f2cf0	KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp and rdtsc is larger than a given threshold: * If we get more than the below threshold into the future, we rerequest * the real time from the host again which has only little offset then * that we need to adjust using the TSC. * * For now that threshold is 1/5th of a jiffie. That should be good * enough accuracy for completely broken systems, but also give us swing * to not call out to the host all the time. */ #define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5) Disable masterclock support (which increases said delta) in case the boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW. Upstreams kernels which support pvclock vsyscalls (and therefore make use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-20 20:38:39 +01:00
Oleg Nesterov	7575637ab2	x86, fpu: Fix math_state_restore() race with kernel_fpu_begin() math_state_restore() can race with kernel_fpu_begin() if irq comes right after __thread_fpu_begin(), __save_init_fpu() will overwrite fpu->state we are going to restore. Add 2 simple helpers, kernel_fpu_disable() and kernel_fpu_enable() which simply set/clear in_kernel_fpu, and change math_state_restore() to exclude kernel_fpu_begin() in between. Alternatively we could use local_irq_save/restore, but probably these new helpers can have more users. Perhaps they should disable/enable preemption themselves, in this case we can remove preempt_disable() in __restore_xstate_sig(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: matt.fleming@intel.com Cc: bp@suse.de Cc: pbonzini@redhat.com Cc: luto@amacapital.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150115192028.GD27332@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 13:53:07 +01:00
Oleg Nesterov	14e153ef75	x86, fpu: Introduce per-cpu in_kernel_fpu state interrupted_kernel_fpu_idle() tries to detect if kernel_fpu_begin() is safe or not. In particular it should obviously deny the nested kernel_fpu_begin() and this logic looks very confusing. If use_eager_fpu() == T we rely on a) __thread_has_fpu() check in interrupted_kernel_fpu_idle(), and b) on the fact that _begin() does __thread_clear_has_fpu(). Otherwise we demand that the interrupted task has no FPU if it is in kernel mode, this works because __kernel_fpu_begin() does clts() and interrupted_kernel_fpu_idle() checks X86_CR0_TS. Add the per-cpu "bool in_kernel_fpu" variable, and change this code to check/set/clear it. This allows to do more cleanups and fixes, see the next changes. The patch also moves WARN_ON_ONCE() under preempt_disable() just to make this_cpu_read() look better, this is not really needed. And in fact I think we should move it into __kernel_fpu_begin(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: matt.fleming@intel.com Cc: bp@suse.de Cc: pbonzini@redhat.com Cc: luto@amacapital.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150115191943.GB27332@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 13:53:07 +01:00
Andy Shevchenko	0e1540208e	x86: pmc_atom: Expose contents of PSS The PSS register reflects the power state of each island on SoC. It would be useful to know which of the islands is on or off at the momemnt. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Aubrey Li <aubrey.li@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com> Link: http://lkml.kernel.org/r/1421253575-22509-6-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 12:50:14 +01:00
Jiang Liu	8abb850a03	x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi Xen overrides __acpi_register_gsi and leaves __acpi_unregister_gsi as is. That means, an IRQ allocated by acpi_register_gsi_xen_hvm() or acpi_register_gsi_xen() will be freed by acpi_unregister_gsi_ioapic(), which may cause undesired effects. So override __acpi_unregister_gsi to NULL for safety. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Cc: Tony Luck <tony.luck@intel.com> Cc: xen-devel@lists.xenproject.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Graeme Gregory <graeme.gregory@linaro.org> Cc: Lv Zheng <lv.zheng@intel.com> Link: http://lkml.kernel.org/r/1421720467-7709-4-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 11:44:41 +01:00
Christian Borntraeger	bccec2a0a2	x86/spinlock: Leftover conversion ACCESS_ONCE->READ_ONCE commit `78bff1c868` ("x86/ticketlock: Fix spin_unlock_wait() livelock") introduced two additional ACCESS_ONCE cases in x86 spinlock.h. Lets change those as well. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com>	2015-01-19 14:14:20 +01:00
Paolo Bonzini	e108ff2f80	KVM: x86: switch to kvm_get_dirty_log_protect We now have a generic function that does most of the work of kvm_vm_ioctl_get_dirty_log, now use it. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>	2015-01-16 14:40:14 +01:00
Jiang Liu	c392f56c94	iommu/irq_remapping: Kill function irq_remapping_supported() and related code Simplify irq_remapping code by killing irq_remapping_supported() and related interfaces. Joerg posted a similar patch at https://lkml.org/lkml/2014/12/15/490, so assume an signed-off from Joerg. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1420615903-28253-14-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:23 +01:00
Thomas Gleixner	a1dafe857d	iommu, x86: Restructure setup of the irq remapping feature enable_IR_x2apic() calls setup_irq_remapping_ops() which by default installs the intel dmar remapping ops and then calls the amd iommu irq remapping prepare callback to figure out whether we are running on an AMD machine with irq remapping hardware. Right after that it calls irq_remapping_prepare() which pointlessly checks: if (!remap_ops \|\| !remap_ops->prepare) return -ENODEV; and then calls remap_ops->prepare() which is silly in the AMD case as it got called from setup_irq_remapping_ops() already a few microseconds ago. Simplify this and just collapse everything into irq_remapping_prepare(). The irq_remapping_prepare() remains still silly as it assigns blindly the intel ops, but that's not scope of this patch. The scope here is to move the preperatory work, i.e. memory allocations out of the atomic section which is required to enable irq remapping. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Acked-and-tested-by: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: Joerg Roedel <jroedel@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141205084147.232633738@linutronix.de Link: http://lkml.kernel.org/r/1420615903-28253-2-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:22 +01:00
Arnd Bergmann	643165c8bb	uaccess: fix sparse warning on get/put_user for bitwise types At the moment, if p and x are both tagged as bitwise types, some of get_user(x, p), put_user(x, p), __get_user(x, p), __put_user(x, p) might produce a sparse warning on many architectures. This is a false positive: p on these architectures is loaded into long (typically using asm), then cast back to typeof(p). When typeof(p) is a bitwise type (which is uncommon), such a cast needs __force, otherwise sparse produces a warning. Some architectures already have the __force tag, add it where it's missing. I verified that adding these __force casts does not supress any useful warnings. Specifically, vhost wants to read/write bitwise types in userspace memory using get_user/put_user. At the moment this triggers sparse errors, since the value is passed through an integer. For example: __le32 __user p; __u32 x; both put_user(x, p); and get_user(x, p); should be safe, but produce warnings on some architectures. While there, I noticed that a bunch of architectures violated coding style rules within uaccess macros. Included patches to fix them up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUtS+YAAoJECgfDbjSjVRpQ/QIAKXOc6tMXo+r/F32YC0Fv74G W4VKIk7u9XQNjOzez9i+xce75YBDBKHk5R9kLCfAg6Zew+6NRgbBV+QjGVB8dpot 2GxajcVhOySgaR45sGK3Ldg5yVz5ficqZEyYWKNgYeyMWJdlpvUk+4W5q15TiPZe u+C57/KzfRMDHyv3UkwAbqrkYGE0h7vXBi0BmOdCJlbKjG+6kFoVU/dAWsByDD5p q54ji8UdIkh2oyH5qhSbAwQN4Cg5N37Agw86HwltjQFJAVvV3yPRUsv7MQnpRB1+ hKlPXPUarNozGVV7OlcvGa9Lvz8m3a2rNd9+1tgHY0Fpia1JYAY2UdubS99fl5E= =LVcN -----END PGP SIGNATURE----- Merge tag 'uaccess_for_upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost into asm-generic Merge "uaccess: fix sparse warning on get/put_user for bitwise types" from Michael S. Tsirkin: At the moment, if p and x are both tagged as bitwise types, some of get_user(x, p), put_user(x, p), __get_user(x, p), __put_user(x, p) might produce a sparse warning on many architectures. This is a false positive: p on these architectures is loaded into long (typically using asm), then cast back to typeof(p). When typeof(p) is a bitwise type (which is uncommon), such a cast needs __force, otherwise sparse produces a warning. Some architectures already have the __force tag, add it where it's missing. I verified that adding these __force casts does not supress any useful warnings. Specifically, vhost wants to read/write bitwise types in userspace memory using get_user/put_user. At the moment this triggers sparse errors, since the value is passed through an integer. For example: __le32 __user p; __u32 x; both put_user(x, p); and get_user(x, p); should be safe, but produce warnings on some architectures. While there, I noticed that a bunch of architectures violated coding style rules within uaccess macros. Included patches to fix them up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> * tag 'uaccess_for_upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (37 commits) sparc32: nocheck uaccess coding style tweaks sparc64: nocheck uaccess coding style tweaks xtensa: macro whitespace fixes sh: macro whitespace fixes parisc: macro whitespace fixes m68k: macro whitespace fixes m32r: macro whitespace fixes frv: macro whitespace fixes cris: macro whitespace fixes avr32: macro whitespace fixes arm64: macro whitespace fixes arm: macro whitespace fixes alpha: macro whitespace fixes blackfin: macro whitespace fixes sparc64: uaccess_64 macro whitespace fixes sparc32: uaccess_32 macro whitespace fixes avr32: whitespace fix sh: fix put_user sparse errors metag: fix put_user sparse errors ia64: fix put_user sparse errors ...	2015-01-14 23:17:49 +01:00
Denys Vlasenko	af9cfe270d	x86: entry_64.S: delete unused code A define, two macros and an unreferenced bit of assembly are gone. Acked-by: Borislav Petkov <bp@suse.de> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Oleg Nesterov <oleg@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andy Lutomirski <luto@amacapital.net> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: X86 ML <x86@kernel.org> CC: Alexei Starovoitov <ast@plumgrid.com> CC: Will Drewry <wad@chromium.org> CC: Kees Cook <keescook@chromium.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-13 14:00:33 -08:00
Michael S. Tsirkin	e182c570e9	x86/uaccess: fix sparse errors virtio wants to read bitwise types from userspace using get_user. At the moment this triggers sparse errors, since the value is passed through an integer. Fix that up using __force. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-13 15:22:59 +02:00
Jiri Kosina	b9dfe0bed9	livepatch: handle ancient compilers with more grace We are aborting a build in case when gcc doesn't support fentry on x86_64 (regs->ip modification can't really reliably work with mcount). This however breaks allmodconfig for people with older gccs that don't support -mfentry. Turn the build-time failure into runtime failure, resulting in the whole infrastructure not being initialized if CC_USING_FENTRY is unset. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>	2015-01-09 10:55:10 +01:00
Nadav Amit	c205fb7d7d	KVM: x86: #PF error-code on R/W operations is wrong When emulating an instruction that reads the destination memory operand (i.e., instructions without the Mov flag in the emulator), the operand is first read. If a page-fault is detected in this phase, the error-code which would be delivered to the VM does not indicate that the access that caused the exception is a write one. This does not conform with real hardware, and may cause the VM to enter the page-fault handler twice for no reason (once for read, once for write). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-09 10:24:11 +01:00
Marcelo Tosatti	7c6a98dfa1	KVM: x86: add method to test PIR bitmap vector kvm_x86_ops->test_posted_interrupt() returns true/false depending whether 'vector' is set. Next patch makes use of this interface. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:45:18 +01:00
Eugene Korenevsky	e9ac033e6b	KVM: nVMX: Improve nested msr switch checking This patch improve checks required by Intel Software Developer Manual. - SMM MSRs are not allowed. - microcode MSRs are not allowed. - check x2apic MSRs only when LAPIC is in x2apic mode. - MSR switch areas must be aligned to 16 bytes. - address of first and last byte in MSR switch areas should not set any bits beyond the processor's physical-address width. Also it adds warning messages on failures during MSR switch. These messages are useful for people who debug their VMMs in nVMX. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:45:15 +01:00
Wincy Van	ff651cb613	KVM: nVMX: Add nested msr load/restore algorithm Several hypervisors need MSR auto load/restore feature. We read MSRs from VM-entry MSR load area which specified by L1, and load them via kvm_set_msr in the nested entry. When nested exit occurs, we get MSRs via kvm_get_msr, writing them to L1`s MSR store area. After this, we read MSRs from VM-exit MSR load area, and load them via kvm_set_msr. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:45:14 +01:00
Luck, Tony	d4812e169d	x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks We now switch to the kernel stack when a machine check interrupts during user mode. This means that we can perform recovery actions in the tail of do_machine_check() Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-07 07:47:42 -08:00
Andy Lutomirski	bced35b65a	x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic In some IST handlers, if the interrupt came from user mode, we can safely enable preemption. Add helpers to do it safely. This is intended to be used my the memory failure code in do_machine_check. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-02 10:22:46 -08:00
Andy Lutomirski	83653c16da	x86: Clean up current_stack_pointer There's no good reason for it to be a macro, and x86_64 will want to use it, so it should be in a header. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-02 10:22:46 -08:00
Andy Lutomirski	9592747538	x86, traps: Track entry into and exit from IST context We currently pretend that IST context is like standard exception context, but this is incorrect. IST entries from userspace are like standard exceptions except that they use per-cpu stacks, so they are atomic. IST entries from kernel space are like NMIs from RCU's perspective -- they are not quiescent states even if they interrupted the kernel during a quiescent state. Add and use ist_enter and ist_exit to track IST context. Even though x86_32 has no IST stacks, we track these interrupts the same way. This fixes two issues: - Scheduling from an IST interrupt handler will now warn. It would previously appear to work as long as we got lucky and nothing overwrote the stack frame. (I don't know of any bugs in this that would trigger the warning, but it's good to be on the safe side.) - RCU handling in IST context was dangerous. As far as I know, only machine checks were likely to trigger this, but it's good to be on the safe side. Note that the machine check handlers appears to have been missing any context tracking at all before this patch. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-02 10:22:46 -08:00
Ingo Molnar	2aba73a614	This is hopefully the last vdso fix for 3.19. It should be very safe (it just adds a volatile). I don't think it fixes an actual bug (the __getcpu calls in the pvclock code may not have been needed in the first place), but discussion on that point is ongoing. It also fixes a big performance issue in 3.18 and earlier in which the lsl instructions in vclock_gettime got hoisted so far up the function that they happened even when the function they were in was never called. n 3.19, the performance issue seems to be gone due to the whims of my compiler and some interaction with a branch that's now gone. I'll hopefully have a much bigger overhaul of the pvclock code for 3.20, but it needs careful review. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUmduGAAoJEK9N98ZeDfrk874H/RkkP+y6/DmdKVR1dTOUQW4u f1wPU0/sc5xywGjNcfR3XwUuyBJyd3s81WVaE5XXHfCHnbjG2Z4CNTqga27hL1D0 io01Q2s3dh1Y5c0cccVmJmyw//YVzMUOzGTNM9R0NKQNXmYUz6jgQaqk+wWORdD6 JXCU3/LI5VT0fjNPLj1M9l59eC2Qg/V4GqY2xRJ1AfbwkX1CFZTcWUPb+4FScVYv 9gds/vOoFg54MypVJD4SeIC9I8U0qcim9gV7gGFdzyDNCXS5J4P+02sEOFNu8oYy HVK1B0LXhswT08Ho1yRxXUhFxpqEGeGJvTlDTvwy+r/yuKE2AVBtlhLQBqMPhnY= =u3d2 -----END PGP SIGNATURE----- Merge tag 'pr-20141223-x86-vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/urgent Pull VDSO fix from Andy Lutomirski: "This is hopefully the last vdso fix for 3.19. It should be very safe (it just adds a volatile). I don't think it fixes an actual bug (the __getcpu calls in the pvclock code may not have been needed in the first place), but discussion on that point is ongoing. It also fixes a big performance issue in 3.18 and earlier in which the lsl instructions in vclock_gettime got hoisted so far up the function that they happened even when the function they were in was never called. n 3.19, the performance issue seems to be gone due to the whims of my compiler and some interaction with a branch that's now gone. I'll hopefully have a much bigger overhaul of the pvclock code for 3.20, but it needs careful review." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-01 22:21:22 +01:00
Andy Lutomirski	1ddf0b1b11	x86, vdso: Use asm volatile in __getcpu In Linux 3.18 and below, GCC hoists the lsl instructions in the pvclock code all the way to the beginning of __vdso_clock_gettime, slowing the non-paravirt case significantly. For unknown reasons, presumably related to the removal of a branch, the performance issue is gone as of `e76b027e64` x86,vdso: Use LSL unconditionally for vgetcpu but I don't trust GCC enough to expect the problem to stay fixed. There should be no correctness issue, because the __getcpu calls in __vdso_vlock_gettime were never necessary in the first place. Note to stable maintainers: In 3.18 and below, depending on configuration, gcc 4.9.2 generates code like this: 9c3: 44 0f 03 e8 lsl %ax,%r13d 9c7: 45 89 eb mov %r13d,%r11d 9ca: 0f 03 d8 lsl %ax,%ebx This patch won't apply as is to any released kernel, but I'll send a trivial backported version if needed. Fixes: `51c19b4f59` x86: vdso: pvclock gettime support Cc: stable@vger.kernel.org # 3.8+ Cc: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2014-12-23 13:05:30 -08:00
Borislav Petkov	6ca7a8a150	x86/fpu: Use a symbolic name for asm operand Fix up the else-case in fpu_fxsave() which seems like it has been overlooked. Correct comment style in restore_fpu_checking() while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1419170543-11393-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-23 10:41:12 +01:00
Li Bin	b5bfc51707	livepatch: move x86 specific ftrace handler code to arch/x86 The execution flow redirection related implemention in the livepatch ftrace handler is depended on the specific architecture. This patch introduces klp_arch_set_pc(like kgdb_arch_set_pc) interface to change the pt_regs. Signed-off-by: Li Bin <huawei.libin@huawei.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-12-22 15:40:49 +01:00
Seth Jennings	b700e7f03d	livepatch: kernel: add support for live patching This commit introduces code for the live patching core. It implements an ftrace-based mechanism and kernel interface for doing live patching of kernel and kernel module functions. It represents the greatest common functionality set between kpatch and kgraft and can accept patches built using either method. This first version does not implement any consistency mechanism that ensures that old and new code do not run together. In practice, ~90% of CVEs are safe to apply in this way, since they simply add a conditional check. However, any function change that can not execute safely with the old version of the function can _not_ be safely applied in this version. [ jkosina@suse.cz: due to the number of contributions that got folded into this original patch from Seth Jennings, add SUSE's copyright as well, as discussed via e-mail ] Signed-off-by: Seth Jennings <sjenning@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-12-22 15:40:49 +01:00
Linus Torvalds	60815cf2e0	kernel: Provide READ_ONCE and ASSIGN_ONCE As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com ACCESS_ONCE might fail with specific compilers for non-scalar accesses. Here is a set of patches to tackle that problem. The first patch introduce READ_ONCE and ASSIGN_ONCE. If the data structure is larger than the machine word size memcpy is used and a warning is emitted. The next patches fix up several in-tree users of ACCESS_ONCE on non-scalar types. This merge does not yet contain a patch that forces ACCESS_ONCE to work only on scalar types. This is targetted for the next merge window as Linux next already contains new offenders regarding ACCESS_ONCE vs. non-scalar types. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJUkrVGAAoJEBF7vIC1phx8stkP/2LmN5y6LOseoEW06xa5MX4m cbIKsZNtsGHl7EDcTzzuWs6Sq5/Cj7V3yzeBF7QGbUKOqvFWU3jvpUBCCfjMg37C 77/Vf0ZPrxTXXxeJ4Ykdy2CGvuMtuYY9TWkrRNKmLU0xex7lGblEzCt9z6+mZviw 26/DN8ctjkHRvIUAi+7RfQBBc3oSMYAC1mzxYKBAsAFLV+LyFmsGU/4iofZMAsdt XFyVXlrLn0Bjx/MeceGkOlMDiVx4FnfccfFaD4hhuTLBJXWitkUK/MRa4JBiXWzH agY8942A8/j9wkI2DFp/pqZYqA/sTXLndyOWlhE//ZSti0n0BSJaOx3S27rTLkAc 5VmZEVyIrS3hyOpyyAi0sSoPkDnjeCHmQg9Rqn34/poKLd7JDrW2UkERNCf/T3eh GI2rbhAlZz3v5mIShn8RrxzslWYmOObpMr3HYNUdRk8YUfTf6d6aZ3txHp2nP4mD VBAEzsvP9rcVT2caVhU2dnBzeaZAj3zeDxBtjcb3X2osY9tI7qgLc9Fa/fWKgILk 2evkLcctsae2mlLNGHyaK3Dm/ZmYJv+57MyaQQEZNfZZgeB1y4k0DkxH4w1CFmCi s8XlH5voEHgnyjSQXXgc/PNVlkPAKr78ZyTiAfiKmh8rpe41/W4hGcgao7L9Lgiu SI0uSwKibuZt4dHGxQuG =IQ5o -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux Pull ACCESS_ONCE cleanup preparation from Christian Borntraeger: "kernel: Provide READ_ONCE and ASSIGN_ONCE As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com ACCESS_ONCE might fail with specific compilers for non-scalar accesses. Here is a set of patches to tackle that problem. The first patch introduce READ_ONCE and ASSIGN_ONCE. If the data structure is larger than the machine word size memcpy is used and a warning is emitted. The next patches fix up several in-tree users of ACCESS_ONCE on non-scalar types. This does not yet contain a patch that forces ACCESS_ONCE to work only on scalar types. This is targetted for the next merge window as Linux next already contains new offenders regarding ACCESS_ONCE vs. non-scalar types" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux: s390/kvm: REPLACE barrier fixup with READ_ONCE arm/spinlock: Replace ACCESS_ONCE with READ_ONCE arm64/spinlock: Replace ACCESS_ONCE READ_ONCE mips/gup: Replace ACCESS_ONCE with READ_ONCE x86/gup: Replace ACCESS_ONCE with READ_ONCE x86/spinlock: Replace ACCESS_ONCE with READ_ONCE mm: replace ACCESS_ONCE with READ_ONCE or barriers kernel: Provide READ_ONCE and ASSIGN_ONCE	2014-12-20 16:48:59 -08:00
Srinidhi Kasagar	e7ddf4b7b3	cpufreq: Add SFI based cpufreq driver support This adds the SFI based cpu freq driver for some of the Intel's Silvermont based Atom architectures like Z34xx and Z35xx. Signed-off-by: Rudramuni, Vishwesh M <vishwesh.m.rudramuni@intel.com> Signed-off-by: Srinidhi Kasagar <srinidhi.kasagar@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Len Brown <len.brown@intel.com>	2014-12-20 13:37:37 -05:00
Linus Torvalds	e589c9e13a	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Thomas Gleixner: "After stopping the full x86/apic branch, I took some time to go through the first block of patches again, which are mostly cleanups and preparatory work for the irqdomain conversion and ioapic hotplug support. Unfortunaly one of the real problematic commits was right at the beginning, so I rebased this portion of the pending patches without the offenders. It would be great to get this into 3.19. That makes reworking the problematic parts simpler. The usual tip testing did not unearth any issues and it is fully bisectible now. I'm pretty confident that this wont affect the calmness of the xmas season. Changes: - Split the convoluted io_apic.c code into domain specific parts (vector, ioapic, msi, htirq) - Introduce proper helper functions to retrieve irq specific data instead of open coded dereferencing of pointers - Preparatory work for ioapic hotplug and irqdomain conversion - Removal of the non functional pci-ioapic driver - Removal of unused irq entry stubs - Make native_smp_prepare_cpus() preemtible to avoid GFP_ATOMIC allocations for everything which is called from there. - Small cleanups and fixes" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) iommu/amd: Use helpers to access irq_cfg data structure associated with IRQ iommu/vt-d: Use helpers to access irq_cfg data structure associated with IRQ x86: irq_remapping: Use helpers to access irq_cfg data structure associated with IRQ x86, irq: Use helpers to access irq_cfg data structure associated with IRQ x86, irq: Make MSI and HT_IRQ indepenent of X86_IO_APIC x86, irq: Move IRQ initialization routines from io_apic.c into vector.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h x86, irq: Move HT IRQ related code from io_apic.c into htirq.c x86, irq: Move PCI MSI related code from io_apic.c into msi.c x86, irq: Replace printk(KERN_LVL) with pr_lvl() utilities x86, irq: Make UP version of irq_complete_move() an inline stub x86, irq: Move local APIC related code from io_apic.c into vector.c x86, irq: Introduce helpers to access struct irq_cfg x86, irq: Protect __clear_irq_vector() with vector_lock x86, irq: Rename local APIC related functions in io_apic.c as apic_xxx() x86, irq: Refine hw_irq.h to prepare for irqdomain support x86, irq: Convert irq_2_pin list to generic list x86, irq: Kill useless parameter 'irq_attr' of IO_APIC_get_PCI_irq_vector() x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI x86, irq: Introduce helper to check whether an IOAPIC has been registered ...	2014-12-19 14:02:02 -08:00
Linus Torvalds	1092b596a5	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "This contains a single TLS ABI validation fix from Andy Lutomirski" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tls: Don't validate lm in set_thread_area() after all	2014-12-19 13:18:31 -08:00
Linus Torvalds	66dcff86ba	3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware-assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support. Right now KVM is broken for PPC BookE in your tree (doesn't compile). I'll reply to the pull request with a patch, please apply it either before the pull request or in the merge commit, in order to preserve bisectability somewhat. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUkpg+AAoJEL/70l94x66DUmoH/jzXYkptSW9NGgm79KqxGJlD lzLnLBkitVvx++Mz5YBhdJEhKKLUlCtifFT1zPJQ/pthQhIRSaaAwZyNGgUs5w5x yMGKHiPQFyZRbmQtZhCInW0BftJoYHHciO3nUfHCZnp34My9MP2D55W7/z+fYFfQ DuqBSE9ThyZJtZ4zh8NRA9fCOeuqwVYRyoBs820Wbsh4cpIBoIK63Dg7k+CLE+ZV MZa/mRL6bAfsn9W5bnOUAgHJ3SPznnWbO3/g0aV+roL/5pffblprJx9lKNR08xUM 6hDFLop2gDehDJesDkY/o8Ckp1hEouvfsVpSShry4vcgtn0hgh2O5/6Orbmj6vE= =Zwq1 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware- assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...	2014-12-18 16:05:28 -08:00
Andy Lutomirski	3fb2f4237b	x86/tls: Don't validate lm in set_thread_area() after all It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program: struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable. Revert the .lm check in set_thread_area(). The value never did anything in the first place. Fixes: `0e58af4e1d` ("x86/tls: Disallow unusual TLS segments") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # Only if `0e58af4e1d` is backported Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-18 12:12:26 +01:00
Christian Borntraeger	4f9d1382e6	x86/spinlock: Replace ACCESS_ONCE with READ_ONCE ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Change the spinlock code to replace ACCESS_ONCE with READ_ONCE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-12-18 09:54:38 +01:00
Paolo Bonzini	cb5281a572	KVM: move APIC types to arch/x86/ They are not used anymore by IA64, move them away. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-18 09:39:51 +01:00
Linus Torvalds	eb64c3c6cd	xen: additional features for 3.19-rc0 - Linear p2m for x86 PV guests which simplifies the p2m code, improves performance and will allow for > 512 GB PV guests in the future. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUjx7OAAoJEFxbo/MsZsTRXLIH/ishF/xDCL6F5r0I0SKDuaz5 C/BediDcFzbzh4/t3x2PrPooHk4gPmeyIg688ZGgBAxHRXC5OJ2U5tdtZ/qUCnwf 0J1pdp/yoAOVRJT+Sax10lN4+G8YV7+6Ptikz0C7glXBAg8SgFL3Y6tfBS0jNwYR wQph09S9n7gMZTodSBLbb0ymtJMhl16DrETJsYV73sU7bAL5sFDVkMQvY3SxkusX GNFeALfqM0cSK9mDI6O9avGJKoIdKlzt7VWHdlc+yKTlQsoyg/cSH3AaihhG6af9 IElRxwH9Z40VFLKip0gNMOIrUwAjFGSw6N+Uhik27tlmvfI3Dll/+gsMz/5sHc8= =OyoK -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.19-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull additional xen update from David Vrabel: "Xen: additional features for 3.19-rc0 - Linear p2m for x86 PV guests which simplifies the p2m code, improves performance and will allow for > 512 GB PV guests in the future. A last-minute, configuration specific issue was discovered with this change which is why it was not included in my previous pull request. This is now been fixed and tested" * tag 'stable/for-linus-3.19-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: switch to post-init routines in xen mmu.c earlier Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single" xen: annotate xen_set_identity_and_remap_chunk() with __init xen: introduce helper functions to do safe read and write accesses xen: Speed up set_phys_to_machine() by using read-only mappings xen: switch to linear virtual mapped sparse p2m list xen: Hide get_phys_to_machine() to be able to tune common path x86: Introduce function to get pmd entry pointer xen: Delay invalidating extra memory xen: Delay m2p_override initialization xen: Delay remapping memory of pv-domain xen: use common page allocation function in p2m.c xen: Make functions static xen: fix some style issues in p2m.c	2014-12-16 13:23:03 -08:00
Jiang Liu	11d686e956	x86, irq: Move IRQ initialization routines from io_apic.c into vector.c Move IRQ initialization routines from io_apic.c into vector.c, preparing for enabling hierarchy irqdomain. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1414397531-28254-15-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:17 +01:00
Jiang Liu	8643e28da2	x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h Clean up code by moving IOAPIC related declarations from hw_irq.h into io_apic.h. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Aubrey <aubrey.li@linux.intel.com> Cc: Ryan Desfosses <ryan@desfo.org> Cc: Quentin Lambert <lambert.quentin@gmail.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: http://lkml.kernel.org/r/1414397531-28254-14-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:17 +01:00
Jiang Liu	443809828c	x86, irq: Move PCI MSI related code from io_apic.c into msi.c Create arch/x86/kernel/apic/msi.c to host MSI related code, preparing for enabling hierarchy irqdomain. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1414397531-28254-12-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:17 +01:00
Thomas Gleixner	f0e5bf7583	x86, irq: Make UP version of irq_complete_move() an inline stub No point for having an empty real function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:16 +01:00
Jiang Liu	74afab7af7	x86, irq: Move local APIC related code from io_apic.c into vector.c Create arch/x86/kernel/apic/vector.c to host local APIC related code, prepare for making MSI/HT_IRQ independent of IOAPIC. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1414397531-28254-10-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:16 +01:00
Jiang Liu	55a0e2b122	x86, irq: Introduce helpers to access struct irq_cfg Change irq_cfg() from static to extern, also introduce helper function irqd_cfg(). Later we can rewrite these two helpers when enabling hierarchy irqdomain. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1414397531-28254-9-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:16 +01:00
Jiang Liu	cb39288cd6	x86, irq: Rename local APIC related functions in io_apic.c as apic_xxx() Rename local APIC related functions in io_apic.c as apic_xxx() instead of ioapic_xxx(), later they will be moved into separate file. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1414397531-28254-7-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:16 +01:00
Jiang Liu	26011eee04	x86, irq: Refine hw_irq.h to prepare for irqdomain support Refine hw_irq.h to prepare for irqdomain support by: 1) guarding common APIC related interfaces with CONFIG_X86_LOCAL_APIC 2) guarding interrupt remapping related interfaces with CONFIG_IRQ_REMAP 3) guarding IOAPIC related interfaces with CONFIG_X86_IO_APIC No functional changes. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1414397531-28254-6-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:16 +01:00
Yinghai Lu	a178b87b20	x86, irq: Convert irq_2_pin list to generic list Use generic list to replace private list implementation so we can use the existing helper functions. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1414397531-28254-5-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Joerg Roedel <joro@8bytes.org>	2014-12-16 14:08:16 +01:00
Jiang Liu	25d0d35ed7	x86, irq: Kill useless parameter 'irq_attr' of IO_APIC_get_PCI_irq_vector() None of the callers requires irq_attr to be filled in. IO_APIC_get_PCI_irq_vector() does not do anything useful with it either. Remove the parameter and fixup the call sites. [ tglx: Massaged changelog ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Ryan Desfosses <ryan@desfo.org> Cc: Quentin Lambert <lambert.quentin@gmail.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: http://lkml.kernel.org/r/1414397531-28254-4-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:16 +01:00
Jiang Liu	e89900c9ad	x86, irq: Introduce helper to check whether an IOAPIC has been registered Introduce acpi_ioapic_registered() to check whether an IOAPIC has already been registered, it will be used when enabling IOAPIC hotplug. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Len Brown <len.brown@intel.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1414387308-27148-18-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:15 +01:00
Jiang Liu	15516a3b86	x86, irq, ACPI: Implement interfaces to support ACPI based IOAPIC hot-removal Implement acpi_unregister_ioapic() to support ACPI based IOAPIC hot-removal. An IOAPIC could only be removed when all its pins are unused. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Grant Likely <grant.likely@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1414387308-27148-17-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:15 +01:00
Jiang Liu	35ef9c941c	x86, irq: Refine mp_register_ioapic() to prepare for IOAPIC hotplug Refine mp_register_ioapic() to prepare for IOAPIC hotplug by: 1) change return value from void to int. 2) check for gsi range conflicts 3) check for IOAPIC physical address conflicts 4) enhance the way to allocate IOAPIC index Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1414387308-27148-14-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:15 +01:00
Jiang Liu	67dc5e701f	x86, irq: Remove __init marker for functions will be used by IOAPIC hotplug Remove __init marker for functions which will be used by IOAPIC hotplug at runtime. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1414387308-27148-12-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:15 +01:00
Jan Beulich	2414e021ac	x86: Avoid building unused IRQ entry stubs When X86_LOCAL_APIC (i.e. unconditionally on x86-64), first_system_vector will never end up being higher than LOCAL_TIMER_VECTOR (0xef), and hence building stubs for vectors 0xef...0xff is pointlessly reducing code density. Deal with this at build time already. Taking into consideration that X86_64 implies X86_LOCAL_APIC, also simplify (and hence make easier to read and more consistent with the change done here) some #if-s in arch/x86/kernel/irqinit.c. While we could further improve the packing of the IRQ entry stubs (the four ones now left in the last set could be fit into the four padding bytes each of the final four sets have) this doesn't seem to provide any real benefit: Both irq_entries_start and common_interrupt getting cache line aligned, eliminating the 30th set would just produce 32 bytes of padding between the 29th and common_interrupt. [ tglx: Folded lguest fix from Dan Carpenter ] Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: lguest@lists.ozlabs.org Cc: Rusty Russell <rusty@rustcorp.com.au> Link: http://lkml.kernel.org/r/54574D5F0200007800044389@mail.emea.novell.com Link: http://lkml.kernel.org/r/20141115185718.GB6530@mwanda Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:14 +01:00
Jan Beulich	e106798259	x86: irq: Fix placement of mp_should_keep_irq() While `f3761db164` ("x86, irq: Fix build error caused by 9eabc99a635a77cbf09") addressed the original build problem, declaration, inline stub, and definition still seem misplaced: It isn't really IO-APIC related, and it's being used solely in arch/x86/pci/. This also means stubbing it out when !CONFIG_X86_IO_APIC was at least questionable. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/545747BE020000780004436E@mail.emea.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:14 +01:00
Jiang Liu	e32c67e0cb	x86, irq: Provide empty send_cleanup_vector() stub for UP builds Define an empty send_cleanup_vector() for UP kernel to fix link error of undefined reference, which is used by uv_irq and irq_remapping. [ tglx: Made it an inline stub and moved it ahead of the file split changes ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1414397531-28254-21-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-16 14:08:14 +01:00
Linus Torvalds	536e89ee53	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes (mainly Andy's TLS fixes), plus a cleanup" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tls: Disallow unusual TLS segments x86/tls: Validate TLS entries to protect espfix MAINTAINERS: Add me as x86 VDSO submaintainer x86/asm: Unify segment selector defines x86/asm: Guard against building the 32/64-bit versions of the asm-offsets*.c file directly x86_64, switch_to(): Load TLS descriptors before switching DS and ES x86/mm: Use min() instead of min_t() in the e820 printout code x86/mm: Fix zone ranges boot printout x86/doc: Update documentation after file shuffling	2014-12-14 11:51:50 -08:00
Linus Torvalds	f96fe22567	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull another networking update from David Miller: "Small follow-up to the main merge pull from the other day: 1) Alexander Duyck's DMA memory barrier patch set. 2) cxgb4 driver fixes from Karen Xie. 3) Add missing export of fixed_phy_register() to modules, from Mark Salter. 4) DSA bug fixes from Florian Fainelli" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits) net/macb: add TX multiqueue support for gem linux/interrupt.h: remove the definition of unused tasklet_hi_enable jme: replace calls to redundant function net: ethernet: davicom: Allow to select DM9000 for nios2 net: ethernet: smsc: Allow to select SMC91X for nios2 cxgb4: Add support for QSA modules libcxgbi: fix freeing skb prematurely cxgb4i: use set_wr_txq() to set tx queues cxgb4i: handle non-pdu-aligned rx data cxgb4i: additional types of negative advice cxgb4/cxgb4i: set the max. pdu length in firmware cxgb4i: fix credit check for tx_data_wr cxgb4i: fix tx immediate data credit check net: phy: export fixed_phy_register() fib_trie: Fix trie balancing issue if new node pushes down existing node vlan: Add ability to always enable TSO/UFO r8169:update rtl8168g pcie ephy parameter net: dsa: bcm_sf2: force link for all fixed PHY devices fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads r8169: Use dma_rmb() and dma_wmb() for DescOwn checks ...	2014-12-12 16:11:12 -08:00
Linus Torvalds	9d050966e2	xen: features and fixes for 3.19-rc0 - Fully support non-coherent devices on ARM by introducing the mechanisms to request the hypervisor to perform the required cache maintainance operations. - A number of pciback bug fixes and cleanups. Notably a deadlock fix if a PCI device was manually uunbound and a fix for incorrectly restoring state after a function reset. - In x86 PVHVM guests, use the APIC for interrupts if this has been virtualized by the hardware. This reduces the number of interrupt- related VM exits on such hardware. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUiYb+AAoJEFxbo/MsZsTRwmEH+gNaJz5r8gIJlq8Q51+nOIs4 Gw6HdjUB5MOT47vDV4treEOx0Bk8hYTfgWUWvAC81JMJ1sMWOVrUGuG/0lmzaomW zXvSk+o0n4LafwEhHb8LIccZMbaH7f9o3PNdNchrTkPrIl8Gf2nmBXCkDsT4mRye 5ZFpc4ntgBrznh3baPYDS8PCAmlyZ0uVEnz1ofYI6S80dC13siEiPG0c9TrNEKzO glhvgCRmR0C4ZNLblM36HWBEqrdLuGCoNJSH+7okygyP2TLD3aO4R+9aD5JWYNdf fO2WmivX/zK+UGVAElrLx+rb8R2dv3ddeaE5piZhIBUieopIWJd32L3LhQORdtc= =N6DP -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.19-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen features and fixes from David Vrabel: - Fully support non-coherent devices on ARM by introducing the mechanisms to request the hypervisor to perform the required cache maintainance operations. - A number of pciback bug fixes and cleanups. Notably a deadlock fix if a PCI device was manually uunbound and a fix for incorrectly restoring state after a function reset. - In x86 PVHVM guests, use the APIC for interrupts if this has been virtualized by the hardware. This reduces the number of interrupt- related VM exits on such hardware. * tag 'stable/for-linus-3.19-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (26 commits) Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single" xen/pci: Use APIC directly when APIC virtualization hardware is available xen/pci: Defer initialization of MSI ops on HVM guests xen-pciback: drop SR-IOV VFs when PF driver unloads xen/pciback: Restore configuration space when detaching from a guest. PCI: Expose pci_load_saved_state for public consumption. xen/pciback: Remove tons of dereferences xen/pciback: Print out the domain owning the device. xen/pciback: Include the domain id if removing the device whilst still in use driver core: Provide an wrapper around the mutex to do lockdep warnings xen/pciback: Don't deadlock when unbinding. swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single swiotlb-xen: call xen_dma_sync_single_for_device when appropriate swiotlb-xen: remove BUG_ON in xen_bus_to_phys swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu xen/arm: introduce GNTTABOP_cache_flush xen/arm/arm64: introduce xen_arch_need_swiotlb xen/arm/arm64: merge xen/mm32.c into xen/mm.c xen/arm: use hypercall to flush caches in map_page xen: add a dma_addr_t dev_addr argument to xen_dma_map_page ...	2014-12-11 18:15:33 -08:00
Alexander Duyck	1077fa36f2	arch: Add lightweight memory barriers dma_rmb() and dma_wmb() There are a number of situations where the mandatory barriers rmb() and wmb() are used to order memory/memory operations in the device drivers and those barriers are much heavier than they actually need to be. For example in the case of PowerPC wmb() calls the heavy-weight sync instruction when for coherent memory operations all that is really needed is an lsync or eieio instruction. This commit adds a coherent only version of the mandatory memory barriers rmb() and wmb(). In most cases this should result in the barrier being the same as the SMP barriers for the SMP case, however in some cases we use a barrier that is somewhere in between rmb() and smp_rmb(). For example on ARM the rmb barriers break down as follows: Barrier Call Explanation --------- -------- ---------------------------------- rmb() dsb() Data synchronization barrier - system dma_rmb() dmb(osh) data memory barrier - outer sharable smp_rmb() dmb(ish) data memory barrier - inner sharable These new barriers are not as safe as the standard rmb() and wmb(). Specifically they do not guarantee ordering between coherent and incoherent memories. The primary use case for these would be to enforce ordering of reads and writes when accessing coherent memory that is shared between the CPU and a device. It may also be noted that there is no dma_mb(). Most architectures don't provide a good mechanism for performing a coherent only full barrier without resorting to the same mechanism used in mb(). As such there isn't much to be gained in trying to define such a function. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-11 21:15:06 -05:00
Alexander Duyck	8a44971841	arch: Cleanup read_barrier_depends() and comments This patch is meant to cleanup the handling of read_barrier_depends and smp_read_barrier_depends. In multiple spots in the kernel headers read_barrier_depends is defined as "do {} while (0)", however we then go into the SMP vs non-SMP sections and have the SMP version reference read_barrier_depends, and the non-SMP define it as yet another empty do/while. With this commit I went through and cleaned out the duplicate definitions and reduced the number of definitions down to 2 per header. In addition I moved the 50 line comments for the macro from the x86 and mips headers that defined it as an empty do/while to those that were actually defining the macro, alpha and blackfin. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-11 21:15:05 -05:00
Linus Torvalds	70e71ca0af	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) New offloading infrastructure and example 'rocker' driver for offloading of switching and routing to hardware. This work was done by a large group of dedicated individuals, not limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend, Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu 2) Start making the networking operate on IOV iterators instead of modifying iov objects in-situ during transfers. Thanks to Al Viro and Herbert Xu. 3) A set of new netlink interfaces for the TIPC stack, from Richard Alpe. 4) Remove unnecessary looping during ipv6 routing lookups, from Martin KaFai Lau. 5) Add PAUSE frame generation support to gianfar driver, from Matei Pavaluca. 6) Allow for larger reordering levels in TCP, which are easily achievable in the real world right now, from Eric Dumazet. 7) Add a variable of napi_schedule that doesn't need to disable cpu interrupts, from Eric Dumazet. 8) Use a doubly linked list to optimize neigh_parms_release(), from Nicolas Dichtel. 9) Various enhancements to the kernel BPF verifier, and allow eBPF programs to actually be attached to sockets. From Alexei Starovoitov. 10) Support TSO/LSO in sunvnet driver, from David L Stevens. 11) Allow controlling ECN usage via routing metrics, from Florian Westphal. 12) Remote checksum offload, from Tom Herbert. 13) Add split-header receive, BQL, and xmit_more support to amd-xgbe driver, from Thomas Lendacky. 14) Add MPLS support to openvswitch, from Simon Horman. 15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen Klassert. 16) Do gro flushes on a per-device basis using a timer, from Eric Dumazet. This tries to resolve the conflicting goals between the desired handling of bulk vs. RPC-like traffic. 17) Allow userspace to ask for the CPU upon what a packet was received/steered, via SO_INCOMING_CPU. From Eric Dumazet. 18) Limit GSO packets to half the current congestion window, from Eric Dumazet. 19) Add a generic helper so that all drivers set their RSS keys in a consistent way, from Eric Dumazet. 20) Add xmit_more support to enic driver, from Govindarajulu Varadarajan. 21) Add VLAN packet scheduler action, from Jiri Pirko. 22) Support configurable RSS hash functions via ethtool, from Eyal Perry. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits) Fix race condition between vxlan_sock_add and vxlan_sock_release net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header net/mlx4: Add support for A0 steering net/mlx4: Refactor QUERY_PORT net/mlx4_core: Add explicit error message when rule doesn't meet configuration net/mlx4: Add A0 hybrid steering net/mlx4: Add mlx4_bitmap zone allocator net/mlx4: Add a check if there are too many reserved QPs net/mlx4: Change QP allocation scheme net/mlx4_core: Use tasklet for user-space CQ completion events net/mlx4_core: Mask out host side virtualization features for guests net/mlx4_en: Set csum level for encapsulated packets be2net: Export tunnel offloads only when a VxLAN tunnel is created gianfar: Fix dma check map error when DMA_API_DEBUG is enabled cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call net: fec: only enable mdio interrupt before phy device link up net: fec: clear all interrupt events to support i.MX6SX net: fec: reset fep link status in suspend function net: sock: fix access via invalid file descriptor net: introduce helper macro for_each_cmsghdr ...	2014-12-11 14:27:06 -08:00
Linus Torvalds	bae41e45b7	sound updates for 3.19-rc1 This became a fairly large pull request. In addition to the usual driver updates / fixes, there have been a high amount of cleanups in ASoC area, as well as control API helpers and kernel documentations fixes touching through the whole tree. In the driver side, the biggest changes are the support for new Intel SoC found on new x86 machines, and the updates of FireWire dice and oxfw drivers. Some remarkable items are below: * ALSA core - PCM mmap code cleanup, removal of arch-dependent codes - PCM xrun injection support - PCM hwptr tracepoint support - Refactoring of snd_pcm_action(), simplification of PCM locking - Robustified sequecner auto-load functionality - New control API helpers and lots of cleanups along with them - Lots of kerneldoc fixes and cleanups * USB-audio - The mixer resume code was largely rewritten, and the devices with quirks are resumed properly. - New hardware support: Focusrite Scarlett, Digidesign Mbox1, Denon/Marantz DACs, Zoom R16/24 * FireWire - DICE driver updates with better duplex and sync support, including MIDI support - New OXFW driver for Oxford Semiconductor FW970/971 chipset, including the previous LaCie Speakers device. Fullduplex and MIDI support included as well as DICE driver. * HD-audio - Refactoring the driver-caps quirk handling in snd-hda-intel - More consistent control names representing the topology better - Fixups: HP mute LED with ALC268 codec, Ideapad S210 built-in mic fix, ASUS Z99He laptop EAPD * ASoC - Conversion of AC'97 drivers to use regmap, bringing us closer to the removal of the ASoC level I/O code - Clean up a lot of old drivers that were open coding things that have subsequently been implemented in the core - Some DAPM performance improvements - Removal of the now seldom used CODEC mutex - Lots of updates for the newer Intel SoC support, including support for the DSP and some Cherrytrail and Braswell machine drivers - Support for Samsung boards using rt5631 as the CODEC - Removal of the obsolete AFEB9260 machine driver - Driver support for the TI TS3A227E headset driver used in some Chrombeooks * Others - ASIHPI driver update and cleanups - Lots of dev_() printk conversions - Lots of trivial cleanups for the codes spotted by Coccinelle -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJUiYaqAAoJEGwxgFQ9KSmkeo0P/2aDx2w8iVi8n7Og/7VBubkm VZkk08IOpP3h1ojyQRsBQPI0H5AquqQTZN1TJUDcy+6PD9vckYYcag9JWhA+0RBr I+BfTMLB3E4umIkzOjxeoyOzheL7GoZ+eZYEm8DkAhaue+cFhjNJz+S6g8ENkxJ9 lSjErXQxyiowc39I0v1WBZcuq6glX1psEsVup9U8m7KhNx6lexj28A2MkqicW4hs DZE6pYrk57W7y3+/NWxaBiglrItvScBAPpPqoyDm9zuDNTmAtGjf1uMRmRyHe30Z iunHXki8Fc2yBBapmfYrcLC2jyIyZykcxniF8Hd4nXUvddisFUEFFhNmB6v392d0 4/NXSqTnsq48vm0Ezjia2LySWKZZVQtam8t9262BKHcosKYObxirekD6vijSoWO8 ZWoXa+U1oWSFEoOAFDsu6GFqFHFRi5VhqBgIaPEIxrT2MQGHL3KU1bp8CJi/5CTU pNh0wC9SMtnSJJXBIP/nYH81WQxaik3c4eiHFPN4+0McBZQiIaIqMG6x+iiVNvPB MNLLVAzk0QiWeCmSo8OBdjOV0/T+pfQ7lrTCn2B1jdJi1CkAO8m2SwQrG4PpRx8k lUTBd4zTx5DYR+yPF69OyoCQg0XKjW9g62Qo5rmxrQreiidROZOBS1bljWzIPeft otupLmK5kz67n3eB2eto =sB6v -----END PGP SIGNATURE----- Merge tag 'sound-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "This became a fairly large pull request. In addition to the usual driver updates / fixes, there have been a high amount of cleanups in ASoC area, as well as control API helpers and kernel documentations fixes touching through the whole tree. In the driver side, the biggest changes are the support for new Intel SoC found on new x86 machines, and the updates of FireWire dice and oxfw drivers. Some remarkable items are below: ALSA core: - PCM mmap code cleanup, removal of arch-dependent codes - PCM xrun injection support - PCM hwptr tracepoint support - Refactoring of snd_pcm_action(), simplification of PCM locking - Robustified sequecner auto-load functionality - New control API helpers and lots of cleanups along with them - Lots of kerneldoc fixes and cleanups USB-audio: - The mixer resume code was largely rewritten, and the devices with quirks are resumed properly. - New hardware support: Focusrite Scarlett, Digidesign Mbox1, Denon/Marantz DACs, Zoom R16/24 FireWire: - DICE driver updates with better duplex and sync support, including MIDI support - New OXFW driver for Oxford Semiconductor FW970/971 chipset, including the previous LaCie Speakers device. Fullduplex and MIDI support included as well as DICE driver. HD-audio: - Refactoring the driver-caps quirk handling in snd-hda-intel - More consistent control names representing the topology better - Fixups: HP mute LED with ALC268 codec, Ideapad S210 built-in mic fix, ASUS Z99He laptop EAPD ASoC: - Conversion of AC'97 drivers to use regmap, bringing us closer to the removal of the ASoC level I/O code - Clean up a lot of old drivers that were open coding things that have subsequently been implemented in the core - Some DAPM performance improvements - Removal of the now seldom used CODEC mutex - Lots of updates for the newer Intel SoC support, including support for the DSP and some Cherrytrail and Braswell machine drivers - Support for Samsung boards using rt5631 as the CODEC - Removal of the obsolete AFEB9260 machine driver - Driver support for the TI TS3A227E headset driver used in some Chrombeooks Others: - ASIHPI driver update and cleanups - Lots of dev_() printk conversions - Lots of trivial cleanups for the codes spotted by Coccinelle" * tag 'sound-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (594 commits) ALSA: pcxhr: NULL dereference on probe failure ALSA: lola: NULL dereference on probe failure ALSA: hda - Add "eapd" model string for AD1986A codec ALSA: hda - Add EAPD fixup for ASUS Z99He laptop ALSA: oxfw: Add hwdep interface ALSA: oxfw: Add support for capture/playback MIDI messages ALSA: oxfw: add support for capturing PCM samples ALSA: oxfw: Add support AMDTP in-stream ALSA: oxfw: Add support for Behringer/Mackie devices ALSA: oxfw: Change the way to start stream ALSA: oxfw: Add proc interface for debugging purpose ALSA: oxfw: Change the way to make PCM rules/constraints ALSA: oxfw: Add support for AV/C stream format command to get/set supported stream formation ALSA: oxfw: Change the way to name card ALSA: dice: Add support for MIDI capture/playback ALSA: dice: Add support for capturing PCM samples ALSA: dice: Support for non SYT-Match sampling clock source mode ALSA: dice: Add support for duplex streams with synchronization ALSA: dice: Change the way to start stream ALSA: jack: Add dummy snd_jack_set_key() definition ...	2014-12-11 13:20:50 -08:00
Borislav Petkov	be9d1738b1	x86/asm: Unify segment selector defines Those are identical on 32- and 64-bit, unify them. No functional change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1418127959-29902-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-11 11:45:03 +01:00
Xishi Qiu	c072b90c8d	x86/mm: Fix zone ranges boot printout This is the usual physical memory layout boot printout: ... [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x00001000-0x00ffffff] [ 0.000000] DMA32 [mem 0x01000000-0xffffffff] [ 0.000000] Normal [mem 0x100000000-0xc3fffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x00001000-0x00099fff] [ 0.000000] node 0: [mem 0x00100000-0xbf78ffff] [ 0.000000] node 0: [mem 0x100000000-0x63fffffff] [ 0.000000] node 1: [mem 0x640000000-0xc3fffffff] ... This is the log when we set "mem=2G" on the boot cmdline: ... [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x00001000-0x00ffffff] [ 0.000000] DMA32 [mem 0x01000000-0xffffffff] // should be 0x7fffffff, right? [ 0.000000] Normal empty [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x00001000-0x00099fff] [ 0.000000] node 0: [mem 0x00100000-0x7fffffff] ... This patch fixes the printout, the following log shows the right ranges: ... [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x00001000-0x00ffffff] [ 0.000000] DMA32 [mem 0x01000000-0x7fffffff] [ 0.000000] Normal empty [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x00001000-0x00099fff] [ 0.000000] node 0: [mem 0x00100000-0x7fffffff] ... Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Linux MM <linux-mm@kvack.org> Cc: <dave@sr71.net> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/5487AB3D.6070306@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-11 11:35:02 +01:00
Linus Torvalds	92a578b064	ACPI and power management updates for 3.19-rc1 This time we have some more new material than we used to have during the last couple of development cycles. The most important part of it to me is the introduction of a unified interface for accessing device properties provided by platform firmware. It works with Device Trees and ACPI in a uniform way and drivers using it need not worry about where the properties come from as long as the platform firmware (either DT or ACPI) makes them available. It covers both devices and "bare" device node objects without struct device representation as that turns out to be necessary in some cases. This has been in the works for quite a few months (and development cycles) and has been approved by all of the relevant maintainers. On top of that, some drivers are switched over to the new interface (at25, leds-gpio, gpio_keys_polled) and some additional changes are made to the core GPIO subsystem to allow device drivers to manipulate GPIOs in the "canonical" way on platforms that provide GPIO information in their ACPI tables, but don't assign names to GPIO lines (in which case the driver needs to do that on the basis of what it knows about the device in question). That also has been approved by the GPIO core maintainers and the rfkill driver is now going to use it. Second is support for hardware P-states in the intel_pstate driver. It uses CPUID to detect whether or not the feature is supported by the processor in which case it will be enabled by default. However, it can be disabled entirely from the kernel command line if necessary. Next is support for a platform firmware interface based on ACPI operation regions used by the PMIC (Power Management Integrated Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms. That interface is used for manipulating power resources and for thermal management: sensor temperature reporting, trip point setting and so on. Also the ACPI core is now going to support the _DEP configuration information in a limited way. Basically, _DEP it supposed to reflect off-the-hierarchy dependencies between devices which may be very indirect, like when AML for one device accesses locations in an operation region handled by another device's driver (usually, the device depended on this way is a serial bus or GPIO controller). The support added this time is sufficient to make the ACPI battery driver work on Asus T100A, but it is general enough to be able to cover some other use cases in the future. Finally, we have a new cpufreq driver for the Loongson1B processor. In addition to the above, there are fixes and cleanups all over the place as usual and a traditional ACPICA update to a recent upstream release. As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for Intel platforms should be able to handle power management of the DMA engine correctly, the cpufreq-dt driver should interact with the thermal subsystem in a better way and the ACPI backlight driver should handle some more corner cases, among other things. On top of the ACPICA update there are fixes for race conditions in the ACPICA's interrupt handling code which might lead to some random and strange looking failures on some systems. In the cleanups department the most visible part is the series of commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration option. That was triggered by a discussion regarding the generic power domains code during which we realized that trying to support certain combinations of PM config options was painful and not really worth it, because nobody would use them in production anyway. For this reason, we decided to make CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the conclusion that the latter became redundant and CONFIG_PM could be used instead of it. The material here makes that replacement in a major part of the tree, but there will be at least one more batch of that in the second part of the merge window. Specifics: - Support for retrieving device properties information from ACPI _DSD device configuration objects and a unified device properties interface for device drivers (and subsystems) on top of that. As stated above, this works with Device Trees and ACPI and allows device drivers to be written in a platform firmware (DT or ACPI) agnostic way. The at25, leds-gpio and gpio_keys_polled drivers are now going to use this new interface and the GPIO subsystem is additionally modified to allow device drivers to assign names to GPIO resources returned by ACPI _CRS objects (in case _DSD is not present or does not provide the expected data). The changes in this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam, Geert Uytterhoeven). - Support for Hardware Managed Performance States (HWP) as described in Volume 3, section 14.4, of the Intel SDM in the intel_pstate driver. CPUID is used to detect whether or not the feature is supported by the processor. If supported, it will be enabled automatically unless the intel_pstate=no_hwp switch is present in the kernel command line. From Dirk Brandewie. - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie). - Support for firmware interface based on ACPI operation regions used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR platforms for power resource control and thermal management (Aaron Lu). - Limited support for retrieving off-the-hierarchy dependencies between devices from ACPI _DEP device configuration objects and deferred probing support for the ACPI battery driver based on the _DEP information to make that driver work on Asus T100A (Lan Tianyu). - New cpufreq driver for the Loongson1B processor (Kelvin Cheung). - ACPICA update to upstream revision 20141107 which only affects tools (Bob Moore). - Fixes for race conditions in the ACPICA's interrupt handling code and in the ACPI code related to system suspend and resume (Lv Zheng and Rafael J Wysocki). - ACPI core fix for an RCU-related issue in the ioremap() regions management code that slowed down significantly after CPUs had been allowed to enter idle states even if they'd had RCU callbakcs queued and triggered some problems in certain proprietary graphics driver (and elsewhere). The fix replaces synchronize_rcu() in that code with synchronize_rcu_expedited() which makes the issue go away. From Konstantin Khlebnikov. - ACPI LPSS (Low-Power Subsystem) driver fix to handle power management of the DMA engine included into the LPSS correctly. The problem is that the DMA engine doesn't have ACPI PM support of its own and it simply is turned off when the last LPSS device having ACPI PM support goes into D3cold. To work around that, the PM domain used by the ACPI LPSS driver is redesigned so at least one device with ACPI PM support will be on as long as the DMA engine is in use. From Andy Shevchenko. - ACPI backlight driver fix to avoid using it on "Win8-compatible" systems where it doesn't work and where it was used by default by mistake (Aaron Lu). - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki, Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin Chaugule (mostly related to the upcoming ARM64 support). - Intel RAPL (Running Average Power Limit) power capping driver fixes and improvements including new processor IDs (Jacob Pan). - Generic power domains modification to power up domains after attaching devices to them to meet the expectations of device drivers and bus types assuming devices to be accessible at probe time (Ulf Hansson). - Preliminary support for controlling device clocks from the generic power domains core code and modifications of the ARM/shmobile platform to use that feature (Ulf Hansson). - Assorted minor fixes and cleanups of the generic power domains core code (Ulf Hansson, Geert Uytterhoeven). - Assorted minor fixes and cleanups of the device clocks control code in the PM core (Geert Uytterhoeven, Grygorii Strashko). - Consolidation of device power management Kconfig options by making CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter which is now redundant (Rafael J Wysocki and Kevin Hilman). That is the first batch of the changes needed for this purpose. - Core device runtime power management support code cleanup related to the execution of callbacks (Andrzej Hajda). - cpuidle ARM support improvements (Lorenzo Pieralisi). - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and Bartlomiej Zolnierkiewicz). - New cpufreq driver callback (->ready) to be executed when the cpufreq core is ready to use a given policy object and cpufreq-dt driver modification to use that callback for cooling device registration (Viresh Kumar). - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James Geboski, Tomeu Vizoso). - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate, cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao, Stefan Wahren, Petr Cvek). - OPP (Operating Performance Points) framework modification to allow OPPs to be removed too and update of a few cpufreq drivers (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added during initialization) on driver removal (Viresh Kumar). - Hibernation core fixes and cleanups (Tina Ruchandani and Markus Elfring). - PM Kconfig fix related to CPU power management (Pankaj Dubey). - cpupower tool fix (Prarit Bhargava). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJUhj6JAAoJEILEb/54YlRxTM4P/j5g5SfqvY0QKsn7sR7MGZ6v nsgCBhJAqTw3ocNC7EAs8z9h2GWy1KbKpakKYWAh9Fs1yZoey7tFSlcv/Rgjlp70 uU5sDQHtpE9mHKiymdsowiQuWgpl962L4k+k8hUslhlvgk1PvVbpajR6OqG8G+pD asuIW9eh1APNkLyXmRJ3ZPomzs0VmRdZJ0NEs0lKX9mJskqEvxPIwdaxq3iaJq9B Fo0J345zUDcJnxWblDRdHlOigCimglElfN5qJwaC4KpwUKuBvLRKbp4f69+wfT0c kYFiR29X5KjJ2kLfP/wKsLyuDCYYXRq3tCia5M1tAqOjZ+UA89H/GDftx/5lntmv qUlBa35VfdS1SX4HyApZitOHiLgo+It/hl8Z9bJnhyVw66NxmMQ8JYN2imb8Lhqh XCLR7BxLTah82AapLJuQ0ZDHPzZqMPG2veC2vAzRMYzVijict/p4Y2+qBqONltER 4rs9uRVn+hamX33lCLg8BEN8zqlnT3rJFIgGaKjq/wXHAU/zpE9CjOrKMQcAg9+s t51XMNPwypHMAYyGVhEL89ImjXnXxBkLRuquhlmEpvQchIhR+mR3dLsarGn7da44 WPIQJXzcsojXczcwwfqsJCR4I1FTFyQIW+UNh02GkDRgRovQqo+Jk762U7vQwqH+ LBdhvVaS1VW4v+FWXEoZ =5dox -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "This time we have some more new material than we used to have during the last couple of development cycles. The most important part of it to me is the introduction of a unified interface for accessing device properties provided by platform firmware. It works with Device Trees and ACPI in a uniform way and drivers using it need not worry about where the properties come from as long as the platform firmware (either DT or ACPI) makes them available. It covers both devices and "bare" device node objects without struct device representation as that turns out to be necessary in some cases. This has been in the works for quite a few months (and development cycles) and has been approved by all of the relevant maintainers. On top of that, some drivers are switched over to the new interface (at25, leds-gpio, gpio_keys_polled) and some additional changes are made to the core GPIO subsystem to allow device drivers to manipulate GPIOs in the "canonical" way on platforms that provide GPIO information in their ACPI tables, but don't assign names to GPIO lines (in which case the driver needs to do that on the basis of what it knows about the device in question). That also has been approved by the GPIO core maintainers and the rfkill driver is now going to use it. Second is support for hardware P-states in the intel_pstate driver. It uses CPUID to detect whether or not the feature is supported by the processor in which case it will be enabled by default. However, it can be disabled entirely from the kernel command line if necessary. Next is support for a platform firmware interface based on ACPI operation regions used by the PMIC (Power Management Integrated Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms. That interface is used for manipulating power resources and for thermal management: sensor temperature reporting, trip point setting and so on. Also the ACPI core is now going to support the _DEP configuration information in a limited way. Basically, _DEP it supposed to reflect off-the-hierarchy dependencies between devices which may be very indirect, like when AML for one device accesses locations in an operation region handled by another device's driver (usually, the device depended on this way is a serial bus or GPIO controller). The support added this time is sufficient to make the ACPI battery driver work on Asus T100A, but it is general enough to be able to cover some other use cases in the future. Finally, we have a new cpufreq driver for the Loongson1B processor. In addition to the above, there are fixes and cleanups all over the place as usual and a traditional ACPICA update to a recent upstream release. As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for Intel platforms should be able to handle power management of the DMA engine correctly, the cpufreq-dt driver should interact with the thermal subsystem in a better way and the ACPI backlight driver should handle some more corner cases, among other things. On top of the ACPICA update there are fixes for race conditions in the ACPICA's interrupt handling code which might lead to some random and strange looking failures on some systems. In the cleanups department the most visible part is the series of commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration option. That was triggered by a discussion regarding the generic power domains code during which we realized that trying to support certain combinations of PM config options was painful and not really worth it, because nobody would use them in production anyway. For this reason, we decided to make CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the conclusion that the latter became redundant and CONFIG_PM could be used instead of it. The material here makes that replacement in a major part of the tree, but there will be at least one more batch of that in the second part of the merge window. Specifics: - Support for retrieving device properties information from ACPI _DSD device configuration objects and a unified device properties interface for device drivers (and subsystems) on top of that. As stated above, this works with Device Trees and ACPI and allows device drivers to be written in a platform firmware (DT or ACPI) agnostic way. The at25, leds-gpio and gpio_keys_polled drivers are now going to use this new interface and the GPIO subsystem is additionally modified to allow device drivers to assign names to GPIO resources returned by ACPI _CRS objects (in case _DSD is not present or does not provide the expected data). The changes in this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam, Geert Uytterhoeven). - Support for Hardware Managed Performance States (HWP) as described in Volume 3, section 14.4, of the Intel SDM in the intel_pstate driver. CPUID is used to detect whether or not the feature is supported by the processor. If supported, it will be enabled automatically unless the intel_pstate=no_hwp switch is present in the kernel command line. From Dirk Brandewie. - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie). - Support for firmware interface based on ACPI operation regions used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR platforms for power resource control and thermal management (Aaron Lu). - Limited support for retrieving off-the-hierarchy dependencies between devices from ACPI _DEP device configuration objects and deferred probing support for the ACPI battery driver based on the _DEP information to make that driver work on Asus T100A (Lan Tianyu). - New cpufreq driver for the Loongson1B processor (Kelvin Cheung). - ACPICA update to upstream revision 20141107 which only affects tools (Bob Moore). - Fixes for race conditions in the ACPICA's interrupt handling code and in the ACPI code related to system suspend and resume (Lv Zheng and Rafael J Wysocki). - ACPI core fix for an RCU-related issue in the ioremap() regions management code that slowed down significantly after CPUs had been allowed to enter idle states even if they'd had RCU callbakcs queued and triggered some problems in certain proprietary graphics driver (and elsewhere). The fix replaces synchronize_rcu() in that code with synchronize_rcu_expedited() which makes the issue go away. From Konstantin Khlebnikov. - ACPI LPSS (Low-Power Subsystem) driver fix to handle power management of the DMA engine included into the LPSS correctly. The problem is that the DMA engine doesn't have ACPI PM support of its own and it simply is turned off when the last LPSS device having ACPI PM support goes into D3cold. To work around that, the PM domain used by the ACPI LPSS driver is redesigned so at least one device with ACPI PM support will be on as long as the DMA engine is in use. From Andy Shevchenko. - ACPI backlight driver fix to avoid using it on "Win8-compatible" systems where it doesn't work and where it was used by default by mistake (Aaron Lu). - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki, Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin Chaugule (mostly related to the upcoming ARM64 support). - Intel RAPL (Running Average Power Limit) power capping driver fixes and improvements including new processor IDs (Jacob Pan). - Generic power domains modification to power up domains after attaching devices to them to meet the expectations of device drivers and bus types assuming devices to be accessible at probe time (Ulf Hansson). - Preliminary support for controlling device clocks from the generic power domains core code and modifications of the ARM/shmobile platform to use that feature (Ulf Hansson). - Assorted minor fixes and cleanups of the generic power domains core code (Ulf Hansson, Geert Uytterhoeven). - Assorted minor fixes and cleanups of the device clocks control code in the PM core (Geert Uytterhoeven, Grygorii Strashko). - Consolidation of device power management Kconfig options by making CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter which is now redundant (Rafael J Wysocki and Kevin Hilman). That is the first batch of the changes needed for this purpose. - Core device runtime power management support code cleanup related to the execution of callbacks (Andrzej Hajda). - cpuidle ARM support improvements (Lorenzo Pieralisi). - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and Bartlomiej Zolnierkiewicz). - New cpufreq driver callback (->ready) to be executed when the cpufreq core is ready to use a given policy object and cpufreq-dt driver modification to use that callback for cooling device registration (Viresh Kumar). - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James Geboski, Tomeu Vizoso). - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate, cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao, Stefan Wahren, Petr Cvek). - OPP (Operating Performance Points) framework modification to allow OPPs to be removed too and update of a few cpufreq drivers (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added during initialization) on driver removal (Viresh Kumar). - Hibernation core fixes and cleanups (Tina Ruchandani and Markus Elfring). - PM Kconfig fix related to CPU power management (Pankaj Dubey). - cpupower tool fix (Prarit Bhargava)" * tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits) i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM tools: cpupower: fix return checks for sysfs_get_idlestate_count() drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM leds: leds-gpio: Fix multiple instances registration without 'label' property iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hwrandom / exynos / PM: Use CONFIG_PM in #ifdef block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM USB / PM: Drop CONFIG_PM_RUNTIME from the USB core PM: Merge the SET*_RUNTIME_PM_OPS() macros ...	2014-12-10 21:17:00 -08:00
Linus Torvalds	1dd7dcb6ea	There was a lot of clean ups and minor fixes. One of those clean ups was to the trace_seq code. It also removed the return values to the trace_seq_() functions and use trace_seq_has_overflowed() to see if the buffer filled up or not. This is similar to work being done to the seq_file code as well in another tree. Some of the other goodies include: o Added some "!" (NOT) logic to the tracing filter. o Fixed the frame pointer logic to the x86_64 mcount trampolines o Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems. That is, the ftrace trampoline can be dynamically allocated and be called directly by functions that only have a single hook to them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUhbLGAAoJEEjnJuOKh9ldRV4H/3NcLbgGB2iu96la1zdYE6pG Q7cDJMxXK80YIIL70h9G0IItcD4t62LMb72lfBnMGRj3msgFb3AgISW57EuI0Pxk xk24wuIPoTG2S7v9sc3SboNFwO8qbtIjxD2OBmqIUrGo2sZIiGjyj3gX7mCY3uzL WB2bUOSFz/22OgaANinR5EELHA3pZZCf54Vz1K9ndmtK0xp0j1a7xJShD6TrMdYv mZ3zH5ViIhW4A3mdcMceh6fy2JLQAiEKF0uPTvcMMz7NlVul0mxyL/+10P7AE/3R Ehw4fzmm4NDshPDtBOkKH0LsppgXzuItFuQUTpact3JlqTg++bV6onSsrkt1hlY= =Z7Cm -----END PGP SIGNATURE----- Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "There was a lot of clean ups and minor fixes. One of those clean ups was to the trace_seq code. It also removed the return values to the trace_seq_() functions and use trace_seq_has_overflowed() to see if the buffer filled up or not. This is similar to work being done to the seq_file code as well in another tree. Some of the other goodies include: - Added some "!" (NOT) logic to the tracing filter. - Fixed the frame pointer logic to the x86_64 mcount trampolines - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems. That is, the ftrace trampoline can be dynamically allocated and be called directly by functions that only have a single hook to them" * tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits) tracing: Truncated output is better than nothing tracing: Add additional marks to signal very large time deltas Documentation: describe trace_buf_size parameter more accurately tracing: Allow NOT to filter AND and OR clauses tracing: Add NOT to filtering logic ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter ftrace/x86: Get rid of ftrace_caller_setup ftrace/x86: Have save_mcount_regs macro also save stack frames if needed ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs ftrace/x86: Simplify save_mcount_regs on getting RIP ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file ftrace/x86: Have static tracing also use ftrace_caller_setup ftrace/x86: Have static function tracing always test for function graph kprobes: Add IPMODIFY flag to kprobe_ftrace_ops ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict kprobes/ftrace: Recover original IP if pre_handler doesn't change it tracing/trivial: Fix typos and make an int into a bool tracing: Deletion of an unnecessary check before iput() ...	2014-12-10 19:58:13 -08:00
Linus Torvalds	b6da0076ba	Merge branch 'akpm' (patchbomb from Andrew) Merge first patchbomb from Andrew Morton: - a few minor cifs fixes - dma-debug upadtes - ocfs2 - slab - about half of MM - procfs - kernel/exit.c - panic.c tweaks - printk upates - lib/ updates - checkpatch updates - fs/binfmt updates - the drivers/rtc tree - nilfs - kmod fixes - more kernel/exit.c - various other misc tweaks and fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) exit: pidns: fix/update the comments in zap_pid_ns_processes() exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting exit: exit_notify: re-use "dead" list to autoreap current exit: reparent: call forget_original_parent() under tasklist_lock exit: reparent: avoid find_new_reaper() if no children exit: reparent: introduce find_alive_thread() exit: reparent: introduce find_child_reaper() exit: reparent: document the ->has_child_subreaper checks exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting exit: proc: don't try to flush /proc/tgid/task/tgid exit: release_task: fix the comment about group leader accounting exit: wait: drop tasklist_lock before psig->c* accounting exit: wait: don't use zombie->real_parent exit: wait: cleanup the ptrace_reparented() checks usermodehelper: kill the kmod_thread_locker logic usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races ...	2014-12-10 18:34:42 -08:00
Kirill A. Shutemov	c164e038ee	mm: fix huge zero page accounting in smaps report As a small zero page, huge zero page should not be accounted in smaps report as normal page. For small pages we rely on vm_normal_page() to filter out zero page, but vm_normal_page() is not designed to handle pmds. We only get here due hackish cast pmd to pte in smaps_pte_range() -- pte and pmd format is not necessary compatible on each and every architecture. Let's add separate codepath to handle pmds. follow_trans_huge_pmd() will detect huge zero page for us. We would need pmd_dirty() helper to do this properly. The patch adds it to THP-enabled architectures which don't yet have one. [akpm@linux-foundation.org: use do_div to fix 32-bit build] Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengwei Yin <yfw.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:08 -08:00
Linus Torvalds	3a5dc1fafb	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Ingo Molnar: "The main changes in this cycle are: - Reload microcode when resuming and the case when only the early loader has been utilized. (Borislav Petkov) - Also, do not load the driver on paravirt guests. (Boris Ostrovsky)" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/intel: Fish out the stashed microcode for the BSP x86, microcode: Reload microcode on resume x86, microcode: Don't initialize microcode code on paravirt x86, microcode, intel: Drop unused parameter x86, microcode, AMD: Do not use smp_processor_id() in preemtible context	2014-12-10 15:01:43 -08:00
Linus Torvalds	3100e448e7	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "Various vDSO updates from Andy Lutomirski, mostly cleanups and reorganization to improve maintainability, but also some micro-optimizations and robustization changes" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64/vsyscall: Restore orig_ax after vsyscall seccomp x86_64: Add a comment explaining the TASK_SIZE_MAX guard page x86_64,vsyscall: Make vsyscall emulation configurable x86_64, vsyscall: Rewrite comment and clean up headers in vsyscall code x86_64, vsyscall: Turn vsyscalls all the way off when vsyscall==none x86,vdso: Use LSL unconditionally for vgetcpu x86: vdso: Fix build with older gcc x86_64/vdso: Clean up vgetcpu init and merge the vdso initcalls x86_64/vdso: Remove jiffies from the vvar page x86/vdso: Make the PER_CPU segment 32 bits x86/vdso: Make the PER_CPU segment start out accessed x86/vdso: Change the PER_CPU segment to use struct desc_struct x86_64/vdso: Move getcpu code from vsyscall_64.c to vdso/vma.c x86_64/vsyscall: Move all of the gate_area code to vsyscall_64.c	2014-12-10 14:24:20 -08:00
Linus Torvalds	c9f861c772	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "The biggest change in this cycle is better support for UCNA (UnCorrected No Action) events: "Handle all uncorrected error reports in the same way (soft offline the page). We used to only do that for SRAO (software recoverable action optional) machine checks, but it makes sense to also do it for UCNA (UnCorrected No Action) logs found by CMCI or polling." plus various x86 MCE handling updates and fixes" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Spell "panicked" correctly x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error x86, MCE, AMD: Assign interrupt handler only when bank supports it x86, MCE, AMD: Drop software-defined bank in error thresholding x86, MCE, AMD: Move invariant code out from loop body x86, MCE, AMD: Correct thresholding error logging x86, MCE, AMD: Use macros to compute bank MSRs RAS, HWPOISON: Fix wrong error recovery status GHES: Make ghes_estatus_caches static APEI, GHES: Cleanup unnecessary function for lockless list	2014-12-10 14:20:10 -08:00
Linus Torvalds	a023748d53	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm tree changes from Ingo Molnar: "The biggest change is full PAT support from Jürgen Gross: The x86 architecture offers via the PAT (Page Attribute Table) a way to specify different caching modes in page table entries. The PAT MSR contains 8 entries each specifying one of 6 possible cache modes. A pte references one of those entries via 3 bits: _PAGE_PAT, _PAGE_PWT and _PAGE_PCD. The Linux kernel currently supports only 4 different cache modes. The PAT MSR is set up in a way that the setting of _PAGE_PAT in a pte doesn't matter: the top 4 entries in the PAT MSR are the same as the 4 lower entries. This results in the kernel not supporting e.g. write-through mode. Especially this cache mode would speed up drivers of video cards which now have to use uncached accesses. OTOH some old processors (Pentium) don't support PAT correctly and the Xen hypervisor has been using a different PAT MSR configuration for some time now and can't change that as this setting is part of the ABI. This patch set abstracts the cache mode from the pte and introduces tables to translate between cache mode and pte bits (the default cache mode "write back" is hard-wired to PAT entry 0). The tables are statically initialized with values being compatible to old processors and current usage. As soon as the PAT MSR is changed (or - in case of Xen - is read at boot time) the tables are changed accordingly. Requests of mappings with special cache modes are always possible now, in case they are not supported there will be a fallback to a compatible but slower mode. Summing it up, this patch set adds the following features: - capability to support WT and WP cache modes on processors with full PAT support - processors with no or uncorrect PAT support are still working as today, even if WT or WP cache mode are selected by drivers for some pages - reduction of Xen special handling regarding cache mode Another change is a boot speedup on ridiculously large RAM systems, plus other smaller fixes" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86: mm: Move PAT only functions to mm/pat.c xen: Support Xen pv-domains using PAT x86: Enable PAT to use cache mode translation tables x86: Respect PAT bit when copying pte values between large and normal pages x86: Support PAT bit in pagetable dump for lower levels x86: Clean up pgtable_types.h x86: Use new cache mode type in memtype related functions x86: Use new cache mode type in mm/ioremap.c x86: Use new cache mode type in setting page attributes x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert() x86: Use new cache mode type in mm/iomap_32.c x86: Use new cache mode type in asm/pgtable.h x86: Use new cache mode type in arch/x86/mm/init_64.c x86: Use new cache mode type in arch/x86/pci x86: Use new cache mode type in drivers/video/fbdev/vermilion x86: Use new cache mode type in drivers/video/fbdev/gbefb.c x86: Use new cache mode type in include/asm/fb.h x86: Make page cache mode a real type x86: mm: Use 2GB memory block size on large-memory x86-64 systems ...	2014-12-10 13:59:34 -08:00
Linus Torvalds	773fed910d	Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: "A handful of numachip APIC driver updates/fixes, and two small SGI/UV fixes" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: numachip: APIC driver cleanups x86: numachip: Elide self-IPI ICR polling x86: numachip: Fix 16-bit APIC ID truncation * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: UV BAU: Increase maximum CPUs per socket/hub x86: UV BAU: Avoid NULL pointer reference in ptc_seq_show	2014-12-10 13:40:11 -08:00
Linus Torvalds	8139548136	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "Changes in this cycle are: - support module unload for efivarfs (Mathias Krause) - another attempt at moving x86 to libstub taking advantage of the __pure attribute (Ard Biesheuvel) - add EFI runtime services section to ptdump (Mathias Krause)" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, ptdump: Add section for EFI runtime services efi/x86: Move x86 back to libstub efivarfs: Allow unloading when build as module	2014-12-10 12:42:16 -08:00
Daniel Borkmann	0cb6c969ed	net, lib: kill arch_fast_hash library bits As there are now no remaining users of arch_fast_hash(), lets kill it entirely. This basically reverts commit `71ae8aac3e` ("lib: introduce arch optimized hash library") and follow-up work, that is f.e., commit `237217546d` ("lib: hash: follow-up fixups for arch hash"), commit `e3fec2f74f` ("lib: Add missing arch generic-y entries for asm-generic/hash.h") and last but not least commit `6a02652df5` ("perf tools: Fix include for non x86 architectures"). Cc: Francesco Fusco <fusco@ntop.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-10 15:17:46 -05:00
Linus Torvalds	b6444bd0a1	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot and percpu updates from Ingo Molnar: "This tree contains a bootable images documentation update plus three slightly misplaced x86/asm percpu changes/optimizations" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Use RIP-relative addressing for most per-CPU accesses x86-64: Handle PC-relative relocations on per-CPU data x86: Convert a few more per-CPU items to read-mostly ones x86, boot: Document intermediates more clearly	2014-12-10 12:10:24 -08:00
Linus Torvalds	9d0cf6f564	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "Misc changes: - context switch micro-optimization - debug printout micro-optimization - comment enhancements and typo fix" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Replace seq_printf() with seq_puts() x86/asm: Fix typo in arch/x86/kernel/asm_offset_64.c sched/x86: Add a comment clarifying LDT context switching sched/x86_64: Don't save flags on context switch	2014-12-10 12:09:26 -08:00
Linus Torvalds	3eb5b893eb	Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MPX support from Thomas Gleixner: "This enables support for x86 MPX. MPX is a new debug feature for bound checking in user space. It requires kernel support to handle the bound tables and decode the bound violating instruction in the trap handler" * 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: asm-generic: Remove asm-generic arch_bprm_mm_init() mm: Make arch_unmap()/bprm_mm_init() available to all architectures x86: Cleanly separate use of asm-generic/mm_hooks.h x86 mpx: Change return type of get_reg_offset() fs: Do not include mpx.h in exec.c x86, mpx: Add documentation on Intel MPX x86, mpx: Cleanup unused bound tables x86, mpx: On-demand kernel allocation of bounds tables x86, mpx: Decode MPX instruction to get bound violation information x86, mpx: Add MPX-specific mmap interface x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific x86, mpx: Add MPX to disabled features ia64: Sync struct siginfo with general version mips: Sync struct siginfo with general version mpx: Extend siginfo structure to include bound violation information x86, mpx: Rename cfg_reg_u and status_reg x86: mpx: Give bndX registers actual names x86: Remove arbitrary instruction size limit in instruction decoder	2014-12-10 09:34:43 -08:00
Linus Torvalds	9e66645d72	Merge branch 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq domain updates from Thomas Gleixner: "The real interesting irq updates: - Support for hierarchical irq domains: For complex interrupt routing scenarios where more than one interrupt related chip is involved we had no proper representation in the generic interrupt infrastructure so far. That made people implement rather ugly constructs in their nested irq chip implementations. The main offenders are x86 and arm/gic. To distangle that mess we have now hierarchical irqdomains which seperate the various interrupt chips and connect them via the hierarchical domains. That keeps the domain specific details internal to the particular hierarchy level and removes the criss/cross referencing of chip internals. The resulting hierarchy for a complex x86 system will look like this: vector mapped: 74 msi-0 mapped: 2 dmar-ir-1 mapped: 69 ioapic-1 mapped: 4 ioapic-0 mapped: 20 pci-msi-2 mapped: 45 dmar-ir-0 mapped: 3 ioapic-2 mapped: 1 pci-msi-1 mapped: 2 htirq mapped: 0 Neither ioapic nor pci-msi know about the dmar interrupt remapping between themself and the vector domain. If interrupt remapping is disabled ioapic and pci-msi become direct childs of the vector domain. In hindsight we should have done that years ago, but in hindsight we always know better :) - Support for generic MSI interrupt domain handling We have more and more non PCI related MSI interrupts, so providing a generic infrastructure for this is better than having all affected architectures implementing their own private hacks. - Support for PCI-MSI interrupt domain handling, based on the generic MSI support. This part carries the pci/msi branch from Bjorn Helgaas pci tree to avoid a massive conflict. The PCI/MSI parts are acked by Bjorn. I have two more branches on top of this. The full conversion of x86 to hierarchical domains and a partial conversion of arm/gic" * 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) genirq: Move irq_chip_write_msi_msg() helper to core PCI/MSI: Allow an msi_controller to be associated to an irq domain PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain PCI/MSI: Enhance core to support hierarchy irqdomain PCI/MSI: Move cached entry functions to irq core genirq: Provide default callbacks for msi_domain_ops genirq: Introduce msi_domain_alloc/free_irqs() asm-generic: Add msi.h genirq: Add generic msi irq domain support genirq: Introduce callback irq_chip.irq_write_msi_msg genirq: Work around __irq_set_handler vs stacked domains ordering issues irqdomain: Introduce helper function irq_domain_add_hierarchy() irqdomain: Implement a method to automatically call parent domains alloc/free genirq: Introduce helper irq_domain_set_info() to reduce duplicated code genirq: Split out flow handler typedefs into seperate header file genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip genirq: Add more helper functions to support stacked irq_chip genirq: Introduce helper functions to support stacked irq_chip irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF ...	2014-12-10 09:01:01 -08:00
Linus Torvalds	86c6a2fddf	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle are: - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y. This instruments might_sleep() checks to catch places that nest blocking primitives - such as mutex usage in a wait loop. Such bugs can result in hard to debug races/hangs. Another category of invalid nesting that this facility will detect is the calling of blocking functions from within schedule() -> sched_submit_work() -> blk_schedule_flush_plug(). There's some potential for false positives (if secondary blocking primitives themselves are not ready yet for this facility), but the kernel will warn once about such bugs per bootup, so the warning isn't much of a nuisance. This feature comes with a number of fixes, for problems uncovered with it, so no messages are expected normally. - Another round of sched/numa optimizations and refinements, for CONFIG_NUMA_BALANCING=y. - Another round of sched/dl fixes and refinements. Plus various smaller fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched: Add missing rcu protection to wake_up_all_idle_cpus sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK sched/numa: Init numa balancing fields of init_task sched/deadline: Remove unnecessary definitions in cpudeadline.h sched/cpupri: Remove unnecessary definitions in cpupri.h sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task() sched/fair: Fix stale overloaded status in the busiest group finding logic sched: Move p->nr_cpus_allowed check to select_task_rq() sched/completion: Document when to use wait_for_completion_io_() sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC sched/fair: Kill task_struct::numa_entry and numa_group::task_list sched: Refactor task_struct to use numa_faults instead of numa_ pointers sched/deadline: Don't check CONFIG_SMP in switched_from_dl() sched/deadline: Reschedule from switched_from_dl() after a successful pull sched/deadline: Push task away if the deadline is equal to curr during wakeup sched/deadline: Add deadline rq status print sched/deadline: Fix artificial overrun introduced by yield_task_dl() sched/rt: Clean up check_preempt_equal_prio() sched/core: Use dl_bw_of() under rcu_read_lock_sched() sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu ...	2014-12-09 21:21:34 -08:00
Linus Torvalds	5706ffd045	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events update from Ingo Molnar: "On the kernel side there's few changes, the one that stands out is PEBS machine state sampling support on x86, by Stephane Eranian. On the tooling side: User visible tooling changes: - Don't open the DWARF info multiple times, keeping instead a dwfl handle in struct dso, greatly speeding up 'perf report' on powerpc. (Sukadev Bhattiprolu) - Introduce PARSE_OPT_DISABLED option flag and use it to avoid showing undersired options in tools that provides frontends to 'perf record', like sched, kvm, etc (Namhyung Kim) - Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo Carvalho de Melo) - Fix annotation with kcore (Adrian Hunter) - Support source line numbers in annotate using a hotkey (Andi Kleen) - Callchain improvements including: * Enable printing the srcline in the history * Make get_srcline fall back to sym+offset (Andi Kleen) - TUI hist_entry browser fixes, including showing missing overhead value for first level callchain. Detected comparing the output of --stdio/--gui (that matched) with --tui, that had this problem. (Namhyung Kim) - Support handling complete branch stacks as histograms (Andi Kleen) Tooling infrastructure changes: - Prep work for supporting per-pkg and snapshot counters in 'perf stat' (Jiri Olsa) - 'perf stat' refactorings, moving stuff from it to evsel.c to use in per-pkg/snapshot format changes (Jiri Olsa) - Add per-pkg format file parsing (Matt Fleming) - Clean up libelf feature support code (Namhyung Kim) - Add gzip decompression support for kernel modules (Namhyung Kim) - More prep patches for Intel PT, including a a thread stack and more stuff made available via the database export mechanism (Adrian Hunter) - More Intel PT work, including a facility to export sample data (comms, threads, symbol names, etc) in a database friendly way, with an script to use this to create a postgresql database. (Adrian Hunter) - Make sure that thread->mg->machine points to the machine where the thread exists (it was being set only for the kmaps kernel modules case, do it as well for the mmaps) and use it to shorten function signatures (Arnaldo Carvalho de Melo) ... and lots of other fixes and smaller improvements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits) perf report: In branch stack mode use address history sorting perf report: Add --branch-history option perf callchain: Support handling complete branch stacks as histograms perf stat: Add support for snapshot counters perf stat: Add support for per-pkg counters perf tools: Remove perf_evsel__read interface perf stat: Use read_counter in read_counter_aggr perf stat: Make read_counter work over the thread dimension perf stat: Use perf_evsel__read_cb in read_counter perf tools: Add snapshot format file parsing perf tools: Add per-pkg format file parsing perf evsel: Introduce perf_evsel__read_cb function perf evsel: Introduce perf_counts_values__scale function perf evsel: Introduce perf_evsel__compute_deltas function perf tools: Allow to force redirect pr_debug to stderr. perf tools: Fix segfault due to invalid kernel dso access perf callchain: Make get_srcline fall back to sym+offset perf symbols: Move bfd_demangle stubbing to its only user perf callchain: Enable printing the srcline in the history perf tools: Collapse first level callchain entry if it has sibling ...	2014-12-09 20:55:37 -08:00
Linus Torvalds	9c37f95936	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking tree changes from Ingo Molnar: "Two changes: a documentation update and a ticket locks live lock fix" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ticketlock: Fix spin_unlock_wait() livelock locking/lglocks: Add documentation of current lglocks implementation	2014-12-09 19:59:22 -08:00
Linus Torvalds	a0e4467726	asm-generic: asm/io.h rewrite While there normally is no reason to have a pull request for asm-generic but have all changes get merged through whichever tree needs them, I do have a series for 3.19. There are two sets of patches that change significant portions of asm/io.h, and this branch contains both in order to resolve the conflicts: - Will Deacon has done a set of patches to ensure that all architectures define {read,write}{b,w,l,q}_relaxed() functions or get them by including asm-generic/io.h. These functions are commonly used on ARM specific drivers to avoid expensive L2 cache synchronization implied by the normal {read,write}{b,w,l,q}, but we need to define them on all architectures in order to share the drivers across architectures and to enable CONFIG_COMPILE_TEST configurations for them - Thierry Reding has done an unrelated set of patches that extends the asm-generic/io.h file to the degree necessary to make it useful on ARM64 and potentially other architectures. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAVIdwNmCrR//JCVInAQJWuw/9FHt2ThMnI1J1Jqy4CVwtyjWTSa6Y/uVj xSytS7AOvmU/nw1quSoba5mN9fcUQUtK9kqjqNcq71WsQcDE6BF9SFpi9cWtjWcI ZfWsC+5kqry/mbnuHefENipem9RqBrLbOBJ3LARf5M8rZJuTz1KbdZs9r9+1QsCX ou8jeqVvNKUn9J1WyekJBFSrPOtZ4bCUpeyh23JHRfPtJeAHNOuPuymj6WceAz98 uMV1icRaCBMySsf9HgsHRYW5HwuCm3MrrYj6ukyPpgxYz7FRq4hJLDs6GnlFtAGb 71g87NpFdB32qbW+y1ntfYaJyUryMHMVHBWcV5H9m0btdHTRHYZjoOGOPuyLHHO8 +l4/FaOQhnDL8cNDj0HKfhdlyaFylcWgs1wzj68nv31c1dGjcJcQiyCDwry9mJhr erh4EewcerUvWzbBMQ4JP1f8syKMsKwbo1bVU61a1RQJxEqVCzJMLweGSOFmqMX2 6E4ZJVWv81UFLoFTzYx+7+M45K4NWywKNQdzwKmqKHc4OQyvq4ALJI0A7SGFJdDR HJ7VqDiLaSdBitgJcJUxNzKcyXij6wE9jE1fBe3YDFE4LrnZXFVLN+MX6hs7AIFJ vJM1UpxRxQUMGIH2m7rbDNazOAsvQGxINOjNor23cNLuf6qLY1LrpHVPQDAfJVvA 6tROM77bwIQ= =xUv6 -----END PGP SIGNATURE----- Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic asm/io.h rewrite from Arnd Bergmann: "While there normally is no reason to have a pull request for asm-generic but have all changes get merged through whichever tree needs them, I do have a series for 3.19. There are two sets of patches that change significant portions of asm/io.h, and this branch contains both in order to resolve the conflicts: - Will Deacon has done a set of patches to ensure that all architectures define {read,write}{b,w,l,q}_relaxed() functions or get them by including asm-generic/io.h. These functions are commonly used on ARM specific drivers to avoid expensive L2 cache synchronization implied by the normal {read,write}{b,w,l,q}, but we need to define them on all architectures in order to share the drivers across architectures and to enable CONFIG_COMPILE_TEST configurations for them - Thierry Reding has done an unrelated set of patches that extends the asm-generic/io.h file to the degree necessary to make it useful on ARM64 and potentially other architectures" * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits) ARM64: use GENERIC_PCI_IOMAP sparc: io: remove duplicate relaxed accessors on sparc32 ARM: sa11x0: Use void __iomem * in MMIO accessors arm64: Use include/asm-generic/io.h ARM: Use include/asm-generic/io.h asm-generic/io.h: Implement generic {read,write}s*() asm-generic/io.h: Reconcile I/O accessor overrides /dev/mem: Use more consistent data types Change xlate_dev_{kmem,mem}_ptr() prototypes ARM: ixp4xx: Properly override I/O accessors ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI ARM: ebsa110: Properly override I/O accessors ARC: Remove redundant PCI_IOBASE declaration documentation: memory-barriers: clarify relaxed io accessor semantics x86: io: implement dummy relaxed accessor macros for writes tile: io: implement dummy relaxed accessor macros for writes sparc: io: implement dummy relaxed accessor macros for writes powerpc: io: implement dummy relaxed accessor macros for writes parisc: io: implement dummy relaxed accessor macros for writes mn10300: io: implement dummy relaxed accessor macros for writes ...	2014-12-09 17:25:00 -08:00
Mark Brown	addaeea9ee	Merge remote-tracking branches 'asoc/topic/hdmi', 'asoc/topic/intel', 'asoc/topic/jack', 'asoc/topic/jz4740' and 'asoc/topic/lm49453' into asoc-next	2014-12-08 13:12:00 +00:00
Juergen Gross	90fff3ea15	xen: introduce helper functions to do safe read and write accesses Introduce two helper functions to safely read and write unsigned long values from or to memory when the access may fault because the mapping is non-present or read-only. These helpers can be used instead of open coded uses of __get_user() and __put_user() avoiding the need to do casts to fix sparse warnings. Use the helpers in page.h and p2m.c. This will fix the sparse warnings when doing "make C=1". Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-08 10:53:59 +00:00
Ingo Molnar	2a2662bf88	Merge branch 'perf/core-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into perf/hw_breakpoints Pull AMD range breakpoints support from Frederic Weisbecker: " - Extend breakpoint tools and core to support address range through perf event with initial backend support for AMD extended breakpoints. Syntax is: perf record -e mem:addr/len:type For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512) perf record -e mem:0x1000/512:w - Clean up a bit breakpoint code validation It has been acked by Jiri and Oleg. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-08 11:50:24 +01:00
Oleg Nesterov	78bff1c868	x86/ticketlock: Fix spin_unlock_wait() livelock arch_spin_unlock_wait() looks very suboptimal, to the point I think this is just wrong and can lead to livelock: if the lock is heavily contended we can never see head == tail. But we do not need to wait for arch_spin_is_locked() == F. If it is locked we only need to wait until the current owner drops this lock. So we could simply spin until old_head != lock->tickets.head in this case, but .head can overflow and thus we can't check "unlocked" only once before the main loop. Also, the "unlocked" check can ignore TICKET_SLOWPATH_FLAG bit. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Paul E.McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/20141201213417.GA5842@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-08 11:36:44 +01:00
Borislav Petkov	fbae4ba8c4	x86, microcode: Reload microcode on resume Normally, we do reapply microcode on resume. However, in the cases where that microcode comes from the early loader and the late loader hasn't been utilized yet, there's no easy way for us to go and apply the patch applied during boot by the early loader. Thus, reuse the patch stashed by the early loader for the BSP. Signed-off-by: Borislav Petkov <bp@suse.de>	2014-12-06 13:03:03 +01:00
Wanpeng Li	203000993d	kvm: vmx: add MSR logic for XSAVES Add logic to get/set the XSS model-specific register. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:39 +01:00
Wanpeng Li	f53cd63c2d	kvm: x86: handle XSAVES vmcs and vmexit Initialize the XSS exit bitmap. It is zero so there should be no XSAVES or XRSTORS exits. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:33 +01:00
Wanpeng Li	55412b2eda	kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest Expose the XSAVES feature to the guest if the kvm_x86_ops say it is available. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:16 +01:00
Juergen Gross	054954eb05	xen: switch to linear virtual mapped sparse p2m list At start of the day the Xen hypervisor presents a contiguous mfn list to a pv-domain. In order to support sparse memory this mfn list is accessed via a three level p2m tree built early in the boot process. Whenever the system needs the mfn associated with a pfn this tree is used to find the mfn. Instead of using a software walked tree for accessing a specific mfn list entry this patch is creating a virtual address area for the entire possible mfn list including memory holes. The holes are covered by mapping a pre-defined page consisting only of "invalid mfn" entries. Access to a mfn entry is possible by just using the virtual base address of the mfn list and the pfn as index into that list. This speeds up the (hot) path of determining the mfn of a pfn. Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0 showed following improvements: Elapsed time: 32:50 -> 32:35 System: 18:07 -> 17:47 User: 104:00 -> 103:30 Tested with following configurations: - 64 bit dom0, 8GB RAM - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without the patch) - 32 bit domU, ballooning up and down - 32 bit domU, save and restore - 32 bit domU with PCI passthrough - 64 bit domU, 8 GB, 2049 MB, 5000 MB - 64 bit domU, ballooning up and down - 64 bit domU, save and restore - 64 bit domU with PCI passthrough Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:09:15 +00:00
Juergen Gross	0aad568983	xen: Hide get_phys_to_machine() to be able to tune common path Today get_phys_to_machine() is always called when the mfn for a pfn is to be obtained. Add a wrapper __pfn_to_mfn() as inline function to be able to avoid calling get_phys_to_machine() when possible as soon as the switch to a linear mapped p2m list has been done. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:09:09 +00:00
Juergen Gross	792230c3a6	x86: Introduce function to get pmd entry pointer Introduces lookup_pmd_address() to get the address of the pmd entry related to a virtual address in the current address space. This function is needed for support of a virtual mapped sparse p2m list in xen pv domains, as we need the address of the pmd entry, not the one of the pte in that case. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:09:04 +00:00
Juergen Gross	5b8e7d8054	xen: Delay invalidating extra memory When the physical memory configuration is initialized the p2m entries for not pouplated memory pages are set to "invalid". As those pages are beyond the hypervisor built p2m list the p2m tree has to be extended. This patch delays processing the extra memory related p2m entries during the boot process until some more basic memory management functions are callable. This removes the need to create new p2m entries until virtual memory management is available. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:08:59 +00:00
Juergen Gross	1f3ac86b4c	xen: Delay remapping memory of pv-domain Early in the boot process the memory layout of a pv-domain is changed to match the E820 map (either the host one for Dom0 or the Xen one) regarding placement of RAM and PCI holes. This requires removing memory pages initially located at positions not suitable for RAM and adding them later at higher addresses where no restrictions apply. To be able to operate on the hypervisor supported p2m list until a virtual mapped linear p2m list can be constructed, remapping must be delayed until virtual memory management is initialized, as the initial p2m list can't be extended unlimited at physical memory initialization time due to it's fixed structure. A further advantage is the reduction in complexity and code volume as we don't have to be careful regarding memory restrictions during p2m updates. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:08:48 +00:00
Juergen Gross	820c4db2be	xen: Make functions static Some functions in arch/x86/xen/p2m.c are used locally only. Make them static. Rearrange the functions in p2m.c to avoid forward declarations. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:08:37 +00:00
Boris Ostrovsky	14520c92cb	xen/pci: Use APIC directly when APIC virtualization hardware is available When hardware supports APIC/x2APIC virtualization we don't need to use pirqs for MSI handling and instead use APIC since most APIC accesses (MMIO or MSR) will now be processed without VMEXITs. As an example, netperf on the original code produces this profile (collected wih 'xentrace -e 0x0008ffff -T 5'): 342 cpu_change 260 CPUID 34638 HLT 64067 INJ_VIRQ 28374 INTR 82733 INTR_WINDOW 10 NPF 24337 TRAP 370610 vlapic_accept_pic_intr 307528 VMENTRY 307527 VMEXIT 140998 VMMCALL 127 wrap_buffer After applying this patch the same test shows 230 cpu_change 260 CPUID 36542 HLT 174 INJ_VIRQ 27250 INTR 222 INTR_WINDOW 20 NPF 24999 TRAP 381812 vlapic_accept_pic_intr 166480 VMENTRY 166479 VMEXIT 77208 VMMCALL 81 wrap_buffer ApacheBench results (ab -n 10000 -c 200) improve by about 10% Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 13:02:43 +00:00
Stefano Stabellini	a4dba13089	xen/arm/arm64: introduce xen_arch_need_swiotlb Introduce an arch specific function to find out whether a particular dma mapping operation needs to bounce on the swiotlb buffer. On ARM and ARM64, if the page involved is a foreign page and the device is not coherent, we need to bounce because at unmap time we cannot execute any required cache maintenance operations (we don't know how to find the pfn from the mfn). No change of behaviour for x86. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-12-04 12:41:54 +00:00
Stefano Stabellini	a0f2dee0cd	xen: add a dma_addr_t dev_addr argument to xen_dma_map_page dev_addr is the machine address of the page. The new parameter can be used by the ARM and ARM64 implementations of xen_dma_map_page to find out if the page is a local page (pfn == mfn) or a foreign page (pfn != mfn). dev_addr could be retrieved again from the physical address, using pfn_to_mfn, but it requires accessing an rbtree. Since we already have the dev_addr in our hands at the call site there is no need to get the mfn twice. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-12-04 12:41:51 +00:00
Jacob Shin	d6d55f0b9d	perf/x86/amd: AMD support for bp_len > HW_BREAKPOINT_LEN_8 Implement hardware breakpoint address mask for AMD Family 16h and above processors. CPUID feature bit indicates hardware support for DRn_ADDR_MASK MSRs. These masks further qualify DRn/DR7 hardware breakpoint addresses to allow matching of larger addresses ranges. Valuable advice and pseudo code from Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jacob Shin <jacob.w.shin@gmail.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xiakaixu <xiakaixu@huawei.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-12-03 15:14:26 +01:00
Steven Rostedt (Red Hat)	4bcdf1522f	ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file Linus pointed out that MCOUNT_SAVE_FRAME is used in only a single file and that there's no reason that it should be in a header file. Move the macro to the code that uses it. Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:07:16 -05:00
Borislav Petkov	2ef84b3bb9	x86, microcode, AMD: Do not use smp_processor_id() in preemtible context Hand down the cpu number instead, otherwise lockdep screams when doing echo 1 > /sys/devices/system/cpu/microcode/reload. BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470 caller is debug_smp_processor_id+0x12/0x20 CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26 ... Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-01 11:51:05 +01:00
Rafael J. Wysocki	8a497cfdc0	Merge back earlier cpufreq material for 3.19-rc1.	2014-12-01 02:46:24 +01:00
Paolo Bonzini	2b4a273b42	kvm: x86: avoid warning about potential shift wrapping bug cs.base is declared as a __u64 variable and vector is a u32 so this causes a static checker warning. The user indeed can set "sipi_vector" to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the value should really have 8-bit precision only. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-24 16:53:50 +01:00
Paolo Bonzini	c9eab58f64	KVM: x86: move device assignment out of kvm_host.h Create a new header, and hide the device assignment functions there. Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying arch/x86/kvm/iommu.c to take a PCI device struct. Based on a patch by Radim Krcmar <rkrcmark@redhat.com>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-24 16:53:50 +01:00
Andy Lutomirski	82975bc6a6	uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but not on non-paranoid returns. I suspect that this is a mistake and that the code only works because int3 is paranoid. Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME from the uprobes code. Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-23 14:25:28 -08:00
Andy Lutomirski	6f442be2fb	x86_64, traps: Stop using IST for #SS On a 32-bit kernel, this has no effect, since there are no IST stacks. On a 64-bit kernel, #SS can only happen in user code, on a failed iret to user space, a canonical violation on access via RSP or RBP, or a genuine stack segment violation in 32-bit kernel code. The first two cases don't need IST, and the latter two cases are unlikely fatal bugs, and promoting them to double faults would be fine. This fixes a bug in which the espfix64 code mishandles a stack segment violation. This saves 4k of memory per CPU and a tiny bit of code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-23 13:56:19 -08:00
Radim Krčmář	c274e03af7	kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ Now that ia64 is gone, we can hide deprecated device assignment in x86. Notable changes: - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl() The easy parts were removed from generic kvm code, remaining - kvm_iommu_(un)map_pages() would require new code to be moved - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-23 18:33:36 +01:00
Paolo Bonzini	6ef768fac9	kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/ ia64 does not need them anymore. Ack notifiers become x86-specific too. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-21 18:02:37 +01:00
Chen Yucong	e3480271f5	x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error Until now, the mce_severity mechanism can only identify the severity of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter out DEFERRED error for AMD platform. This patch extends the mce_severity mechanism for handling UCNA/DEFERRED error. In order to do this, the patch introduces a new severity level - MCE_UCNA/DEFERRED_SEVERITY. In addition, mce_severity is specific to machine check exception, and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity mechanism in non-exception context, the patch also introduces a new argument (is_excp) for mce_severity. `is_excp' is used to explicitly specify the calling context of mce_severity. Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-11-19 10:55:43 -08:00
Dave Hansen	a1ea1c032b	x86: Cleanly separate use of asm-generic/mm_hooks.h asm-generic/mm_hooks.h provides some generic fillers for the 90% of architectures that do not need to hook some mmap-manipulation functions. A comment inside says: > Define generic no-op hooks for arch_dup_mmap and > arch_exit_mmap, to be included in asm-FOO/mmu_context.h > for any arch FOO which doesn't need to hook these. So, does x86 need to hook these? It depends on CONFIG_PARAVIRT. We conditionally include this generic header if we have CONFIG_PARAVIRT=n. That's madness. With this patch, x86 stops using asm-generic/mmu_hooks.h entirely. We use our own copies of the functions. The paravirt code provides some stubs if it is disabled, and we always call those stubs in our x86-private versions of arch_exit_mmap() and arch_dup_mmap(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141118182349.14567FA5@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-19 11:54:13 +01:00
Rafael J. Wysocki	bd2a0f6754	Merge back cpufreq material for 3.19-rc1.	2014-11-18 01:22:29 +01:00
Dave Hansen	1de4fa14ee	x86, mpx: Cleanup unused bound tables The previous patch allocates bounds tables on-demand. As noted in an earlier description, these can add up to HUGE amounts of memory. This has caused OOMs in practice when running tests. This patch adds support for freeing bounds tables when they are no longer in use. There are two types of mappings in play when unmapping tables: 1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc... 2. The mapping for the bounds table backing the data (is tagged with VM_MPX, see the patch "add MPX specific mmap interface"). If userspace use the prctl() indroduced earlier in this patchset to enable the management of bounds tables in kernel, when it unmaps the first type of mapping with the actual data, the kernel needs to free the mapping for the bounds table backing the data. This patch hooks in at the very end of do_unmap() to do so. We look at the addresses being unmapped and find the bounds directory entries and tables which cover those addresses. If an entire table is unused, we clear associated directory entry and free the table. Once we unmap the bounds table, we would have a bounds directory entry pointing at empty address space. That address space might now be allocated for some other (random) use, and the MPX hardware might now try to walk it as if it were a bounds table. That would be bad. So any unmapping of an enture bounds table has to be accompanied by a corresponding write to the bounds directory entry to invalidate it. That write to the bounds directory can fault, which causes the following problem: Since we are doing the freeing from munmap() (and other paths like it), we hold mmap_sem for write. If we fault, the page fault handler will attempt to acquire mmap_sem for read and we will deadlock. To avoid the deadlock, we pagefault_disable() when touching the bounds directory entry and use a get_user_pages() to resolve the fault. The unmapping of bounds tables happends under vm_munmap(). We also (indirectly) call vm_munmap() to _do_ the unmapping of the bounds tables. We avoid unbounded recursion by disallowing freeing of bounds tables for bounds tables. This would not occur normally, so should not have any practical impact. Being strict about it here helps ensure that we do not have an exploitable stack overflow. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:54 +01:00
Dave Hansen	fe3d197f84	x86, mpx: On-demand kernel allocation of bounds tables This is really the meat of the MPX patch set. If there is one patch to review in the entire series, this is the one. There is a new ABI here and this kernel code also interacts with userspace memory in a relatively unusual manner. (small FAQ below). Long Description: This patch adds two prctl() commands to provide enable or disable the management of bounds tables in kernel, including on-demand kernel allocation (See the patch "on-demand kernel allocation of bounds tables") and cleanup (See the patch "cleanup unused bound tables"). Applications do not strictly need the kernel to manage bounds tables and we expect some applications to use MPX without taking advantage of this kernel support. This means the kernel can not simply infer whether an application needs bounds table management from the MPX registers. The prctl() is an explicit signal from userspace. PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to require kernel's help in managing bounds tables. PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel won't allocate and free bounds tables even if the CPU supports MPX. PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds directory out of a userspace register (bndcfgu) and then cache it into a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT will set "bd_addr" to an invalid address. Using this scheme, we can use "bd_addr" to determine whether the management of bounds tables in kernel is enabled. Also, the only way to access that bndcfgu register is via an xsaves, which can be expensive. Caching "bd_addr" like this also helps reduce the cost of those xsaves when doing table cleanup at munmap() time. Unfortunately, we can not apply this optimization to #BR fault time because we need an xsave to get the value of BNDSTATUS. ==== Why does the hardware even have these Bounds Tables? ==== MPX only has 4 hardware registers for storing bounds information. If MPX-enabled code needs more than these 4 registers, it needs to spill them somewhere. It has two special instructions for this which allow the bounds to be moved between the bounds registers and some new "bounds tables". They are similar conceptually to a page fault and will be raised by the MPX hardware during both bounds violations or when the tables are not present. This patch handles those #BR exceptions for not-present tables by carving the space out of the normal processes address space (essentially calling the new mmap() interface indroduced earlier in this patch set.) and then pointing the bounds-directory over to it. The tables need to be accessed and controlled by userspace because the instructions for moving bounds in and out of them are extremely frequent. They potentially happen every time a register pointing to memory is dereferenced. Any direct kernel involvement (like a syscall) to access the tables would obviously destroy performance. ==== Why not do this in userspace? ==== This patch is obviously doing this allocation in the kernel. However, MPX does not strictly require anything in the kernel. It can theoretically be done completely from userspace. Here are a few ways this could be done. I don't think any of them are practical in the real-world, but here they are. Q: Can virtual space simply be reserved for the bounds tables so that we never have to allocate them? A: As noted earlier, these tables are HUGE. An X-GB virtual area needs 4X GB of virtual space, plus 2GB for the bounds directory. If we were to preallocate them for the 128TB of user virtual address space, we would need to reserve 512TB+2GB, which is larger than the entire virtual address space today. This means they can not be reserved ahead of time. Also, a single process's pre-popualated bounds directory consumes 2GB of virtual AND* physical memory. IOW, it's completely infeasible to prepopulate bounds directories. Q: Can we preallocate bounds table space at the same time memory is allocated which might contain pointers that might eventually need bounds tables? A: This would work if we could hook the site of each and every memory allocation syscall. This can be done for small, constrained applications. But, it isn't practical at a larger scale since a given app has no way of controlling how all the parts of the app might allocate memory (think libraries). The kernel is really the only place to intercept these calls. Q: Could a bounds fault be handed to userspace and the tables allocated there in a signal handler instead of in the kernel? A: (thanks to tglx) mmap() is not on the list of safe async handler functions and even if mmap() would work it still requires locking or nasty tricks to keep track of the allocation state there. Having ruled out all of the userspace-only approaches for managing bounds tables that we could think of, we create them on demand in the kernel. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Dave Hansen	fcc7ffd679	x86, mpx: Decode MPX instruction to get bound violation information This patch sets bound violation fields of siginfo struct in #BR exception handler by decoding the user instruction and constructing the faulting pointer. We have to be very careful when decoding these instructions. They are completely controlled by userspace and may be changed at any time up to and including the point where we try to copy them in to the kernel. They may or may not be MPX instructions and could be completely invalid for all we know. Note: This code is based on Qiaowei Ren's specialized MPX decoder, but uses the generic decoder whenever possible. It was tested for robustness by generating a completely random data stream and trying to decode that stream. I also unmapped random pages inside the stream to test the "partial instruction" short read code. We kzalloc() the siginfo instead of stack allocating it because we need to memset() it anyway, and doing this makes it much more clear when it got initialized by the MPX instruction decoder. Changes from the old decoder: * Use the generic decoder instead of custom functions. Saved ~70 lines of code overall. * Remove insn->addr_bytes code (never used??) * Make sure never to possibly overflow the regoff[] array, plus check the register range correctly in 32 and 64-bit modes. * Allow get_reg() to return an error and have mpx_get_addr_ref() handle when it sees errors. * Only call insn_get_() near where we actually use the values instead if trying to call them all at once. Handle short reads from copy_from_user() and check the actual number of read bytes against what we expect from insn_get_length(). If a read stops in the middle of an instruction, we error out. * Actually check the opcodes intead of ignoring them. * Dynamically kzalloc() siginfo_t so we don't leak any stack data. * Detect and handle decoder failures instead of ignoring them. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151828.5BDD0915@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Qiaowei Ren	57319d80e1	x86, mpx: Add MPX-specific mmap interface We have chosen to perform the allocation of bounds tables in kernel (See the patch "on-demand kernel allocation of bounds tables") and to mark these VMAs with VM_MPX. However, there is currently no suitable interface to actually do this. Existing interfaces, like do_mmap_pgoff(), have no way to set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem long enough to let a caller do it. This patch wraps mmap_region() and hold mmap_sem long enough to make the modifications to the VMA which we need. Also note the 32/64-bit #ifdef in the header. We actually need to do this at runtime eventually. But, for now, we don't support running 32-bit binaries on 64-bit kernels. Support for this will come in later patches. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151827.CE440F67@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Dave Hansen	95290cf13e	x86, mpx: Add MPX to disabled features This allows us to use cpu_feature_enabled(X86_FEATURE_MPX) as both a runtime and compile-time check. When CONFIG_X86_INTEL_MPX is disabled, cpu_feature_enabled(X86_FEATURE_MPX) will evaluate at compile-time to 0. If CONFIG_X86_INTEL_MPX=y, then the cpuid flag will be checked at runtime. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151823.B358EAD2@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Dave Hansen	62e7759b1b	x86, mpx: Rename cfg_reg_u and status_reg According to Intel SDM extension, MPX configuration and status registers should be BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and status_reg to bndcfgu and bndstatus. [ tglx: Renamed 'struct bndscr_struct' to 'struct bndscr' ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/20141114151817.031762AC@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Dave Hansen	c04e051ccc	x86: mpx: Give bndX registers actual names Consider the bndX MPX registers. There 4 registers each containing a 64-bit lower and a 64-bit upper bound. That's 864 bits and we declare it thusly: struct bndregs_struct { u64 bndregs[8]; } Let's say you want to read the upper bound from the MPX register bnd2 out of the xsave buf. You do: bndregno = 2; upper_bound = xsave_buf->bndregs.bndregs[2bndregno+1]; That kinda sucks. Every time you access it, you need to know: 1. Each bndX register is two entries wide in "bndregs" 2. The lower comes first followed by upper. We do the +1 to get upper vs. lower. This replaces the old definition. You can now access them indexed by the register number directly, and with a meaningful name for the lower and upper bound: bndregno = 2; xsave_buf->bndreg[bndregno].upper_bound; It's now VERY clear that there are 4 registers. The programmer now doesn't have to care what order the lower and upper bounds are in, and it's harder to get it wrong. [ tglx: Changed ub/lb to upper_bound/lower_bound and renamed struct bndreg_struct to struct bndreg ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: "Yu, Fenghua" <fenghua.yu@intel.com> Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141031215820.5EA5E0EC@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:52 +01:00
Dave Hansen	6ba48ff46f	x86: Remove arbitrary instruction size limit in instruction decoder The current x86 instruction decoder steps along through the instruction stream but always ensures that it never steps farther than the largest possible instruction size (MAX_INSN_SIZE). The MPX code is now going to be doing some decoding of userspace instructions. We copy those from userspace in to the kernel and they're obviously completely untrusted coming from userspace. In addition to the constraint that instructions can only be so long, we also have to be aware of how long the buffer is that came in from userspace. This _looks_ to be similar to what the perf and kprobes is doing, but it's unclear to me whether they are affected. The whole reason we need this is that it is perfectly valid to be executing an instruction within MAX_INSN_SIZE bytes of an unreadable page. We should be able to gracefully handle short reads in those cases. This adds support to the decoder to record how long the buffer being decoded is and to refuse to "validate" the instruction if we would have gone over the end of the buffer to decode it. The kprobes code probably needs to be looked at here a bit more carefully. This patch still respects the MAX_INSN_SIZE limit there but the kprobes code does look like it might be able to be a bit more strict than it currently is. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: x86@kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:52 +01:00
Thomas Gleixner	0dbcae8847	x86: mm: Move PAT only functions to mm/pat.c Commit `e00c8cc93c` "x86: Use new cache mode type in memtype related functions" broke the ARCH=um build. arch/x86/include/asm/cacheflush.h:67:36: error: return type is an incomplete type static inline enum page_cache_mode get_page_memtype(struct page *pg) The reason is simple. get_page_memtype() and set_page_memtype() require enum page_cache_mode now, which is defined in asm/pgtable_types.h. UM does not include that file for obvious reasons. The simple solution is to move that functions to arch/x86/mm/pat.c where the only callsites of this are located. They should have been there in the first place. Fixes: `e00c8cc93c` "x86: Use new cache mode type in memtype related functions" Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Weinberger <richard@nod.at>	2014-11-16 18:59:19 +01:00
Ingo Molnar	f108c898dd	Merge branch 'perf/urgent' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 11:31:17 +01:00
Juergen Gross	bd809af16e	x86: Enable PAT to use cache mode translation tables Update the translation tables from cache mode to pgprot values according to the PAT settings. This enables changing the cache attributes of a PAT index in just one place without having to change at the users side. With this change it is possible to use the same kernel with different PAT configurations, e.g. supporting Xen. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-18-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-16 11:04:26 +01:00
Juergen Gross	87ad0b713b	x86: Clean up pgtable_types.h Remove no longer used defines from pgtable_types.h as they are not used any longer. Switch __PAGE_KERNEL_NOCACHE to use cache mode type instead of pte bits. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-15-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-16 11:04:26 +01:00
Juergen Gross	e00c8cc93c	x86: Use new cache mode type in memtype related functions Instead of directly using the cache mode bits in the pte switch to using the cache mode type. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-14-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-16 11:04:26 +01:00
Juergen Gross	b14097bd91	x86: Use new cache mode type in mm/ioremap.c Instead of directly using the cache mode bits in the pte switch to using the cache mode type. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-13-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-16 11:04:26 +01:00
Juergen Gross	49a3b3cbdf	x86: Use new cache mode type in mm/iomap_32.c Instead of directly using the cache mode bits in the pte switch to using the cache mode type. This requires to change io_reserve_memtype() as well. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-9-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-16 11:04:25 +01:00
Juergen Gross	d85f33342a	x86: Use new cache mode type in asm/pgtable.h Instead of directly using the cache mode bits in the pte switch to using the cache mode type. This requires changing some callers of is_new_memtype_allowed() to be changed as well. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-8-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-16 11:04:25 +01:00
Juergen Gross	c27ce0af89	x86: Use new cache mode type in include/asm/fb.h Instead of directly using cache mode bits in the pte switch to usage of the new cache mode type. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-3-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-16 11:04:24 +01:00
Juergen Gross	281d4078be	x86: Make page cache mode a real type At the moment there are a lot of places that handle setting or getting the page cache mode by treating the pgprot bits equal to the cache mode. This is only true because there are a lot of assumptions about the setup of the PAT MSR. Otherwise the cache type needs to get translated into pgprot bits and vice versa. This patch tries to prepare for that by introducing a separate type for the cache mode and adding functions to translate between those and pgprot values. To avoid too much performance penalty the translation between cache mode and pgprot values is done via tables which contain the relevant information. Write-back cache mode is hard-wired to be 0, all other modes are configurable via those tables. For large pages there are translation functions as the PAT bit is located at different positions in the ptes of 4k and large pages. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-2-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-16 11:04:24 +01:00
Ingo Molnar	595247f61f	* Support module unload for efivarfs - Mathias Krause * Another attempt at moving x86 to libstub taking advantage of the __pure attribute - Ard Biesheuvel * Add EFI runtime services section to ptdump - Mathias Krause -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUZnQGAAoJEC84WcCNIz1VfyUP/1MCVt4vepl7+0JzdUP/eVPs CwPM6gBOvgx1PviWrtvSU8UyjtYDqZx7jnCvyvbmlgixAqqIoFm80x5sd9DfyJBj vSrmavXaJgQomJN3N+fvaIpGJXp8NQmeNT87++UMb6VE5nYvx7suDcwfTqOaxcYt yDwKatTXTvQxDLlGgtymp2UhgVKBICs9WVo8weevB5LPmpt4TFCi1GDSimJkfg+0 JTvkKF+QxPmVqgwY7bgdFFcfhsYCux5VgtbD4DJKS3LgfLJLAMKPOt83DbeOSmIa 8zqtlF3eMwHgecKrMLSfIZH3XIl2gsIdPdvT6iBQkwwGZuSzG93JgwJ90HYiKoDm yNlffnhmdgI2RXO97UZJpzqGor+eNc0auuS4485PcE8NtZ1tbo20A/OCpfIzK8j8 Vk7sfZxaaHKF5PUtRe6vo3myRlUCofMIuSEWSF8d709R6AEEuia6RZ3Y45EJPROn fKOiLsf7Og1Mk43Iy2lb7kFT766OsUnZZHU/xiIZj/v94HPWFWoFPtxPC0IURvPx 24oiJxCnyXWGtoyn+SSprl+NAPuPsxVFYriTwaq2RBuoY0NAdy7NIXKe2HTp/WI0 oSTtYkCRcwiHv0aSrg+yQHmwH7y7m39S3yIS4t5LXenn2G4ObUUjhcAQdF2Ft0Pr MT/l+stTt390cpfpcQbE =he2O -----END PGP SIGNATURE----- Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi Pull EFI updates for v3.19 from Matt Fleming: - Support module unload for efivarfs - Mathias Krause - Another attempt at moving x86 to libstub taking advantage of the __pure attribute - Ard Biesheuvel - Add EFI runtime services section to ptdump - Mathias Krause Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:48:53 +01:00
Igor Mammedov	1d4e7e3c0b	kvm: x86: increase user memory slots to 509 With the 3 private slots, this gives us 512 slots total. Motivation for this is in addition to assigned devices support more memory hotplug slots, where 1 slot is used by a hotplugged memory stick. It will allow to support upto 256 hotplug memory slots and leave 253 slots for assigned devices and other devices that use them. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-14 10:02:40 +01:00
Aravind Gopalakrishnan	904cb3677f	perf/x86/amd/ibs: Update IBS MSRs and feature definitions New Fam15h models carry extra feature bits and extend the MSR register space for IBS ops. Adding them here. While at it, add functionality to read IbsBrTarget and OpData4 depending on their availability if user wants a PERF_SAMPLE_RAW. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Len Brown <len.brown@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <paulus@samba.org> Cc: <acme@kernel.org> Link: http://lkml.kernel.org/r/1415651066-13523-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-12 15:12:32 +01:00
Dirk Brandewie	2f86dc4cdd	intel_pstate: Add support for HWP Add support of Hardware Managed Performance States (HWP) described in Volume 3 section 14.4 of the SDM. With HWP enbaled intel_pstate will no longer be responsible for selecting P states for the processor. intel_pstate will continue to register to the cpufreq core as the scaling driver for CPUs implementing HWP. In HWP mode intel_pstate provides three functions reporting frequency to the cpufreq core, support for the set_policy() interface from the core and maintaining the intel_pstate sysfs interface in /sys/devices/system/cpu/intel_pstate. User preferences expressed via the set_policy() interface or the sysfs interface are forwared to the CPU via the HWP MSR interface. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-11-12 00:04:38 +01:00
Dirk Brandewie	7787388772	x86: Add support for Intel HWP feature detection. Add support of Hardware Managed Performance States (HWP) described in Volume 3 section 14.4 of the SDM. One bit CPUID.06H:EAX[bit 7] expresses the presence of the HWP feature on the processor. The remaining bits CPUID.06H:EAX[bit 8-11] denote the presense of various HWP features. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-11-12 00:04:37 +01:00
Mathias Krause	8266e31ed0	x86, ptdump: Add section for EFI runtime services In commit `3891a04aaf` ("x86-64, espfix: Don't leak bits 31:16 of %esp returning..") the "ESPFix Area" was added to the page table dump special sections. That area, though, has a limited amount of entries printed. The EFI runtime services are, unfortunately, located in-between the espfix area and the high kernel memory mapping. Due to the enforced limitation for the espfix area, the EFI mappings won't be printed in the page table dump. To make the ESP runtime service mappings visible again, provide them a dedicated entry. Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-11-11 22:28:57 +00:00
Ard Biesheuvel	243b6754cd	efi/x86: Move x86 back to libstub This reverts commit `84be880560`, which itself reverted my original attempt to move x86 from #include'ing .c files from across the tree to using the EFI stub built as a static library. The issue that affected the original approach was that splitting the implementation into several .o files resulted in the variable 'efi_early' becoming a global with external linkage, which under -fPIC implies that references to it must go through the GOT. However, dealing with this additional GOT entry turned out to be troublesome on some EFI implementations. (GCC's visibility=hidden attribute is supposed to lift this requirement, but it turned out not to work on the 32-bit build.) Instead, use a pure getter function to get a reference to efi_early. This approach results in no additional GOT entries being generated, so there is no need for any changes in the early GOT handling. Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-11-11 22:23:11 +00:00
Yijing Wang	03f56e42d0	Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()" The problem fixed by `0e4ccb1505` ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") has been fixed in a simpler way by a previous commit ("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask Bits"). The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by `0e4ccb1505` are no longer needed, so revert the commit. default_msi_mask_irq() and default_msix_mask_irq() were added by `0e4ccb1505` and are still used by s390, so keep them for now. [bhelgaas: changelog] Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: David Vrabel <david.vrabel@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: xen-devel@lists.xenproject.org	2014-11-11 15:14:30 -07:00
Arnd Bergmann	1c8d29696f	Merge branch 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into asm-generic * 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux: documentation: memory-barriers: clarify relaxed io accessor semantics x86: io: implement dummy relaxed accessor macros for writes tile: io: implement dummy relaxed accessor macros for writes sparc: io: implement dummy relaxed accessor macros for writes powerpc: io: implement dummy relaxed accessor macros for writes parisc: io: implement dummy relaxed accessor macros for writes mn10300: io: implement dummy relaxed accessor macros for writes m68k: io: implement dummy relaxed accessor macros for writes m32r: io: implement dummy relaxed accessor macros for writes ia64: io: implement dummy relaxed accessor macros for writes cris: io: implement dummy relaxed accessor macros for writes frv: io: implement dummy relaxed accessor macros for writes xtensa: io: remove dummy relaxed accessor macros for reads s390: io: remove dummy relaxed accessor macros for reads microblaze: io: remove dummy relaxed accessor macros asm-generic: io: implement relaxed accessor macros as conditional wrappers Conflicts: include/asm-generic/io.h Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2014-11-11 19:55:45 +01:00
Thierry Reding	4707a341b4	/dev/mem: Use more consistent data types The xlate_dev_{kmem,mem}_ptr() functions take either a physical address or a kernel virtual address, so data types should be phys_addr_t and void . They both return a kernel virtual address which is only ever used in calls to copy_{from,to}_user(), so make variables that store it void rather than char * for consistency. Also only define a weak unxlate_dev_mem_ptr() function if architectures haven't overridden them in the asm/io.h header file. Signed-off-by: Thierry Reding <treding@nvidia.com>	2014-11-10 15:59:21 +01:00
Boris Ostrovsky	54279552bd	x86/core, x86/xen/smp: Use 'die_complete' completion when taking CPU down Commit `2ed53c0d6c` ("x86/smpboot: Speed up suspend/resume by avoiding 100ms sleep for CPU offline during S3") introduced completions to CPU offlining process. These completions are not initialized on Xen kernels causing a panic in play_dead_common(). Move handling of die_complete into common routines to make them available to Xen guests. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: tianyu.lan@intel.com Cc: konrad.wilk@oracle.com Cc: xen-devel@lists.xenproject.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414770572-7950-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-10 11:16:40 +01:00
Andy Lutomirski	07114f0f1c	x86_64: Add a comment explaining the TASK_SIZE_MAX guard page That guard page is absolutely necessary; explain why for posterity. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/23320cb5017c2da8475ec20fcde8089d82aa2699.1415144745.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-10 10:43:13 +01:00
Nadav Amit	9d88fca71a	KVM: x86: MOV to CR3 can set bit 63 Although Intel SDM mentions bit 63 is reserved, MOV to CR3 can have bit 63 set. As Intel SDM states in section 4.10.4 "Invalidation of TLBs and Paging-Structure Caches": " MOV to CR3. ... If CR4.PCIDE = 1 and bit 63 of the instructionâ€™s source operand is 0 ..." In other words, bit 63 is not reserved. KVM emulator currently consider bit 63 as reserved. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-07 15:44:07 +01:00
Nadav Amit	82b32774c2	KVM: x86: Breakpoints do not consider CS.base x86 debug registers hold a linear address. Therefore, breakpoints detection should consider CS.base, and check whether instruction linear address equals (CS.base + RIP). This patch introduces a function to evaluate RIP linear address and uses it for breakpoints detection. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-07 15:44:04 +01:00
Jan Beulich	97b67ae559	x86-64: Use RIP-relative addressing for most per-CPU accesses Observing that per-CPU data (in the SMP case) is reachable by exploiting 64-bit address wraparound (building on the default kernel load address being at 16Mb), the one byte shorter RIP-relative addressing form can be used for most per-CPU accesses. The one exception are the "stable" reads, where the use of the "P" operand modifier prevents the compiler from using RIP-relative addressing, but is unavoidable due to the use of the "p" constraint (side note: with gcc 4.9.x the intended effect of this isn't being achieved anymore, see gcc bug 63637). With the dependency on the minimum kernel load address, arbitrarily low values for CONFIG_PHYSICAL_START are now no longer possible. A link time assertion is being added, directing to the need to increase that value when it triggers. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-04 20:43:14 +01:00
Jan Beulich	2c773dd31f	x86: Convert a few more per-CPU items to read-mostly ones Both this_cpu_off and cpu_info aren't getting modified post boot, yet are being accessed on enough code paths that grouping them with other frequently read items seems desirable. For cpu_info this at the same time implies removing the cache line alignment (which afaict became pointless when it got converted to per-CPU data years ago). Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/54589BD20200007800044A84@mail.emea.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-04 20:13:28 +01:00
Andy Lutomirski	1ad83c858c	x86_64,vsyscall: Make vsyscall emulation configurable This adds CONFIG_X86_VSYSCALL_EMULATION, guarded by CONFIG_EXPERT. Turning it off completely disables vsyscall emulation, saving ~3.5k for vsyscall_64.c, 4k for vsyscall_emu_64.S (the fake vsyscall page), some tiny amount of core mm code that supports a gate area, and possibly 4k for a wasted pagetable. The latter is because the vsyscall addresses are misaligned and fit poorly in the fixmap. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/406db88b8dd5f0cbbf38216d11be34bbb43c7eae.1414618407.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-03 21:44:57 +01:00
James Custer	3ab0c49fd6	x86: UV BAU: Increase maximum CPUs per socket/hub We have encountered hardware with 18 cores/socket that gives 36 CPUs/socket with hyperthreading enabled. This exceeds the current MAX_CPUS_PER_SOCKET causing a failure in get_cpu_topology. Increase MAX_CPUS_PER_SOCKET to 64 and MAX_CPUS_PER_UVHUB to 128. Signed-off-by: James Custer <jcuster@sgi.com> Cc: Russ Anderson <rja@sgi.com> Link: http://lkml.kernel.org/r/1414952199-185319-1-git-send-email-jcuster@sgi.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-03 13:49:24 +01:00
Andy Lutomirski	e76b027e64	x86,vdso: Use LSL unconditionally for vgetcpu LSL is faster than RDTSCP and works everywhere; there's no need to switch between them depending on CPU. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Andi Kleen <andi@firstfloor.org> Link: http://lkml.kernel.org/r/72f73d5ec4514e02bba345b9759177ef03742efb.1414706021.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-03 13:41:53 +01:00
Minfei Huang	63e7b6d90c	x86: mm: Re-use the early_ioremap fixed area The temp fixed area is only used during boot for early_ioremap(), and it is unused when ioremap() is functional. vmalloc/pkmap area become available after early boot so the temp fixed area is available for re-use. The virtual address is more precious on i386, especially turning on high memory. So we can re-use the virtual address space. Remove the now unused defines FIXADDR_BOOT_START and FIXADDR_BOOT_SIZE. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Cc: pbonzini@redhat.com Cc: bp@suse.de Link: http://lkml.kernel.org/r/1414582717-32729-1-git-send-email-mnfhuang@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-03 13:40:44 +01:00
Chao Peng	612263b30c	KVM: x86: Enable Intel AVX-512 for guest Expose Intel AVX-512 feature bits to guest. Also add checks for xcr0 AVX512 related bits according to spec: http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:30 +01:00
Nadav Amit	16f8a6f979	KVM: vmx: Unavailable DR4/5 is checked before CPL If DR4/5 is accessed when it is unavailable (since CR4.DE is set), then #UD should be generated even if CPL>0. This is according to Intel SDM Table 6-2: "Priority Among Simultaneous Exceptions and Interrupts". Note, that this may happen on the first DR access, even if the host does not sets debug breakpoints. Obviously, it occurs when the host debugs the guest. This patch moves the DR4/5 checks from __kvm_set_dr/_kvm_get_dr to handle_dr. The emulator already checks DR4/5 availability in check_dr_read. Nested virutalization related calls to kvm_set_dr/kvm_get_dr would not like to inject exceptions to the guest. As for SVM, the patch follows the previous logic as much as possible. Anyhow, it appears the DR interception code might be buggy - even if the DR access may cause an exception, the instruction is skipped. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:26 +01:00
Nadav Amit	394457a928	KVM: x86: some apic broadcast modes does not work KVM does not deliver x2APIC broadcast messages with physical mode. Intel SDM (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of FFFF_FFFFH is used for broadcast of interrupts in both logical destination and physical destination modes." In addition, the local-apic enables cluster mode broadcast. As Intel SDM 10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all destination bits to one." This patch enables cluster mode broadcast. The fix tries to combine broadcast in different modes through a unified code. One rare case occurs when the source of IPI has its APIC disabled. In such case, the source can still issue IPIs, but since the source is not obliged to have the same LAPIC mode as the enabled ones, we cannot rely on it. Since it is a rare case, it is unoptimized and done on the slow-path. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> [As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from kvm_apic_broadcast. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:22 +01:00
Minfei Huang	96e70f8328	x86/mm: Avoid overlap the fixmap area on i386 It is a problem when configuring high memory off where the vmalloc reserve area could end up overlapping the early_ioremap fixmap area on i386. The ordering of the VMALLOC_RESERVE space is: FIXADDR_TOP fixed_addresses FIXADDR_START early_ioremap fixed addresses FIXADDR_BOOT_START Persistent kmap area PKMAP_BASE VMALLOC_END Vmalloc area VMALLOC_START high_memory The available address we can use is lower than FIXADDR_BOOT_START. So we will set the kmap boundary below the FIXADDR_BOOT_START, if we configure high memory. If we configure high memory, the vmalloc reserve area should end up to PKMAP_BASE, otherwise should end up to FIXADDR_BOOT_START. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/6B680A9E-6CE9-4C96-934B-CB01DCB58278@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 12:21:48 +01:00
Andy Lutomirski	61a492fb17	x86_64/vdso: Remove jiffies from the vvar page I think that the jiffies vvar was once used for the vgetcpu cache. That code is long gone, so let's just make jiffies be a normal variable. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/fcfee6f8749af14d96373a9e2656354ad0b95499.1411494540.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 11:22:13 +01:00
Andy Lutomirski	c4a7bba29b	sched/x86: Add a comment clarifying LDT context switching The code is correct, but only for a rather subtle reason. This confused me for quite a while when I read switch_mm, so clarify the code to avoid confusing other people, too. TBH, I wouldn't be surprised if this code was only correct by accident. [ I wouldn't normally send a comment-only patch, but it took me a long time to first figure out wtf was going on here, and then to figure out why this wasn't exploitable by malicious code, and then to figure out why this oddity had no user-visible effect at all. Let's spare future readers the same confusion. ] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/36275c99801a87d8dcf0502a41cf4e2ad81aae46.1412623954.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 11:11:57 +01:00
Andy Lutomirski	2c7577a758	sched/x86_64: Don't save flags on context switch Now that the kernel always runs with clean flags (in particular, NT is clear), there is no need to save and restore flags on every context switch. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Sebastian Lackner <sebastian@fds-team.de> Cc: Anish Bhatt <anish@chelsio.com> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/bf6fb790787eb95b922157838f52712c25dda157.1412187233.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 11:11:30 +01:00
Oleg Nesterov	e2336f6e51	sched: Kill task_preempt_count() task_preempt_count() is pointless if preemption counter is per-cpu, currently this is x86 only. It is only valid if the task is not running, and even in this case the only info it can provide is the state of PREEMPT_ACTIVE bit. Change its single caller to check p->on_rq instead, this should be the same if p->state != TASK_RUNNING, and kill this helper. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Alexander Graf <agraf@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20141008183348.GC17495@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:47:56 +01:00
Oleg Nesterov	009f60e276	sched: stop the unbound recursion in preempt_schedule_context() preempt_schedule_context() does preempt_enable_notrace() at the end and this can call the same function again; exception_exit() is heavy and it is quite possible that need-resched is true again. 1. Change this code to dec preempt_count() and check need_resched() by hand. 2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid the enable/disable dance around __schedule(). But in this case we need to move into sched/core.c. 3. Cosmetic, but x86 forgets to declare this function. This doesn't really matter because it is only called by asm helpers, still it make sense to add the declaration into asm/preempt.h to match preempt_schedule(). Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Graf <agraf@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:46:05 +01:00
Subhransu S. Prusty	9a80e8f597	ASoC: Intel: mrfld: Define sst_res_info for acpi To query resources in ACPI systems we need to define a data structure. This would be set as platform_info for the devices. Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>	2014-10-27 18:02:38 +00:00
Subhransu S. Prusty	43c5e23197	ASoC: Intel: mrfld - Define ipc_info structure This will be used to abstract the differances in ipc offsets for different chips. Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>	2014-10-27 18:02:38 +00:00
Linus Torvalds	96971e9aa9	This is a pretty large update. I think it is roughly as big as what I usually had for the _whole_ rc period. There are a few bad bugs where the guest can OOPS or crash the host. We have also started looking at attack models for nested virtualization; bugs that usually result in the guest ring 0 crashing itself become more worrisome if you have nested virtualization, because the nested guest might bring down the non-nested guest as well. For current uses of nested virtualization these do not really have a security impact, but you never know and bugs are bugs nevertheless. A lot of these bugs are in 3.17 too, resulting in a large number of stable@ Ccs. I checked that all the patches apply there with no conflicts. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUSjmSAAoJEL/70l94x66D2cYH/3JKWsTzhXjHGxZcXQQ85CwR 49hp/crCLWJ2YRKzyAOkvwPI0/SgYKM5wJ8kgtKlpLxrPZKYwhGd1S9tKf6EdAib 5gc/SDDAgHmkqL3IrXmkyKzUVeUWvgD/IFi1Sqalko1blpRlaN/JyJV0mjjGCbA+ yH3Qi5tD0X00u00ycuZCB6mrFH0PH87BmKFiz6bSSJ43tsgD9AVD64BZid6c6hwm iaIfNcIuShavlv1TKG80cSez2qtNXjRLeTN8A10gVZo3hof/wP8aRm+LxF/1JEZX OsoNCjOhhL29qafcZOg3j/atbiAzWtSGV3vjU+iWh5mnN5oFZHcPgIGucQsuFec= =9oQY -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "This is a pretty large update. I think it is roughly as big as what I usually had for the _whole_ rc period. There are a few bad bugs where the guest can OOPS or crash the host. We have also started looking at attack models for nested virtualization; bugs that usually result in the guest ring 0 crashing itself become more worrisome if you have nested virtualization, because the nested guest might bring down the non-nested guest as well. For current uses of nested virtualization these do not really have a security impact, but you never know and bugs are bugs nevertheless. A lot of these bugs are in 3.17 too, resulting in a large number of stable@ Ccs. I checked that all the patches apply there with no conflicts" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: vfio: fix unregister kvm_device_ops of vfio KVM: x86: Wrong assertion on paging_tmpl.h kvm: fix excessive pages un-pinning in kvm_iommu_map error path. KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag KVM: x86: Emulator does not decode clflush well KVM: emulate: avoid accessing NULL ctxt->memopp KVM: x86: Decoding guest instructions which cross page boundary may fail kvm: x86: don't kill guest on unknown exit reason kvm: vmx: handle invvpid vm exit gracefully KVM: x86: Handle errors when RIP is set during far jumps KVM: x86: Emulator fixes for eip canonical checks on near branches KVM: x86: Fix wrong masking on relative jump/call KVM: x86: Improve thread safety in pit KVM: x86: Prevent host from panicking on shared MSR writes. KVM: x86: Check non-canonical addresses upon WRMSR	2014-10-24 12:42:55 -07:00
Petr Matousek	a642fc3050	kvm: vmx: handle invvpid vm exit gracefully On systems with invvpid instruction support (corresponding bit in IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid causes vm exit, which is currently not handled and results in propagation of unknown exit to userspace. Fix this by installing an invvpid vm exit handler. This is CVE-2014-3646. Cc: stable@vger.kernel.org Signed-off-by: Petr Matousek <pmatouse@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:21:17 +02:00
Andy Honig	8b3c3104c3	KVM: x86: Prevent host from panicking on shared MSR writes. The previous patch blocked invalid writes directly when the MSR is written. As a precaution, prevent future similar mistakes by gracefulling handle GPs caused by writes to shared MSRs. Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> [Remove parts obsoleted by Nadav's patch. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:21:08 +02:00
Nadav Amit	854e8bb1aa	KVM: x86: Check non-canonical addresses upon WRMSR Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is written to certain MSRs. The behavior is "almost" identical for AMD and Intel (ignoring MSRs that are not implemented in either architecture since they would anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if non-canonical address is written on Intel but not on AMD (which ignores the top 32-bits). Accordingly, this patch injects a #GP on the MSRs which behave identically on Intel and AMD. To eliminate the differences between the architecutres, the value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to canonical value before writing instead of injecting a #GP. Some references from Intel and AMD manuals: According to Intel SDM description of WRMSR instruction #GP is expected on WRMSR "If the source register contains a non-canonical address and ECX specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE, IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP." According to AMD manual instruction manual: LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical form, a general-protection exception (#GP) occurs." IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the base field must be in canonical form or a #GP fault will occur." IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must be in canonical form." This patch fixes CVE-2014-3610. Cc: stable@vger.kernel.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:21:08 +02:00
Linus Torvalds	8c81f48e16	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI updates from Peter Anvin: "This patchset falls under the "maintainers that grovel" clause in the v3.18-rc1 announcement. We had intended to push it late in the merge window since we got it into the -tip tree relatively late. Many of these are relatively simple things, but there are a couple of key bits, especially Ard's and Matt's patches" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) rtc: Disable EFI rtc for x86 efi: rtc-efi: Export platform:rtc-efi as module alias efi: Delete the in_nmi() conditional runtime locking efi: Provide a non-blocking SetVariable() operation x86/efi: Adding efi_printks on memory allocationa and pci.reads x86/efi: Mark initialization code as such x86/efi: Update comment regarding required phys mapped EFI services x86/efi: Unexport add_efi_memmap variable x86/efi: Remove unused efi_call* macros efi: Resolve some shadow warnings arm64: efi: Format EFI memory type & attrs with efi_md_typeattr_format() ia64: efi: Format EFI memory type & attrs with efi_md_typeattr_format() x86: efi: Format EFI memory type & attrs with efi_md_typeattr_format() efi: Introduce efi_md_typeattr_format() efi: Add macro for EFI_MEMORY_UCE memory attribute x86/efi: Clear EFI_RUNTIME_SERVICES if failing to enter virtual mode arm64/efi: Do not enter virtual mode if booting with efi=noruntime or noefi arm64/efi: uefi_init error handling fix efi: Add kernel param efi=noruntime lib: Add a generic cmdline parse function parse_option_str ...	2014-10-23 14:45:09 -07:00
Borislav Petkov	a3a529d104	x86, MCE, AMD: Drop software-defined bank in error thresholding Aravind had the good question about why we're assigning a software-defined bank when reporting error thresholding errors instead of simply using the bank which reports the last error causing the overflow. Digging through git history, it pointed to `9526866439` ("[PATCH] x86_64: mce_amd support for family 0x10 processors") which added that functionality. The problem with this, however, is that tools don't know about software-defined banks and get puzzled. So drop that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the thresholding interrupt. Save us a couple of MSR reads while at it. Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Link: https://lkml.kernel.org/r/5435B206.60402@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2014-10-21 22:28:48 +02:00
Will Deacon	cbc908ef8e	x86: io: implement dummy relaxed accessor macros for writes write{b,w,l,q}_relaxed are implemented by some architectures in order to permit memory-mapped I/O accesses with weaker barrier semantics than the non-relaxed variants. This patch adds dummy macros for the write accessors to x86, in the same vein as the dummy definitions for the relaxed read accessors. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2014-10-20 18:49:18 +01:00
Vinod Koul	163d2089d2	ASoC: Intel: mrfld - add the dsp sst driver The SST driver is the missing piece in our driver stack not upstreamed, so push it now :) This driver currently supports PCI device on Merrifield. Future updates will bring support for ACPI device as well as future update to PCI devices as well In subsequent patches support is added for DSP loading using memcpy, pcm operations and compressed ops. Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>	2014-10-20 12:20:34 +01:00
Linus Torvalds	1f6075f990	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Ingo Molnar: "A second (and last) round of late coming fixes and changes, almost all of them in perf tooling: User visible tooling changes: - Add period data column and make it default in 'perf script' (Jiri Olsa) - Add a visual cue for toggle zeroing of samples in 'perf top' (Taeung Song) - Improve callchains when using libunwind (Namhyung Kim) Tooling fixes and infrastructure changes: - Fix for double free in 'perf stat' when using some specific invalid command line combo (Yasser Shalabi) - Fix off-by-one bugs in map->end handling (Stephane Eranian) - Fix off-by-one bug in maps__find(), also related to map->end handling (Namhyung Kim) - Make struct symbol->end be the first addr after the symbol range, to make it match the convention used for struct map->end. (Arnaldo Carvalho de Melo) - Fix perf_evlist__add_pollfd() error handling in 'perf kvm stat live' (Jiri Olsa) - Fix python test build by moving callchain_param to an object linked into the python binding (Jiri Olsa) - Document sysfs events/ interfaces (Cody P Schafer) - Fix typos in perf/Documentation (Masanari Iida) - Add missing 'struct option' forward declaration (Arnaldo Carvalho de Melo) - Add option to copy events when queuing for sorting across cpu buffers and enable it for 'perf kvm stat live', to avoid having events left in the queue pointing to the ring buffer be rewritten in high volume sessions. (Alexander Yarygin, improving work done by David Ahern): - Do not include a struct hists per perf_evsel, untangling the histogram code from perf_evsel, to pave the way for exporting a minimalistic tools/lib/api/perf/ library usable by tools/perf and initially by the rasd daemon being developed by Borislav Petkov, Robert Richter and Jean Pihet. (Arnaldo Carvalho de Melo) - Make perf_evlist__open(evlist, NULL, NULL), i.e. without cpu and thread maps mean syswide monitoring, reducing the boilerplate for tools that only want system wide mode. (Arnaldo Carvalho de Melo) - Move exit stuff from perf_evsel__delete to perf_evsel__exit, delete should be just a front end for exit + free (Arnaldo Carvalho de Melo) - Add support to new style format of kernel PMU event. (Kan Liang) and other misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) perf script: Add period as a default output column perf script: Add period data column perf evsel: No need to drag util/cgroup.h perf evlist: Add missing 'struct option' forward declaration perf evsel: Move exit stuff from __delete to __exit kprobes/x86: Remove stale ARCH_SUPPORTS_KPROBES_ON_FTRACE define perf kvm stat live: Enable events copying perf session: Add option to copy events when queueing perf Documentation: Fix typos in perf/Documentation perf trace: Use thread_{,_set}_priv helpers perf kvm: Use thread_{,_set}_priv helpers perf callchain: Create an address space per thread perf report: Set callchain_param.record_mode for future use perf evlist: Fix for double free in tools/perf stat perf test: Add test case for pmu event new style format perf tools: Add support to new style format of kernel PMU event perf tools: Parse the pmu event prefix and suffix Revert "perf tools: Default to cpu// for events v5" perf Documentation: Remove Ruplicated docs for powerpc cpu specific events perf Documentation: sysfs events/ interfaces ...	2014-10-19 11:55:41 -07:00
Linus Torvalds	857b50f5d0	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus Pull MIPS updates from Ralf Baechle: "This is the MIPS pull request for the next kernel: - Zubair's patch series adds CMA support for MIPS. Doing so it also touches ARM64 and x86. - remove the last instance of IRQF_DISABLED from arch/mips - updates to two of the MIPS defconfig files. - cleanup of how cache coherency bits are handled on MIPS and implement support for write-combining. - platform upgrades for Alchemy - move MIPS DTS files to arch/mips/boot/dts/" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (24 commits) MIPS: ralink: remove deprecated IRQF_DISABLED MIPS: pgtable.h: Implement the pgprot_writecombine function for MIPS MIPS: cpu-probe: Set the write-combine CCA value on per core basis MIPS: pgtable-bits: Define the CCA bit for WC writes on Ingenic cores MIPS: pgtable-bits: Move the CCA bits out of the core's ifdef blocks MIPS: DMA: Add cma support x86: use generic dma-contiguous.h arm64: use generic dma-contiguous.h asm-generic: Add dma-contiguous.h MIPS: BPF: Add new emit_long_instr macro MIPS: ralink: Move device-trees to arch/mips/boot/dts/ MIPS: Netlogic: Move device-trees to arch/mips/boot/dts/ MIPS: sead3: Move device-trees to arch/mips/boot/dts/ MIPS: Lantiq: Move device-trees to arch/mips/boot/dts/ MIPS: Octeon: Move device-trees to arch/mips/boot/dts/ MIPS: Add support for building device-tree binaries MIPS: Create common infrastructure for building built-in device-trees MIPS: SEAD3: Enable DEVTMPFS MIPS: SEAD3: Regenerate defconfigs MIPS: Alchemy: DB1300: Add touch penirq support ...	2014-10-18 14:24:36 -07:00
Anton Blanchard	691286b556	kprobes/x86: Remove stale ARCH_SUPPORTS_KPROBES_ON_FTRACE define Commit `e7dbfe349d` ("kprobes/x86: Move ftrace-based kprobe code into kprobes-ftrace.c") switched from using ARCH_SUPPORTS_KPROBES_ON_FTRACE to CONFIG_KPROBES_ON_FTRACE but missed removing the define. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: masami.hiramatsu.pt@hitachi.com Cc: ananth@in.ibm.com Cc: a.p.zijlstra@chello.nl Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-17 07:18:34 +02:00
Linus Torvalds	0429fbc0bd	Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...	2014-10-15 07:48:18 +02:00
Linus Torvalds	2fd7476de9	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc smaller fixes that missed the v3.17 cycle" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Add arch/x86/purgatory/ make generated files to gitignore x86: Fix section conflict for numachip x86: Reject x32 executables if x32 ABI not supported x86_64, entry: Filter RFLAGS.NT on entry from userspace x86, boot, kaslr: Fix nuisance warning on 32-bit builds	2014-10-14 02:28:16 +02:00
Linus Torvalds	ba1a96fc7d	Merge branch 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 seccomp changes from Ingo Molnar: "This tree includes x86 seccomp filter speedups and related preparatory work, which touches core seccomp facilities as well. The main idea is to split seccomp into two phases, to be able to enter a simple fast path for syscalls with ptrace side effects. There's no substantial user-visible (and ABI) effects expected from this, except a change in how we emit a better audit record for SECCOMP_RET_TRACE events" * 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls x86: Split syscall_trace_enter into two phases x86, entry: Only call user_exit if TIF_NOHZ x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit seccomp: Document two-phase seccomp and arch-provided seccomp_data seccomp: Allow arch code to provide seccomp_data seccomp: Refactor the filter callback and the API seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing	2014-10-14 02:27:06 +02:00
Linus Torvalds	df133e8fa8	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "This tree includes the following changes: - fix memory hotplug - fix hibernation bootup memory layout assumptions - fix hyperv numa guest kernel messages - remove dead code - update documentation" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Update memory map description to list hypervisor-reserved area x86/mm, hibernate: Do not assume the first e820 area to be RAM x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data() x86/mm/hotplug: Modify PGD entry when removing memory x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable() x86: Remove set_pmd_pfn	2014-10-14 02:22:41 +02:00
Linus Torvalds	e3438330f5	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Ingo Molnar: "Misc smaller cleanups" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, intel: Fix total_size computation x86, microcode, intel: Rename apply_microcode and declare it static x86, microcode, intel: Fix typos x86, microcode, intel: Add missing static declarations x86, microcode, amd: Fix missing static declaration	2014-10-14 02:21:51 +02:00
Linus Torvalds	c7b228adca	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "x86 FPU handling fixes, cleanups and enhancements from Oleg. The signal handling race fix and the __restore_xstate_sig() preemption fix for eager-mode is marked for -stable as well" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: copy_thread: Don't nullify ->ptrace_bps twice x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct() x86, fpu: copy_process: Sanitize fpu->last_cpu initialization x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math() x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu() x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable() x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()	2014-10-14 02:20:50 +02:00
Linus Torvalds	708d0b41a2	Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature updates from Ingo Molnar: "This tree includes the following changes: - Introduce DISABLED_MASK to list disabled CPU features, to simplify CPU feature handling and avoid excessive #ifdefs - Remove the lightly used cpu_has_pae() primitive" * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add more disabled features x86: Introduce disabled-features x86: Axe the lightly-used cpu_has_pae	2014-10-14 02:19:47 +02:00
Linus Torvalds	bf10fa857f	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Three small cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tty/serial/8250: Clean up the asm/serial.h include file a bit x86/tty/serial/8250: Resolve missing-field-initializers warnings x86: Remove obsolete comment in uapi/e820.h	2014-10-13 18:19:01 +02:00
Linus Torvalds	9d9420f120	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side updates: - Fix and enhance poll support (Jiri Olsa) - Re-enable inheritance optimization (Jiri Olsa) - Enhance Intel memory events support (Stephane Eranian) - Refactor the Intel uncore driver to be more maintainable (Zheng Yan) - Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra, Andi Kleen) - [ plus various smaller fixes/cleanups ] User visible tooling updates: - Add +field argument support for --field option, so that one can add fields to the default list of fields to show, ie now one can just do: perf report --fields +pid And the pid will appear in addition to the default fields (Jiri Olsa) - Add +field argument support for --sort option (Jiri Olsa) - Honour -w in the report tools (report, top), allowing to specify the widths for the histogram entries columns (Namhyung Kim) - Properly show submicrosecond times in 'perf kvm stat' (Christian Borntraeger) - Add beautifier for mremap flags param in 'trace' (Alex Snast) - perf script: Allow callchains if any event samples them - Don't truncate Intel style addresses in 'annotate' (Alex Converse) - Allow profiling when kptr_restrict == 1 for non root users, kernel samples will just remain unresolved (Andi Kleen) - Allow configuring default options for callchains in config file (Namhyung Kim) - Support operations for shared futexes. (Davidlohr Bueso) - "perf kvm stat report" improvements by Alexander Yarygin: - Save pid string in opts.target.pid - Enable the target.system_wide flag - Unify the title bar output - [ plus lots of other fixes and small improvements. ] Tooling infrastructure changes: - Refactor unit and scale function parameters for PMU parsing routines (Matt Fleming) - Improve DSO long names lookup with rbtree, resulting in great speedup for workloads with lots of DSOs (Waiman Long) - We were not handling POLLHUP notifications for event file descriptors Fix it by filtering entries in the events file descriptor array after poll() returns, refcounting mmaps so that when the last fd pointing to a perf mmap goes away we do the unmap (Arnaldo Carvalho de Melo) - Intel PT prep work, from Adrian Hunter, including: - Let a user specify a PMU event without any config terms - Add perf-with-kcore script - Let default config be defined for a PMU - Add perf_pmu__scan_file() - Add a 'perf test' for tracking with sched_switch - Add 'flush' callback to scripting API - Use ring buffer consume method to look like other tools (Arnaldo Carvalho de Melo) - hists browser (used in top and report) refactorings, getting rid of unused variables and reducing source code size by handling similar cases in a fewer functions (Namhyung Kim). - Replace thread unsafe strerror() with strerror_r() accross the whole tools/perf/ tree (Masami Hiramatsu) - Rename ordered_samples to ordered_events and allow setting a queue size for ordering events (Jiri Olsa) - [ plus lots of fixes, cleanups and other improvements ]" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits) perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment perf/x86/intel/uncore: Fix minor race in box set up perf record: Fix error message for --filter option not coming after tracepoint perf tools: Fix build breakage on arm64 targets perf symbols: Improve DSO long names lookup speed with rbtree perf symbols: Encapsulate dsos list head into struct dsos perf bench futex: Sanitize -q option in requeue perf bench futex: Support operations for shared futexes perf trace: Fix mmap return address truncation to 32-bit perf tools: Refactor unit and scale function parameters perf tools: Fix line number in the config file error message perf tools: Convert {record,top}.call-graph option to call-graph.record-mode perf tools: Introduce perf_callchain_config() perf callchain: Move some parser functions to callchain.c perf tools: Move callchain config from record_opts to callchain_param perf hists browser: Fix callchain print bug on TUI perf tools: Use ACCESS_ONCE() instead of volatile cast perf tools: Modify error code for when perf_session__new() fails perf tools: Fix perf record as non root with kptr_restrict == 1 perf stat: Fix --per-core on multi socket systems ...	2014-10-13 15:58:15 +02:00
Linus Torvalds	6d5f0ebfc0	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The main updates in this cycle were: - mutex MCS refactoring finishing touches: improve comments, refactor and clean up code, reduce debug data structure footprint, etc. - qrwlock finishing touches: remove old code, self-test updates. - small rwsem optimization - various smaller fixes/cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Revert qrwlock recusive stuff locking/rwsem: Avoid double checking before try acquiring write lock locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definition locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code locking/semaphore: Resolve some shadow warnings locking/selftest: Support queued rwlock locking/lockdep: Restrict the use of recursive read_lock() with qrwlock locking/spinlocks: Always evaluate the second argument of spin_lock_nested() locking/Documentation: Update locking/mutex-design.txt disadvantages locking/Documentation: Move locking related docs into Documentation/locking/ locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate locking/mutexes: Refactor optimistic spinning code locking/mcs: Remove obsolete comment locking/mutexes: Document quick lock release when unlocking locking/mutexes: Standardize arguments in lock/unlock slowpaths locking: Remove deprecated smp_mb__() barriers	2014-10-13 15:51:40 +02:00
Linus Torvalds	dbb885fecc	Merge branch 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull arch atomic cleanups from Ingo Molnar: "This is a series kept separate from the main locking tree, which cleans up and improves various details in the atomics type handling: - Remove the unused atomic_or_long() method - Consolidate and compress atomic ops implementations between architectures, to reduce linecount and to make it easier to add new ops. - Rewrite generic atomic support to only require cmpxchg() from an architecture - generate all other methods from that" * 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read() locking, mips: Fix atomics locking, sparc64: Fix atomics locking,arch: Rewrite generic atomic support locking,arch,xtensa: Fold atomic_ops locking,arch,sparc: Fold atomic_ops locking,arch,sh: Fold atomic_ops locking,arch,powerpc: Fold atomic_ops locking,arch,parisc: Fold atomic_ops locking,arch,mn10300: Fold atomic_ops locking,arch,mips: Fold atomic_ops locking,arch,metag: Fold atomic_ops locking,arch,m68k: Fold atomic_ops locking,arch,m32r: Fold atomic_ops locking,arch,ia64: Fold atomic_ops locking,arch,hexagon: Fold atomic_ops locking,arch,cris: Fold atomic_ops locking,arch,avr32: Fold atomic_ops locking,arch,arm64: Fold atomic_ops locking,arch,arm: Fold atomic_ops ...	2014-10-13 15:48:00 +02:00
Linus Torvalds	81ae31d782	xen: features and fixes for 3.18-rc0 - Add pvscsi frontend and backend drivers. - Remove _PAGE_IOMAP PTE flag, freeing it for alternate uses. - Try and keep memory contiguous during PV memory setup (reduces SWIOTLB usage). - Allow front/back drivers to use threaded irqs. - Support large initrds in PV guests. - Fix PVH guests in preparation for Xen 4.5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUNonmAAoJEFxbo/MsZsTRHAQH/inCjpCT+pkvTB0YAVfVvgMI gUogT8G+iB2MuCNpMffGIt8TAVXwcVtnOLH9ABH3IBVehzgipIbIiVEM9YhjrYvU 1rgIKBpmZqSpjDHoIHpdHeCH67cVnRzA/PyoxZWLxPNmQ0t6bNf9yeAcCXK9PfUc 7EAblUDmPGSx9x/EUnOKNNaZSEiUJZHDBXbMBLllk1+5H1vfKnpFCRGMG0IrfI44 KVP2NX9Gfa05edMZYtH887FYyjFe2KNV6LJvE7+w7h2Dy0yIzf7y86t0l4n8gETb plvEUJ/lu9RYzTiZY/RxgBFYVTV59EqT45brSUtoe2Jcp8GSwiHslTHdfyFBwSo= =gw4d -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.18-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from David Vrabel: "Features and fixes: - Add pvscsi frontend and backend drivers. - Remove _PAGE_IOMAP PTE flag, freeing it for alternate uses. - Try and keep memory contiguous during PV memory setup (reduces SWIOTLB usage). - Allow front/back drivers to use threaded irqs. - Support large initrds in PV guests. - Fix PVH guests in preparation for Xen 4.5" * tag 'stable/for-linus-3.18-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (22 commits) xen: remove DEFINE_XENBUS_DRIVER() macro xen/xenbus: Remove BUG_ON() when error string trucated xen/xenbus: Correct the comments for xenbus_grant_ring() x86/xen: Set EFER.NX and EFER.SCE in PVH guests xen: eliminate scalability issues from initrd handling xen: sync some headers with xen tree xen: make pvscsi frontend dependant on xenbus frontend arm{,64}/xen: Remove "EXPERIMENTAL" in the description of the Xen options xen-scsifront: don't deadlock if the ring becomes full x86: remove the Xen-specific _PAGE_IOMAP PTE flag x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings x86: skip check for spurious faults for non-present faults xen/efi: Directly include needed headers xen-scsiback: clean up a type issue in scsiback_make_tpg() xen-scsifront: use GFP_ATOMIC under spin_lock MAINTAINERS: Add xen pvscsi maintainer xen-scsiback: Add Xen PV SCSI backend driver xen-scsifront: Add Xen PV SCSI frontend driver xen: Add Xen pvSCSI protocol description xen/events: support threaded irqs for interdomain event channels ...	2014-10-11 20:29:01 -04:00
Mel Gorman	6a33979d5b	mm: remove misleading ARCH_USES_NUMA_PROT_NONE ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented _PAGE_NUMA using _PROT_NONE. This saved using an additional PTE bit and relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting fault scanner. This was found to be conceptually confusing with a lot of implicit assumptions and it was asked that an alternative be found. Commit `c46a7c81` "x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE bits and shrunk the maximum possible swap size but it did not go far enough. There are no architectures that reuse _PROT_NONE as _PROT_NUMA but the relics still exist. This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary duplication in powerpc vs the generic implementation by defining the types the core NUMA helpers expected to exist from x86 with their ppc64 equivalent. This necessitated that a PTE bit mask be created that identified the bits that distinguish present from NUMA pte entries but it is expected this will only differ between arches based on _PAGE_PROTNONE. The naming for the generic helpers was taken from x86 originally but ppc64 has types that are equivalent for the purposes of the helper so they are mapped instead of duplicating code. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-09 22:25:52 -04:00
Linus Torvalds	afa3536be8	Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Main changes: - Fix the deadlock reported by Dave Jones et al - Clean up and fix nohz_full interaction with arch abilities - nohz init code consolidation/cleanup" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz: nohz full depends on irq work self IPI support nohz: Consolidate nohz full init code arm64: Tell irq work about self IPI support arm: Tell irq work about self IPI support x86: Tell irq work about self IPI support irq_work: Force raised irq work to run on irq work interrupt irq_work: Introduce arch_irq_work_has_interrupt() nohz: Move nohz full init call to tick init	2014-10-09 06:30:57 -04:00
Linus Torvalds	e4e65676f2	Fixes and features for 3.18. Apart from the usual cleanups, here is the summary of new features: - s390 moves closer towards host large page support - PowerPC has improved support for debugging (both inside the guest and via gdbstub) and support for e6500 processors - ARM/ARM64 support read-only memory (which is necessary to put firmware in emulated NOR flash) - x86 has the usual emulator fixes and nested virtualization improvements (including improved Windows support on Intel and Jailhouse hypervisor support on AMD), adaptive PLE which helps overcommitting of huge guests. Also included are some patches that make KVM more friendly to memory hot-unplug, and fixes for rare caching bugs. Two patches have trivial mm/ parts that were acked by Rik and Andrew. Note: I will soon switch to a subkey for signing purposes. To verify future signed pull requests from me, please update my key with "gpg --recv-keys 9B4D86F2". You should see 3 new subkeys---the one for signing will be a 2048-bit RSA key, 4E6B09D7. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJUL5sPAAoJEBvWZb6bTYbyfkEP/3MNhSyn6HCjPjtjLNPAl9KL WpExZSUFL2+4CztpdGIsek1BeJYHmqv3+c5S+WvaWVA1aqh2R7FT1D1ErBLjgLQq lq23IOr+XxmC3dXQUEEk+TlD+283UzypzEG4l4UD3JYg79fE3UrXAz82SeyewJDY x7aPYhkZG3RHu+wAyMPasG6E3zS5LySdUtGWbiPwz5BejrhBJoJdeb2WIL/RwnUK 7ppSLB5EoFj/uMkuyeAAdAbdfSrhHA6faDZxNdxS9k9wGutrhhfUoQ49ONrKG4dV sFo1tSPTVgRs8QFYUZ2fJUPBAmUVddsgqh2K9d0NftGTq7b8YszaCsfFrs2/Y4MU YxssWEhxsfszerCu12bbAJrv6JBZYQ7TwGvI9L7P0iFU6IVw/djmukU4AkM9/e91 YS/cue/PN+9Pn2ccXzL9J7xRtZb8FsOuRsCXTCmbOwDkLmrKPDBN2t3RUbeF+Eam ABrpWnLKX13kZSo4LKU+/niarzmPMp7odQfHVdr8ea0fiYLp4iN8puA20WaSPIgd CLvm+RAvXe5Lm91L4mpFotJ2uFyK6QlIYJV4FsgeWv/0D0qppWQi0Utb/aCNHCgy z8MyUMD48y7EpoQrFYr/7cddXIu0/NegnM8I1coVjIPEk4NfeebGUlCJ/V3D8wMG BgEfS2x6jRc5zB3hjwDr =iEVi -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "Fixes and features for 3.18. Apart from the usual cleanups, here is the summary of new features: - s390 moves closer towards host large page support - PowerPC has improved support for debugging (both inside the guest and via gdbstub) and support for e6500 processors - ARM/ARM64 support read-only memory (which is necessary to put firmware in emulated NOR flash) - x86 has the usual emulator fixes and nested virtualization improvements (including improved Windows support on Intel and Jailhouse hypervisor support on AMD), adaptive PLE which helps overcommitting of huge guests. Also included are some patches that make KVM more friendly to memory hot-unplug, and fixes for rare caching bugs. Two patches have trivial mm/ parts that were acked by Rik and Andrew. Note: I will soon switch to a subkey for signing purposes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits) kvm: do not handle APIC access page if in-kernel irqchip is not in use KVM: s390: count vcpu wakeups in stat.halt_wakeup KVM: s390/facilities: allow TOD-CLOCK steering facility bit KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode arm/arm64: KVM: Report correct FSC for unsupported fault types arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc kvm: Fix kvm_get_page_retry_io __gup retval check arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset kvm: x86: Unpin and remove kvm_arch->apic_access_page kvm: vmx: Implement set_apic_access_page_addr kvm: x86: Add request bit to reload APIC access page address kvm: Add arch specific mmu notifier for page invalidation kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static kvm: Fix page ageing bugs kvm/x86/mmu: Pass gfn and level to rmapp callback. x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only kvm: x86: use macros to compute bank MSRs KVM: x86: Remove debug assertion of non-PAE reserved bits kvm: don't take vcpu mutex for obviously invalid vcpu ioctls kvm: Faults which trigger IO release the mmap_sem ...	2014-10-08 05:27:39 -04:00
Ben Hutchings	0e6d3112a4	x86: Reject x32 executables if x32 ABI not supported It is currently possible to execve() an x32 executable on an x86_64 kernel that has only ia32 compat enabled. However all its syscalls will fail, even _exit(). This usually causes it to segfault. Change the ELF compat architecture check so that x32 executables are rejected if we don't support the x32 ABI. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-10-08 11:17:42 +02:00
Linus Torvalds	74da38631a	Tinification for 3.18 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJUL0J0AAoJEA7Zo9+K/4c9w40P/iMFPfCethdBtPz5rI88CVr2 7yU99TdbEPoRJm+rU4ohvHdB73p2KWINIKvpSThvegvjXbEcKxQkdpVWHsFJZeHS bZiYmhjxdCBvJGLrYo5IwqH0PrSjokTPzMUekUCk7BkUKNJRaDjfUBHvUmKsinUR dQL+3KE3edy6W3DL+FOd0QZwSOgmOfEibTWpfmg+n16kFNa75Kg/QLwjYRvtQplP eElywDZN07IhAeBFqKhKvlKmDSAeqMd8RfoPPo9Ts+reeIrWYjVNbl9ISOqXqy2x JoLeZQmwSXj/C9Ehr5e+aId2eO8In5xueQfXP8SS8dCC7VLwRbnNgyAQQZEslEBk QH0GhT6GqTamBdiNI3I+usfs65cEaialXh2afcoLwGS/iGD8MhZ8Dt+m4iyXNxEZ kT9VA4974mPjJ1g0mDDnYIxNjxF43m+SD5K1sR/XGpMcA8NdqMUmvKNcbePCobVa WTutIemQqGipNeWE94XwZEbc0B+aWwH7eiZOBMVGhWsHInd7QeTBTbfZlctyBkzf AswgsFjC5FW05CWK6J1Lf/UI1FD9PmHMKpmQUPED1+7okDTfqGjKjdREWgZSixUt LIRfWqWEaNpRRBFbDyt0C+F4pBRPLiRDaOyNhwEdtXuVGKRXb1G3qX7nFOJAZo6G GDTZo9iIRNSfm/M4tJ+n =2VyW -----END PGP SIGNATURE----- Merge tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux Pull "tinification" patches from Josh Triplett. Work on making smaller kernels. * tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux: bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_ mm: Support compiling out madvise and fadvise x86: Support compiling out human-friendly processor feature names x86: Drop support for /proc files when !CONFIG_PROC_FS x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE x86, boot: Use the usual -y -n mechanism for objects in vmlinux x86: Add "make tinyconfig" to configure the tiniest possible kernel x86, platform, kconfig: move kvmconfig functionality to a helper	2014-10-07 08:51:59 -04:00
Matt Fleming	75b128573b	Merge branch 'next' into efi-next-merge Conflicts: arch/x86/boot/compressed/eboot.c	2014-10-03 22:15:56 +01:00
Matt Fleming	60b4dc7720	efi: Delete the in_nmi() conditional runtime locking commit 5dc3826d9f08 ("efi: Implement mandatory locking for UEFI Runtime Services") implemented some conditional locking when accessing variable runtime services that Ingo described as "pretty disgusting". The intention with the !efi_in_nmi() checks was to avoid live-locks when trying to write pstore crash data into an EFI variable. Such lockless accesses are allowed according to the UEFI specification when we're in a "non-recoverable" state, but whether or not things are implemented correctly in actual firmware implementations remains an unanswered question, and so it would seem sensible to avoid doing any kind of unsynchronized variable accesses. Furthermore, the efi_in_nmi() tests are inadequate because they don't account for the case where we call EFI variable services from panic or oops callbacks and aren't executing in NMI context. In other words, live-locking is still possible. Let's just remove the conditional locking altogether. Now we've got the ->set_variable_nonblocking() EFI variable operation we can abort if the runtime lock is already held. Aborting is by far the safest option. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-10-03 18:41:03 +01:00
Mathias Krause	4e78eb0561	x86/efi: Mark initialization code as such The 32 bit and 64 bit implementations differ in their __init annotations for some functions referenced from the common EFI code. Namely, the 32 bit variant is missing some of the __init annotations the 64 bit variant has. To solve the colliding annotations, mark the corresponding functions in efi_32.c as initialization code, too -- as it is such. Actually, quite a few more functions are only used during initialization and therefore can be marked __init. They are therefore annotated, too. Also add the __init annotation to the prototypes in the efi.h header so users of those functions will see it's meant as initialization code only. This patch also fixes the "prelog" typo. ("prologue" / "epilogue" might be more appropriate but this is C code after all, not an opera! :D) Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-10-03 18:41:03 +01:00
Mathias Krause	6092068547	x86/efi: Unexport add_efi_memmap variable This variable was accidentally exported, even though it's only used in this compilation unit and only during initialization. Remove the bogus export, make the variable static instead and mark it as __initdata. Fixes: `200001eb14` ("x86 boot: only pick up additional EFI memmap...") Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-10-03 18:41:02 +01:00
Mathias Krause	24ffd84b60	x86/efi: Remove unused efi_call* macros Complement commit `62fa6e69a4` ("x86/efi: Delete most of the efi_call* macros") and delete the stub macros for the !CONFIG_EFI case, too. In fact, there are no EFI calls in this case so we don't need a dummy for efi_call() even. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-10-03 18:41:02 +01:00
Ard Biesheuvel	161485e827	efi: Implement mandatory locking for UEFI Runtime Services According to section 7.1 of the UEFI spec, Runtime Services are not fully reentrant, and there are particular combinations of calls that need to be serialized. Use a spinlock to serialize all Runtime Services with respect to all others, even if this is more than strictly needed. We've managed to get away without requiring a runtime services lock until now because most of the interactions with EFI involve EFI variables, and those operations are already serialised with __efivars->lock. Some of the assumptions underlying the decision whether locks are needed or not (e.g., SetVariable() against ResetSystem()) may not apply universally to all [new] architectures that implement UEFI. Rather than try to reason our way out of this, let's just implement at least what the spec requires in terms of locking. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-10-03 18:40:57 +01:00
Pranith Kumar	2291059c85	locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read() Use the much more reader friendly ACCESS_ONCE() instead of the cast to volatile. This is purely a stylistic change. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1411482607-20948-1-git-send-email-bobby.prani@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-03 06:06:23 +02:00
Linus Torvalds	cd40fab6db	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This has: - EFI revert to fix a boot regression - early_ioremap() fix for boot failure - KASLR fix for possible boot failures - EFI fix for corrupted string printing - remove a misleading EFI bootup 'failed!' error message Unfortunately it's all rather close to the merge window" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Truncate 64-bit values when calling 32-bit OutputString() x86/efi: Delete misleading efi_printk() error message Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>" x86/kaslr: Avoid the setup_data area when picking location x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8	2014-09-27 14:23:13 -07:00
Ingo Molnar	29282ac0bd	* Revert the static library changes from the merge window since they're causing issues for Macbooks and Fedora + Grub2 - Matt Fleming * Delete the misleading "setup_efi_pci() failed!" message which some people are seeing when booting EFI - Matt Fleming * Fix printing strings from the 32-bit EFI boot stub by only passing 32-bit addresses to the firmware - Matt Fleming -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUIzOFAAoJEC84WcCNIz1VpSwQAJhp9Yu60dWXyqRpV+ER1yau MWHXunkeaccbBXNABkzuDUqb2a6DRIJow+/n+dYjIGY6Nf9zzLQFQ+s/EsS+IiyY s4rRvAqzfGYk1d6xzgvEccrdD4fRP32kFKnSlZpfTRuJZHZieD+f2y6TP7D33Ja3 HV/ivPQHZNjxgsExpcE8Rz/QyOZpqRacTr9Gr7IusBRL6IMCyycfCTmt6d5pf1iD kQGSGwIBvgN4xMqPUdxTNo31bQZA5ZeywNOh9WhdSCL7FAIDfG9TmXt/J7ckq5ax 0f6X92qgCs3peLY+/szgSZ2LsZI6I/FM1udc2SFiIOPvCwwytcJ94Wro5pLYzZ4i SnqB2xLLEmsR2J3MXIeY0aVy2VtHT4bYRnXYNd9G0eaVfrlJ+4lgwqJavAJmtDZx 88ey1R8LKRQr+ueSv/BnOvE6T2+38HrjrMooFQsPvolRR0S6MITBr8I2hoRASkUt YsA+7s6+tO2QBmQYrKCYSAi9A7onMA9Fh93dmv7XLqFw/SsfVm3RnrNhOVsO9kPC zIsWZoS+PGwb4RRvM2i7JAEqUCbuLpIAHYEU6gqprWm1ERHsX9mfFYfsJQHzHuOY rg6+wtWQ9MGxek8POac4d2mC+PwC4DA0AkaTTZQBcdAJu+h/gZNt4w7mpk5v4Th1 QYr/otShMvc84Zd+RMeV =5mVJ -----END PGP SIGNATURE----- Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fixes from Matt Fleming: * Revert the static library changes from the merge window since they're causing issues for Macbooks and Fedora + Grub2 (Matt Fleming) * Delete the misleading "setup_efi_pci() failed!" message which some people are seeing when booting EFI (Matt Fleming) * Fix printing strings from the 32-bit EFI boot stub by only passing 32-bit addresses to the firmware (Matt Fleming) Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-25 16:40:08 +02:00
Tang Chen	c24ae0dcd3	kvm: x86: Unpin and remove kvm_arch->apic_access_page In order to make the APIC access page migratable, stop pinning it in memory. And because the APIC access page is not pinned in memory, we can remove kvm_arch->apic_access_page. When we need to write its physical address into vmcs, we use gfn_to_page() to get its page struct, which is needed to call page_to_phys(); the page is then immediately unpinned. Suggested-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:08:01 +02:00
Tang Chen	4256f43f9f	kvm: x86: Add request bit to reload APIC access page address Currently, the APIC access page is pinned by KVM for the entire life of the guest. We want to make it migratable in order to make memory hot-unplug available for machines that run KVM. This patch prepares to handle this in generic code, through a new request bit (that will be set by the MMU notifier) and a new hook that is called whenever the request bit is processed. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:08:00 +02:00
Tang Chen	fe71557afb	kvm: Add arch specific mmu notifier for page invalidation This will be used to let the guest run while the APIC access page is not pinned. Because subsequent patches will fill in the function for x86, place the (still empty) x86 implementation in the x86.c file instead of adding an inline function in kvm_host.h. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:59 +02:00
Andres Lagar-Cavilla	5712846808	kvm: Fix page ageing bugs 1. We were calling clear_flush_young_notify in unmap_one, but we are within an mmu notifier invalidate range scope. The spte exists no more (due to range_start) and the accessed bit info has already been propagated (due to kvm_pfn_set_accessed). Simply call clear_flush_young. 2. We clear_flush_young on a primary MMU PMD, but this may be mapped as a collection of PTEs by the secondary MMU (e.g. during log-dirty). This required expanding the interface of the clear_flush_young mmu notifier, so a lot of code has been trivially touched. 3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate the access bit by blowing the spte. This requires proper synchronizing with MMU notifier consumers, like every other removal of spte's does. Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:58 +02:00
Paolo Bonzini	c1118b3602	x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA. In that case, KVM will fail to patch VMCALL instructions to VMMCALL as required on AMD processors. The failure mode is currently a divide-by-zero exception, which obviously is a KVM bug that has to be fixed. However, picking the right instruction between VMCALL and VMMCALL will be faster and will help if you cannot upgrade the hypervisor. Reported-by: Chris Webb <chris@arachsys.com> Tested-by: Chris Webb <chris@arachsys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:57 +02:00
Liang Chen	77c3913b74	KVM: x86: directly use kvm_make_request again A one-line wrapper around kvm_make_request is not particularly useful. Replace kvm_mmu_flush_tlb() with kvm_make_request(). Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:51 +02:00
Matt Fleming	84be880560	Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>" This reverts commit `f23cf8bd5c` ("efi/x86: efistub: Move shared dependencies to <asm/efi.h>") as well as the x86 parts of commit `f4f75ad574` ("efi: efistub: Convert into static library"). The road leading to these two reverts is long and winding. The above two commits were merged during the v3.17 merge window and turned the common EFI boot stub code into a static library. This necessitated making some symbols global in the x86 boot stub which introduced new entries into the early boot GOT. The problem was that we weren't fixing up the newly created GOT entries before invoking the EFI boot stub, which sometimes resulted in hangs or resets. This failure was reported by Maarten on his Macbook pro. The proposed fix was commit `9cb0e39423` ("x86/efi: Fixup GOT in all boot code paths"). However, that caused issues for Linus when booting his Sony Vaio Pro 11. It was subsequently reverted in commit `f3670394c2`. So that leaves us back with Maarten's Macbook pro not booting. At this stage in the release cycle the least risky option is to revert the x86 EFI boot stub to the pre-merge window code structure where we explicitly #include efi-stub-helper.c instead of linking with the static library. The arm64 code remains unaffected. We can take another swing at the x86 parts for v3.18. Conflicts: arch/x86/include/asm/efi.h Tested-by: Josh Boyer <jwboyer@fedoraproject.org> Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Tested-by: Leif Lindholm <leif.lindholm@linaro.org> [arm64] Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>, Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-09-23 22:01:55 +01:00
David Vrabel	f955371ca9	x86: remove the Xen-specific _PAGE_IOMAP PTE flag The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs that were used to map I/O regions that are 1:1 in the p2m. This allowed Xen to obtain the correct PFN when converting the MFNs read from a PTE back to their PFN. Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn() returns the correct PFN by using a combination of the m2p and p2m to determine if an MFN corresponds to a 1:1 mapping in the the p2m. Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for future uses of the PTE flag. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com>	2014-09-23 13:36:20 +00:00
Zubair Lutfullah Kakakhel	8057b30814	x86: use generic dma-contiguous.h dma-contiguous.h is now in asm-generic. Use that to avoid code repetition in x86. Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: catalin.marinas@arm.com Cc: will.deacon@arm.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: arnd@arndb.de Cc: gregkh@linuxfoundation.org Cc: m.szyprowski@samsung.com Cc: x86@kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-arch@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/7359/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2014-09-22 13:35:52 +02:00
Linus Torvalds	7a5e87867e	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: EFI fixes, a build fix, a page table dumping (debug) fix and a clang build fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/arm64: Fix fdt-related memory reservation x86/mm: Apply the section attribute to the variable, not its type x86/efi: Fixup GOT in all boot code paths x86/efi: Only load initrd above 4g on second try x86-64, ptdump: Mark espfix area only if existent x86, irq: Fix build error caused by `9eabc99a63`	2014-09-19 09:07:47 -07:00
Ingo Molnar	43657ffb79	* Increase the number of early_ioremap slots to fix a regression with earlyprintk=efi after recent changes to the ACPI code - Dave Young -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUG/pjAAoJEC84WcCNIz1VQToP+wfz21mZLPyurzCXycbrXi3I h2Jg99HhCe2TL1+9slSEJZvCFGsnspbkZtfA8OB0Ms7eomt3mY9yTwsfMVfyUZRM 2DAWjK0eg9ZEAXcmU0tGbI4Iy9Kf/U0OPEsZ8ZKJkG1QGA+pHrKqhfdrKzP/pVV6 iQ94OgBa1BZqljYureERvIourghN6hcwlJHqTLF66kzbFtL//2NQJIidHBcJRO0k CA/2U65WNEFXAtUenHHUnSM8hhN5H2NiSxPKg6p4iK6nsNcBL80rhEmMHYW80VHp PP7lXoCAbGARIfXbjPPNvn7aHL0qzpsF3fvNrtQRSkEwGjEod8gxHAqqzguLrK58 G0HovhfUvEADXzjDiWCWUAIT/ryHutJLxXo6Fn2UijtAso+W/wUorqJlKNDG1dHX PRJKO0kk7TCKPBYxNRkds9Gt8g8uEoInr+V13qvURjwb6H9hFd8fyuL25bRUftdV TE4/pzm5F9YjLU2548RfXKcz5snSuOT9EYf3s4C8wTtm4tJv7GxQ+/vz000Hres3 YlWaY2CywsrZfdBZaaF9gEtsjsdBVH+PspOGtodY0jjnz/I71H7oZNdgb8vslRLS 6iH9+swCEm6cdytzqFXO1qp7DKLpxZJmwTHKhAvdSYzXZBvf1JMCQ6zdoXNfdNRR R8vYh1pvBd1tRJQ9tpJe =t4a9 -----END PGP SIGNATURE----- Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fix from Matt Fleming: * Increase the number of early_ioremap() slots to fix a regression with earlyprintk=efi after recent changes to the ACPI code (Dave Young) Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-19 12:49:20 +02:00
Tang Chen	a255d4795f	kvm: Remove ept_identity_pagetable from struct kvm_arch. kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But it is never used to refer to the page at all. In vcpu initialization, it indicates two things: 1. indicates if ept page is allocated 2. indicates if a memory slot for identity page is initialized Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept identity pagetable is initialized. So we can remove ept_identity_pagetable. NOTE: In the original code, ept identity pagetable page is pinned in memroy. As a result, it cannot be migrated/hot-removed. After this patch, since kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page is no longer pinned in memory. And it can be migrated/hot-removed. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-17 13:10:11 +02:00
Luiz Capitulino	8b375f64dc	x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data() The setup_node_data() function allocates a pg_data_t object, inserts it into the node_data[] array and initializes the following fields: node_id, node_start_pfn and node_spanned_pages. However, a few function calls later during the kernel boot, free_area_init_node() re-initializes those fields, possibly with setup_node_data() is not used. This causes a small glitch when running Linux as a hyperv numa guest: SRAT: PXM 0 -> APIC 0x00 -> Node 0 SRAT: PXM 0 -> APIC 0x01 -> Node 0 SRAT: PXM 1 -> APIC 0x02 -> Node 1 SRAT: PXM 1 -> APIC 0x03 -> Node 1 SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff] SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff] SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff] NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff] Initmem setup node 0 [mem 0x00000000-0x7fffffff] NODE_DATA [mem 0x7ffdc000-0x7ffeffff] Initmem setup node 1 [mem 0x80800000-0x1081fffff] NODE_DATA [mem 0x1081ea000-0x1081fdfff] crashkernel: memory value expected [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0 [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1 Zone ranges: DMA [mem 0x00001000-0x00ffffff] DMA32 [mem 0x01000000-0xffffffff] Normal [mem 0x100000000-0x1081fffff] Movable zone start for each node Early memory node ranges node 0: [mem 0x00001000-0x0009efff] node 0: [mem 0x00100000-0x7ffeffff] node 1: [mem 0x80200000-0xf7ffffff] node 1: [mem 0x100000000-0x1081fffff] On node 0 totalpages: 524174 DMA zone: 64 pages used for memmap DMA zone: 21 pages reserved DMA zone: 3998 pages, LIFO batch:0 DMA32 zone: 8128 pages used for memmap DMA32 zone: 520176 pages, LIFO batch:31 On node 1 totalpages: 524288 DMA32 zone: 7672 pages used for memmap DMA32 zone: 491008 pages, LIFO batch:31 Normal zone: 520 pages used for memmap Normal zone: 33280 pages, LIFO batch:7 In this dmesg, the SRAT table reports that the memory range for node 1 starts at 0x80200000. However, the line starting with "Initmem" reports that node 1 memory range starts at 0x80800000. The "Initmem" line is reported by setup_node_data() and is wrong, because the kernel ends up using the range as reported in the SRAT table. This commit drops all that dead code from setup_node_data(), renames it to alloc_node_data() and adds a printk() to free_area_init_node() so that we report a node's memory range accurately. Here's the same dmesg section with this patch applied: SRAT: PXM 0 -> APIC 0x00 -> Node 0 SRAT: PXM 0 -> APIC 0x01 -> Node 0 SRAT: PXM 1 -> APIC 0x02 -> Node 1 SRAT: PXM 1 -> APIC 0x03 -> Node 1 SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff] SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff] SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff] NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff] NODE_DATA(0) allocated [mem 0x7ffdc000-0x7ffeffff] NODE_DATA(1) allocated [mem 0x1081ea000-0x1081fdfff] crashkernel: memory value expected [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0 [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1 Zone ranges: DMA [mem 0x00001000-0x00ffffff] DMA32 [mem 0x01000000-0xffffffff] Normal [mem 0x100000000-0x1081fffff] Movable zone start for each node Early memory node ranges node 0: [mem 0x00001000-0x0009efff] node 0: [mem 0x00100000-0x7ffeffff] node 1: [mem 0x80200000-0xf7ffffff] node 1: [mem 0x100000000-0x1081fffff] Initmem setup node 0 [mem 0x00001000-0x7ffeffff] On node 0 totalpages: 524174 DMA zone: 64 pages used for memmap DMA zone: 21 pages reserved DMA zone: 3998 pages, LIFO batch:0 DMA32 zone: 8128 pages used for memmap DMA32 zone: 520176 pages, LIFO batch:31 Initmem setup node 1 [mem 0x80200000-0x1081fffff] On node 1 totalpages: 524288 DMA32 zone: 7672 pages used for memmap DMA32 zone: 491008 pages, LIFO batch:31 Normal zone: 520 pages used for memmap Normal zone: 33280 pages, LIFO batch:7 This commit was tested on a two node bare-metal NUMA machine and Linux as a numa guest on hyperv and qemu/kvm. PS: The wrong memory range reported by setup_node_data() seems to be harmless in the current kernel because it's just not used. However, that bad range is used in kernel 2.6.32 to initialize the old boot memory allocator, which causes a crash during boot. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-16 08:55:10 +02:00
Yasuaki Ishimatsu	9661d5bcd0	x86/mm/hotplug: Modify PGD entry when removing memory When hot-adding/removing memory, sync_global_pgds() is called for synchronizing PGD to PGD entries of all processes MM. But when hot-removing memory, sync_global_pgds() does not work correctly. At first, sync_global_pgds() checks whether target PGD is none or not. And if PGD is none, the PGD is skipped. But when hot-removing memory, PGD may be none since PGD may be cleared by free_pud_table(). So when sync_global_pgds() is called after hot-removing memory, sync_global_pgds() should not skip PGD even if the PGD is none. And sync_global_pgds() must clear PGD entries of all processes MM. Currently sync_global_pgds() does not clear PGD entries of all processes MM when hot-removing memory. So when hot adding memory which is same memory range as removed memory after hot-removing memory, following call traces are shown: kernel BUG at arch/x86/mm/init_64.c:206! ... [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2 [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380 [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0 [<ffffffff815d03d9>] add_memory+0xb9/0x1b0 [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0 [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5 [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2 [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15 [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32 [<ffffffff8107e02b>] process_one_work+0x17b/0x460 [<ffffffff8107edfb>] worker_thread+0x11b/0x400 [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400 [<ffffffff81085aef>] kthread+0xcf/0xe0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 This patch clears PGD entries of all processes MM when sync_global_pgds() is called after hot-removing memory Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-16 08:55:09 +02:00
Dave Young	3eddc69ffe	x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8 3.16 kernel boot fail with earlyprintk=efi, it keeps scrolling at the bottom line of screen. Bisected, the first bad commit is below: commit `86dfc6f339` Author: Lv Zheng <lv.zheng@intel.com> Date: Fri Apr 4 12:38:57 2014 +0800 ACPICA: Tables: Fix table checksums verification before installation. I did some debugging by enabling both serial and efi earlyprintk, below is some debug dmesg, seems early_ioremap fails in scroll up function due to no free slot, see below dmesg output: WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:116 __early_ioremap+0x90/0x1c4() __early_ioremap(ed00c800, 00000c80) not found slot Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.17.0-rc1+ #204 Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS J61 v03.15 05/09/2013 Call Trace: dump_stack+0x4e/0x7a warn_slowpath_common+0x75/0x8e ? __early_ioremap+0x90/0x1c4 warn_slowpath_fmt+0x47/0x49 __early_ioremap+0x90/0x1c4 ? sprintf+0x46/0x48 early_ioremap+0x13/0x15 early_efi_map+0x24/0x26 early_efi_scroll_up+0x6d/0xc0 early_efi_write+0x1b0/0x214 call_console_drivers.constprop.21+0x73/0x7e console_unlock+0x151/0x3b2 ? vprintk_emit+0x49f/0x532 vprintk_emit+0x521/0x532 ? console_unlock+0x383/0x3b2 printk+0x4f/0x51 acpi_os_vprintf+0x2b/0x2d acpi_os_printf+0x43/0x45 acpi_info+0x5c/0x63 ? __acpi_map_table+0x13/0x18 ? acpi_os_map_iomem+0x21/0x147 acpi_tb_print_table_header+0x177/0x186 acpi_tb_install_table_with_override+0x4b/0x62 acpi_tb_install_standard_table+0xd9/0x215 ? early_ioremap+0x13/0x15 ? __acpi_map_table+0x13/0x18 acpi_tb_parse_root_table+0x16e/0x1b4 acpi_initialize_tables+0x57/0x59 acpi_table_init+0x50/0xce acpi_boot_table_init+0x1e/0x85 setup_arch+0x9b7/0xcc4 start_kernel+0x94/0x42d ? early_idt_handlers+0x120/0x120 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0xf3/0x100 Quote reply from Lv.zheng about the early ioremap slot usage in this case: """ In early_efi_scroll_up(), 2 mapping entries will be used for the src/dst screen buffer. In drivers/acpi/acpica/tbutils.c, we've improved the early table loading code in acpi_tb_parse_root_table(). We now need 2 mapping entries: 1. One mapping entry is used for RSDT table mapping. Each RSDT entry contains an address for another ACPI table. 2. For each entry in RSDP, we need another mapping entry to map the table to perform necessary check/override before installing it. When acpi_tb_parse_root_table() prints something through EFI earlyprintk console, we'll have 4 mapping entries used. The current 4 slots setting of early_ioremap() seems to be too small for such a use case. """ Thus increase the slot to 8 in this patch to fix this issue. boot-time mappings become 512 page with this patch. Signed-off-by: Dave Young <dyoung@redhat.com> Cc: <stable@vger.kernel.org> # v3.16 Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-09-14 15:24:31 +01:00
Linus Torvalds	72d9310460	Make ARCH_HAS_FAST_MULTIPLIER a real config variable It used to be an ad-hoc hack defined by the x86 version of <asm/bitops.h> that enabled a couple of library routines to know whether an integer multiply is faster than repeated shifts and additions. This just makes it use the real Kconfig system instead, and makes x86 (which was the only architecture that did this) select the option. NOTE! Even for x86, this really is kind of wrong. If we cared, we would probably not enable this for builds optimized for netburst (P4), where shifts-and-adds are generally faster than multiplies. This patch does not change that kind of logic, though, it is purely a syntactic change with no code changes. This was triggered by the fact that we have other places that really want to know "do I want to expand multiples by constants by hand or not", particularly the hash generation code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-13 11:14:53 -07:00
Frederic Weisbecker	3010279f0f	x86: Tell irq work about self IPI support x86 supports irq work self-IPIs when local apic is available. This is partly known on runtime so lets implement arch_irq_work_has_interrupt() accordingly. This should be safely called after setup_arch(). Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-09-13 18:38:29 +02:00
Peter Zijlstra	c5c38ef3d7	irq_work: Introduce arch_irq_work_has_interrupt() The nohz full code needs irq work to trigger its own interrupt so that the subsystem can work even when the tick is stopped. Lets introduce arch_irq_work_has_interrupt() that archs can override to tell about their support for this ability. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-09-13 18:38:07 +02:00
Linus Torvalds	7ee2d2d671	xen: bug fixes for 3.17-rc4 - Fix for PVHVM suspend/resume and migration - Don't pointlessly retry certain ballooning ops - Fix gntalloc when grefs have run out. - Fix PV boot if KSALR is enable or very large modules are used. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUEZw4AAoJEFxbo/MsZsTRoHkIAJ5h287Wer2Yf4ALlFI47Esl AhcgVKi7WMJ6mBQUk/xpbSS3MZEuTuPJVbuiabJwRaZk9vX7w8q08yrAD8SOrCwY 6iGooaWEZ96P5DPI5fmWBuOgt5f2wfqzAp//wl/dK/kzr6Kw63huTXa0H0fmd8BY Gy13pN4JEPDyvjjZWFvDcrcEUA9eTcTaXQZisBUGuGictXySr5ovz9G3EKOtJP0D vkmDzApijFEEjJwZMGazgipng4mwh94fW4+7bs57s6iREpMzOJDTRKrl9suaXdza 5HA2ed7Au/qL8IwiQAFGxQpMBqNQxEGerlFfPBhRPuGQ+Ek3/pZF3BTU56w/nt4= =y/Ol -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.17-b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen bug fixes from David Vrabel: - fix for PVHVM suspend/resume and migration - don't pointlessly retry certain ballooning ops - fix gntalloc when grefs have run out. - fix PV boot if KSALR is enable or very large modules are used. * tag 'stable/for-linus-3.17-b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: don't copy bogus duplicate entries into kernel page tables xen/gntalloc: safely delete grefs in add_grefs() undo path xen/gntalloc: fix oops after runnning out of grant refs xen/balloon: cancel ballooning if adding new memory failed xen/manage: Always freeze/thaw processes when suspend/resuming	2014-09-11 16:52:29 -07:00
Dave Hansen	9298b815ef	x86: Add more disabled features The original motivation for these patches was for an Intel CPU feature called MPX. The patch to add a disabled feature for it will go in with the other parts of the support. But, in the meantime, there are a few other features than MPX that we can make assumptions about at compile-time based on compile options. Add them to disabled-features.h and check them with cpu_feature_enabled(). Note that this gets rid of the last things that needed an #ifdef CONFIG_X86_64 in cpufeature.h. Yay! Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-09-11 14:30:17 -07:00
Dave Hansen	381aa07a9b	x86: Introduce disabled-features I believe the REQUIRED_MASK aproach was taken so that it was easier to consult in assembly (arch/x86/kernel/verify_cpu.S). DISABLED_MASK does not have the same restriction, but I implemented it the same way for consistency. We have a REQUIRED_MASK... which does two things: 1. Keeps a list of cpuid bits to check in very early boot and refuse to boot if those are not present. 2. Consulted during cpu_has() checks, which allows us to optimize out things at compile-time. In other words, if we KNOW we will not boot with the feature off, then we can safely assume that it will be present forever. But, we don't have a similar mechanism for CPU features which may be present but that we know we will not use. We simply use our existing mechanisms to repeatedly check the status of the bit at runtime (well, the alternatives patching helps here but it does not provide compile-time optimization). Adding a feature to disabled-features.h allows the bit to be checked via a new macro: cpu_feature_enabled(). Note that for features in DISABLED_MASK, checks with this macro have all of the benefits of an #ifdef. Before, we would have done this in a header: #ifdef CONFIG_X86_INTEL_MPX #define cpu_has_mpx cpu_has(X86_FEATURE_MPX) #else #define cpu_has_mpx 0 #endif and this in the code: if (cpu_has_mpx) do_some_mpx_thing(); Now, just add your feature to DISABLED_MASK and you can do this everywhere, and get the same benefits you would have from #ifdefs: if (cpu_feature_enabled(X86_FEATURE_MPX)) do_some_mpx_thing(); We need a new function and not a modification to cpu_has() because there are cases where we actually need to check the CPU itself, despite what features the kernel supports. The best example of this is a hypervisor which has no control over what features its guests are using and where the guest does not depend on the host for support. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140911211513.9E35E931@viggo.jf.intel.com Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-09-11 14:30:02 -07:00
Dave Hansen	c8128cceb4	x86: Axe the lightly-used cpu_has_pae cpu_has_pae is only referenced in one place: the X86_32 kexec code (in a file not even built on 64-bit). It hardly warrants its own macro, or the trouble we go to ensuring that it can't be called in X86_64 code. Axe the macro and replace it with a direct cpu feature check. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140911211511.AD76E774@viggo.jf.intel.com Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-09-11 14:30:01 -07:00
Stefan Bader	0b5a50635f	x86/xen: don't copy bogus duplicate entries into kernel page tables When RANDOMIZE_BASE (KASLR) is enabled; or the sum of all loaded modules exceeds 512 MiB, then loading modules fails with a warning (and hence a vmalloc allocation failure) because the PTEs for the newly-allocated vmalloc address space are not zero. WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128 vmap_page_range_noflush+0x2a1/0x360() This is caused by xen_setup_kernel_pagetables() copying level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present entries. Without KASLR, the normal kernel image size only covers the first half of level2_kernel_pgt and module space starts after that. L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[ 0..255]->kernel [256..511]->module [511]->level2_fixmap_pgt[ 0..505]->module This allows 512 MiB of of module vmalloc space to be used before having to use the corrupted level2_fixmap_pgt entries. With KASLR enabled, the kernel image uses the full PUD range of 1G and module space starts in the level2_fixmap_pgt. So basically: L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel [511]->level2_fixmap_pgt[0..505]->module And now no module vmalloc space can be used without using the corrupt level2_fixmap_pgt entries. Fix this by properly converting the level2_fixmap_pgt entries to MFNs, and setting level1_fixmap_pgt as read-only. A number of comments were also using the the wrong L3 offset for level2_kernel_pgt. These have been corrected. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org	2014-09-10 15:23:42 +01:00
Waiman Long	6157c7e1bb	locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S This patch removes the unused asm/rwlock.h and rwlock.S files. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1408037251-45918-3-git-send-email-Waiman.Long@hp.com Cc: Scott J Norton <scott.norton@hp.com> Cc: Borislav Petkov <bp@suse.de> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Francesco Fusco <ffusco@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Graf <tgraf@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-10 11:46:39 +02:00
Waiman Long	2ff810a7ef	locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code As the x86 architecture now uses qrwlock for its read/write lock implementation, it is no longer necessary to keep the old rwlock code around. This patch removes the old rwlock code in the asm/spinlock.h and asm/spinlock_types.h files. Now the ARCH_USE_QUEUE_RWLOCK config parameter cannot be removed from x86/Kconfig or there will be a compilation error. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1408037251-45918-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-10 11:46:38 +02:00
Ingo Molnar	bdea534db8	Linux 3.17-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUDOW+AAoJEHm+PkMAQRiGOXYH/00TPKm8PdM5cXXG2YYYv9eT W99K7KD2i0/qiVtlGgjjvB7fO3K0HcZusTd2jmVd8IWntXvauq7Zpw5YZkjwu4KX Y1HCwwCd2aw0FoqgrJhNP3+j5Cr1BD/HLtbffjCe+A3tppOIis4Bwt2wJOoYlXpS hU9Jxxc4lcRo8YKbffouDo7PIneWeJy8N+WGpUR5BfJIEK0ZZtCUqn3/3WLX4FYu fE6uiF/bACTpKXU/mo4dDbhZp439H/QdwQc9B0F8+8CBDMXKaNHrPV7kN36T2SWa fD4boikTsi/yh9Ks1fvHbvNq2N0ihoMnja+vLRyvjAcAQv2fKG3OZtYgFWSdghU= =Xknd -----END PGP SIGNATURE----- Merge tag 'v3.17-rc4' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-09 06:48:07 +02:00
Andy Lutomirski	54eea9957f	x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls For slowpath syscalls, we initialize regs->ax to -ENOSYS and stick the syscall number into regs->orig_ax prior to any possible tracing and syscall execution. This is user-visible ABI used by ptrace syscall emulation and seccomp. For fastpath syscalls, there's no good reason not to do the same thing. It's even slightly simpler than what we're currently doing. It probably has no measureable performance impact. It should have no user-visible effect. The purpose of this patch is to prepare for two-phase syscall tracing, in which the first phase might modify the saved RAX without leaving the fast path. This change is just subtle enough that I'm keeping it separate. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/01218b493f12ae2f98034b78c9ae085e38e94350.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-09-08 14:14:08 -07:00
Andy Lutomirski	e0ffbaabc4	x86: Split syscall_trace_enter into two phases This splits syscall_trace_enter into syscall_trace_enter_phase1 and syscall_trace_enter_phase2. Only phase 2 has full pt_regs, and only phase 2 is permitted to modify any of pt_regs except for orig_ax. The intent is that phase 1 can be called from the syscall fast path. In this implementation, phase1 can handle any combination of TIF_NOHZ (RCU context tracking), TIF_SECCOMP, and TIF_SYSCALL_AUDIT, unless seccomp requests a ptrace event, in which case phase2 is forced. In principle, this could yield a big speedup for TIF_NOHZ as well as for TIF_SECCOMP if syscall exit work were similarly split up. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2df320a600020fda055fccf2b668145729dd0c04.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-09-08 14:14:03 -07:00
Ingo Molnar	196cf35842	x86/tty/serial/8250: Clean up the asm/serial.h include file a bit - correct spelling - align fields vertically to make things more readable - make the layout of magic defines more obvious Cc: Mark Rustad <mark.d.rustad@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Link: http://lkml.kernel.org/r/1409972149-26272-1-git-send-email-jeffrey.t.kirsher@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-06 10:20:55 +02:00
Mark Rustad	9ea029f12a	x86/tty/serial/8250: Resolve missing-field-initializers warnings Resolve some missing-field-initializers warnings by using designated initialization in the expansion of the SERIAL_PORT_DFNS macro. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Link: http://lkml.kernel.org/r/1409972149-26272-1-git-send-email-jeffrey.t.kirsher@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-06 10:20:53 +02:00
Paolo Bonzini	54987b7afa	KVM: x86: propagate exception from permission checks on the nested page fault Currently, if a permission error happens during the translation of the final GPA to HPA, walk_addr_generic returns 0 but does not fill in walker->fault. To avoid this, add an x86_exception* argument to the translate_gpa function, and let it fill in walker->fault. The nested_page_fault field will be true, since the walk_mmu is the nested_mmu and translate_gpu instead operates on the "outer" (NPT) instance. Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-05 12:01:13 +02:00
Paolo Bonzini	ef54bcfeea	KVM: x86: skip writeback on injection of nested exception If a nested page fault happens during emulation, we will inject a vmexit, not a page fault. However because writeback happens after the injection, we will write ctxt->eip from L2 into the L1 EIP. We do not write back if an instruction caused an interception vmexit---do the same for page faults. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-05 12:01:06 +02:00
David Matlack	56f17dd3fb	kvm: x86: fix stale mmio cache bug The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-03 10:03:42 +02:00
Oleg Nesterov	31d963389f	x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu() __thread_fpu_begin() checks X86_FEATURE_EAGER_FPU by hand, we have a helper for that. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20140902175720.GA21656@redhat.com Reviewed-by: Suresh Siddha <sbsiddha@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-09-02 14:51:15 -07:00
Matthew Wilcox	bb693f13a0	x86: Remove set_pmd_pfn The last user of set_pmd_pfn() went away in commit `f03574f2d5`, so this has been dead code for over a year. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> arch/x86/include/asm/pgtable_32.h \| 3 --- arch/x86/mm/pgtable_32.c \| 35 ----------------------------------- 2 files changed, 38 deletions(-)	2014-09-01 10:15:31 +02:00
Jiang Liu	f3761db164	x86, irq: Fix build error caused by `9eabc99a63` Commit `9eabc99a63` causes following build error when IOAPIC is disabled. arch/x86/pci/irq.c: In function 'pirq_disable_irq': >> arch/x86/pci/irq.c:1259:2: error: implicit declaration of function 'mp_should_keep_irq' [-Werror=implicit-function-declaration] if (io_apic_assign_pci_irqs && !mp_should_keep_irq(&dev->dev) && ^ cc1: some warnings being treated as errors Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1409382916-10649-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-09-01 10:12:03 +02:00
Linus Torvalds	fd5984d7c8	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "One patch to avoid assigning interrupts we don't actually have on non-PC platforms, and two patches that addresses bugs in the new IOAPIC assignment code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, irq, PCI: Keep IRQ assignment for runtime power management x86: irq: Fix bug in setting IOAPIC pin attributes x86: Fix non-PC platform kernel crash on boot due to NULL dereference	2014-08-29 17:22:27 -07:00
Hugh Dickins	b38af4721f	x86,mm: fix pte_special versus pte_numa Sasha Levin has shown oopses on ffffea0003480048 and ffffea0003480008 at mm/memory.c:1132, running Trinity on different 3.16-rc-next kernels: where zap_pte_range() checks page->mapping to see if PageAnon(page). Those addresses fit struct pages for pfns d2001 and d2000, and in each dump a register or a stack slot showed d2001730 or d2000730: pte flags 0x730 are PCD ACCESSED PROTNONE SPECIAL IOMAP; and Sasha's e820 map has a hole between cfffffff and 100000000, which would need special access. Commit `c46a7c817e` ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels") has broken vm_normal_page(): a PROTNONE SPECIAL pte no longer passes the pte_special() test, so zap_pte_range() goes on to try to access a non-existent struct page. Fix this by refining pte_special() (SPECIAL with PRESENT or PROTNONE) to complement pte_numa() (SPECIAL with neither PRESENT nor PROTNONE). A hint that this was a problem was that `c46a7c817e` added pte_numa() test to vm_normal_page(), and moved its is_zero_pfn() test from slow to fast path: This was papering over a pte_special() snag when the zero page was encountered during zap. This patch reverts vm_normal_page() to how it was before, relying on pte_special(). It still appears that this patch may be incomplete: aren't there other places which need to be handling PROTNONE along with PRESENT? For example, pte_mknuma() clears _PAGE_PRESENT and sets _PAGE_NUMA, but on a PROT_NONE area, that would make it pte_special(). This is side-stepped by the fact that NUMA hinting faults skipped PROT_NONE VMAs and there are no grounds where a NUMA hinting fault on a PROT_NONE VMA would be interesting. Fixes: `c46a7c817e` ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels") Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: <stable@vger.kernel.org> [3.16] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-29 16:28:16 -07:00
Radim Krčmář	13a34e067e	KVM: remove garbage arg to *hardware_{en,dis}able In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-29 16:35:55 +02:00
Paolo Bonzini	656473003b	KVM: forward declare structs in kvm_types.h Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid "'struct foo' declared inside parameter list" warnings (and consequent breakage due to conflicting types). Move them from individual files to a generic place in linux/kvm_types.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-29 16:35:53 +02:00
Jiang Liu	9eabc99a63	x86, irq, PCI: Keep IRQ assignment for runtime power management Now IOAPIC driver dynamically allocates IRQ numbers for IOAPIC pins. We need to keep IRQ assignment for PCI devices during runtime power management, otherwise it may cause failure of device wakeups. Commit `3eec595235` "x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation" has fixed the issue for suspend/ hibernation, we also need the same fix for runtime device sleep too. Fix: https://bugzilla.kernel.org/show_bug.cgi?id=83271 Reported-and-Tested-by: EmanueL Czirai <amanual@openmailbox.org> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: EmanueL Czirai <amanual@openmailbox.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1409304383-18806-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-08-29 13:38:00 +02:00
Christoph Lameter	4ba2968420	percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t __get_cpu_var can paper over differences in the definitions of cpumask_var_t and either use the address of the cpumask variable directly or perform a fetch of the address of the struct cpumask allocated elsewhere. This is important particularly when using per cpu cpumask_var_t declarations because in one case we have an offset into a per cpu area to handle and in the other case we need to fetch a pointer from the offset. This patch introduces a new macro this_cpu_cpumask_var_ptr() that is defined where cpumask_var_t is defined and performs the proper actions. All use cases where __get_cpu_var is used with cpumask_var_t are converted to the use of this_cpu_cpumask_var_ptr(). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-28 08:58:57 -04:00
Christoph Lameter	e16321709c	uv: Replace __get_cpu_var Use __this_cpu_read instead. Cc: Hedi Berriche <hedi@sgi.com> Cc: Mike Travis <travis@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-26 13:45:50 -04:00
Christoph Lameter	89cbc76768	x86: Replace __get_cpu_var uses __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int x = &__get_cpu_var(y); Converts to int x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-26 13:45:49 -04:00
Ingo Molnar	83bc90e115	Merge branch 'linus' into perf/core, to fix conflicts Conflicts: arch/x86/kernel/cpu/perf_event_intel_uncore*.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-08-24 22:32:24 +02:00
Ingo Molnar	44afe60294	A bunch of cleanups from Henrique. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJT9z1RAAoJEBLB8Bhh3lVKA0UP/0sCiH8JtSaP+qdvuU5xtldM L1O76MrQ1De6BO1XqjucsxClbVPPxBE3nBGzxfSoIMdil0A8PmHmJj6/5GILAosd kX7LxoFVOFOQg+hekXkcd4t4lmiTPcmfgzywOnLykB9Iv4Bn12RpPhGjkgywHhGC 6Rvh9cawqJb2rAui2zseK9RAHHhTasXqo7vWAm5kQPFE98fV6gp4Pcf/iJb8Rd09 7Rp5f5vSvWkmCKx3nCzNGGkZgvwVD+sJb9jQJ6QFaoPYmNa0Te3mJ5nl32aNrsct 5s6BPEimHJc4lt5IUMc1MRhg2DNrdmcUxYaonOraeG59XRtSxH9OvOSCS291qd/m lZOADMjxD/d0mvpY937UOOiZ2yC0RDNx2OQDnQTEWYtGJnzqq1myVVEAo6bkPYhd 2JrqtElUY9o2iO2yaFge9/Po4rAtvHlgzKcdldSUOk6xLkj1FpWZK+PqO5vqKets KltJQupwUMgXCQ9jCF+Y4mdyuncRm/GhhCYsiWr16Xjp/znX6hS6i1DK9j70uim3 7Uu09fdlP7V3dFiX2cuKwH2tQ+jcAIOoZjtQKwhSLOrva/BWqSOFbX9W3UxQH0X9 5JVxTn6Rutr6pn7rm9zG82Xq+UueQMEzE1kklwjmmiNI2044PCvuYtWvfTY4pJ71 oW09NvqBAlMMaWDJ92ps =brfb -----END PGP SIGNATURE----- Merge tag 'microcode_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/microcode Pull x86/microcode updates from Borislav Petkov: "A bunch of cleanups from Henrique." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-08-24 11:27:42 +02:00
Radim Krčmář	ae97a3b818	KVM: x86: introduce sched_in to kvm_x86_ops sched_in preempt notifier is available for x86, allow its use in specific virtualization technlogies as well. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-21 18:45:22 +02:00
Ross Zwisler	448466b723	x86: Remove obsolete comment in uapi/e820.h A comment introduced by this old commit: `028b785888` ("x86 boot: extend some internal memory map arrays to handle larger EFI input") had to do with some nested preprocessor directives. The directives were split into separate files by this commit: `af170c5061` ("UAPI: (Scripted) Disintegrate arch/x86/include/asm") The comment explaining their interaction was retained and is now present in arch/x86/include/uapi/asm/e820.h. This comment is no longer correct, so delete it. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Link: http://lkml.kernel.org/r/1400521824-21040-1-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-08-21 08:43:39 +02:00
Paolo Bonzini	0d234daf7e	Revert "KVM: x86: Increase the number of fixed MTRR regs to 10" This reverts commit `682367c494`, which causes 32-bit SMP Windows 7 guests to panic. SeaBIOS has a limit on the number of MTRRs that it can handle, and this patch exceeded the limit. Better revert it. Thanks to Nadav Amit for debugging the cause. Cc: stable@nongnu.org Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-19 15:12:28 +02:00
Wanpeng Li	4473b570a7	KVM: x86: drop fpu_activate hook The only user of the fpu_activate hook was dropped in commit `2d04a05bd7` (KVM: x86 emulator: emulate CLTS internally, 2011-04-20). vmx_fpu_activate and svm_fpu_activate are still called on #NM (and for Intel CLTS), but never from common code; hence, there's no need for a hook. Reviewed-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-19 15:12:28 +02:00
Josh Triplett	9def39be4e	x86: Support compiling out human-friendly processor feature names The table mapping CPUID bits to human-readable strings takes up a non-trivial amount of space, and only exists to support /proc/cpuinfo and a couple of kernel messages. Since programs depend on the format of /proc/cpuinfo, force inclusion of the table when building with /proc support; otherwise, support omitting that table to save space, in which case the kernel messages will print features numerically instead. In addition to saving 1408 bytes out of vmlinux, this also saves 1373 bytes out of the uncompressed setup code, which contributes directly to the size of bzImage. Signed-off-by: Josh Triplett <josh@joshtriplett.org>	2014-08-17 15:54:00 -07:00
Linus Torvalds	49899007b9	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull idle update from Len Brown: "Two Intel-platform-specific updates to intel_idle, and a cosmetic tweak to the turbostat utility" * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: tweak whitespace in output format intel_idle: Broadwell support intel_idle: Disable Baytrail Core and Module C6 auto-demotion	2014-08-16 09:25:34 -06:00
Len Brown	8c058d53f6	intel_idle: Disable Baytrail Core and Module C6 auto-demotion Power efficiency improves on Baytrail (Intel Atom Processor E3000) when Linux disables C6 auto-demotion. Based on work by Srinidhi Kasagar <srinidhi.kasagar@intel.com>. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2014-08-15 17:06:14 -04:00
Peter Zijlstra	f6b4ecee0e	locking,x86: Kill atomic_or_long() There are no users, kill it. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20140508135851.768177189@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-08-14 12:48:02 +02:00
Linus Torvalds	81c02a21b2	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic updates from Thomas Gleixner: "This is a major overhaul to the x86 apic subsystem consisting of the following parts: - Remove obsolete APIC driver abstractions (David Rientjes) - Use the irqdomain facilities to dynamically allocate IRQs for IOAPICs. This is a prerequisite to enable IOAPIC hotplug support, and it also frees up wasted vectors (Jiang Liu) - Misc fixlets. Despite the hickup in Ingos previous pull request - caused by the missing fixup for the suspend/resume issue reported by Borislav - I strongly recommend that this update finds its way into 3.17. Some history for you: This is preparatory work for physical IOAPIC hotplug. The first attempt to support this was done by Yinghai and I shot it down because it just added another layer of obscurity and complexity to the already existing mess without tackling the underlying shortcomings of the current implementation. After quite some on- and offlist discussions, I requested that the design of this functionality must use generic infrastructure, i.e. irq domains, which provide all the mechanisms to dynamically map linux interrupt numbers to physical interrupts. Jiang picked up the idea and did a great job of consolidating the existing interfaces to manage the x86 (IOAPIC) interrupt system by utilizing irq domains. The testing in tip, Linux-next and inside of Intel on various machines did not unearth any oddities until Borislav exposed it to one of his oddball machines. The issue was resolved quickly, but unfortunately the fix fell through the cracks and did not hit the tip tree before Ingo sent the pull request. Not entirely Ingos fault, I also assumed that the fix was already merged when Ingo asked me whether he could send it. Nevertheless this work has a proper design, has undergone several rounds of review and the final fallout after applying it to tip and integrating it into Linux-next has been more than moderate. It's the ground work not only for IOAPIC hotplug, it will also allow us to move the lowlevel vector allocation into the irqdomain hierarchy, which will benefit other architectures as well. Patches are posted already, but they are on hold for two weeks, see below. I really appreciate the competence and responsiveness Jiang has shown in course of this endavour. So I'm sure that any fallout of this will be addressed in a timely manner. FYI, I'm vanishing for 2 weeks into my annual kids summer camp kitchen duty^Wvacation, while you folks are drooling at KS/LinuxCon :) But HPA will have a look at the hopefully zero fallout until I'm back" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation x86/apic/vsmp: Make is_vsmp_box() static x86, apic: Remove enable_apic_mode callback x86, apic: Remove setup_portio_remap callback x86, apic: Remove multi_timer_check callback x86, apic: Replace noop_check_apicid_used x86, apic: Remove check_apicid_present callback x86, apic: Remove mps_oem_check callback x86, apic: Remove smp_callin_clear_local_apic callback x86, apic: Replace trampoline physical addresses with defaults x86, apic: Remove x86_32_numa_cpu_node callback x86: intel-mid: Use the new io_apic interfaces x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box() x86, irq: Clean up irqdomain transition code x86, irq, devicetree: Release IOAPIC pin when PCI device is disabled x86, irq, SFI: Release IOAPIC pin when PCI device is disabled x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled x86, irq, ACPI: Release IOAPIC pin when PCI device is disabled x86, irq: Introduce helper functions to release IOAPIC pin x86, irq: Simplify the way to handle ISA IRQ ...	2014-08-13 18:23:32 -06:00
Linus Torvalds	7453f33b2e	Merge branch 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/xsave changes from Peter Anvin: "This is a patchset to support the XSAVES instruction required to support context switch of supervisor-only features in upcoming silicon. This patchset missed the 3.16 merge window, which is why it is based on 3.15-rc7" * 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, xsave: Add forgotten inline annotation x86/xsaves: Clean up code in xstate offsets computation in xsave area x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi) Define kernel API to get address of each state in xsave area x86/xsaves: Enable xsaves/xrstors x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time x86/xsaves: Add xsaves and xrstors support for booting time x86/xsaves: Clear reserved bits in xsave header x86/xsaves: Use xsave/xrstor for saving and restoring user space context x86/xsaves: Use xsaves/xrstors for context switch x86/xsaves: Use xsaves/xrstors to save and restore xsave area x86/xsaves: Define a macro for handling xsave/xrstor instruction fault x86/xsaves: Define macros for xsave instructions x86/xsaves: Change compacted format xsave area header x86/alternative: Add alternative_input_2 to support alternative with two features and input x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors	2014-08-13 18:20:04 -06:00
Andi Kleen	86a04461a9	perf/x86: Revamp PEBS event selection The basic idea is that it does not make sense to list all PEBS events individually. The list is very long, sometimes outdated and the hardware doesn't need it. If an event does not support PEBS it will just not count, there is no security issue. We need to only list events that something special, like supporting load or store addresses. This vastly simplifies the PEBS event selection. It also speeds up the scheduling because the scheduler doesn't have to walk as many constraints. Bugs fixed: - We do not allow setting forbidden flags with PEBS anymore (SDM 18.9.4), except for the special cycle event. This is done using a new constraint macro that also matches on the event flags. - Correct DataLA and load/store/na flags reporting on Haswell [Requires a followon patch] - We did not allow all PEBS events on Haswell: We were missing some valid subevents in d1-d2 (MEM_LOAD_UOPS_RETIRED., MEM_LOAD_UOPS_RETIRED_L3_HIT_RETIRED.) This includes the changes proposed by Stephane earlier and obsoletes his patchkit (except for some changes on pre Sandy Bridge/Silvermont CPUs) I only did Sandy Bridge and Silvermont and later so far, mostly because these are the parts I could directly confirm the hardware behavior with hardware architects. Also I do not believe the older CPUs have any missing events in their PEBS list, so there's no pressing need to change them. I did not implement the flag proposed by Peter to allow setting forbidden flags. If really needed this could be implemented on to of this patch. v2: Fix broken store events on SNB/IVB (Stephane Eranian) v3: More fixes. Rename some arguments (Stephane Eranian) v4: List most Haswell events individually again to report memory operation type correctly. Add new flags to describe load/store/na for datala. Update description. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1407785233-32193-2-git-send-email-eranian@google.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: Mark Davies <junk@eslaf.co.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-08-13 07:51:13 +02:00
Vivek Goyal	dd5f726076	kexec: support for kexec on panic using new system call This patch adds support for loading a kexec on panic (kdump) kernel usning new system call. It prepares ELF headers for memory areas to be dumped and for saved cpu registers. Also prepares the memory map for second kernel and limits its boot to reserved areas only. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-08 15:57:33 -07:00
Vivek Goyal	27f48d3e63	kexec-bzImage64: support for loading bzImage using 64bit entry This is loader specific code which can load bzImage and set it up for 64bit entry. This does not take care of 32bit entry or real mode entry. 32bit mode entry can be implemented if somebody needs it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-08 15:57:33 -07:00
Andy Lutomirski	a6c19dfe39	arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area The core mm code will provide a default gate area based on FIXADDR_USER_START and FIXADDR_USER_END if !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR). This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit UML, and x86_32 have their own code just to disable it. arm, 32-bit UML, and x86_64 have gate areas, but they have their own implementations. This gets rid of the default and moves the code into ia64. This should save some code on architectures without a gate area: it's now possible to inline the gate_area functions in the default case. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Nathan Lynch <nathan_lynch@mentor.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle] Acked-by: Richard Weinberger <richard@nod.at> [for um] Acked-by: Will Deacon <will.deacon@arm.com> [for arm64] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Nathan Lynch <Nathan_Lynch@mentor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-08 15:57:27 -07:00
Laura Abbott	308c09f17d	lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig Rather than have architectures #define ARCH_HAS_SG_CHAIN in an architecture specific scatterlist.h, make it a proper Kconfig option and use that instead. At same time, remove the header files are are now mostly useless and just include asm-generic/scatterlist.h. [sfr@canb.auug.org.au: powerpc files now need asm/dma.h] Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc] Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-08 15:57:26 -07:00
Linus Torvalds	7725131982	ACPI and power management updates for 3.17-rc1 - ACPICA update to upstream version 20140724. That includes ACPI 5.1 material (support for the _CCA and _DSD predefined names, changes related to the DMAR and PCCT tables and ARM support among other things) and cleanups related to using ACPICA's header files. A major part of it is related to acpidump and the core code used by that utility. Changes from Bob Moore, David E Box, Lv Zheng, Sascha Wildner, Tomasz Nowicki, Hanjun Guo. - Radix trees for memory bitmaps used by the hibernation core from Joerg Roedel. - Support for waking up the system from suspend-to-idle (also known as the "freeze" sleep state) using ACPI-based PCI wakeup signaling (Rafael J Wysocki). - Fixes for issues related to ACPI button events (Rafael J Wysocki). - New device ID for an ACPI-enumerated device included into the Wildcat Point PCH from Jie Yang. - ACPI video updates related to backlight handling from Hans de Goede and Linus Torvalds. - Preliminary changes needed to support ACPI on ARM from Hanjun Guo and Graeme Gregory. - ACPI PNP core cleanups from Arjun Sreedharan and Zhang Rui. - Cleanups related to ACPI_COMPANION() and ACPI_HANDLE() macros (Rafael J Wysocki). - ACPI-based device hotplug cleanups from Wei Yongjun and Rafael J Wysocki. - Cleanups and improvements related to system suspend from Lan Tianyu, Randy Dunlap and Rafael J Wysocki. - ACPI battery cleanup from Wei Yongjun. - cpufreq core fixes from Viresh Kumar. - Elimination of a deadband effect from the cpufreq ondemand governor and intel_pstate driver cleanups from Stratos Karafotis. - 350MHz CPU support for the powernow-k6 cpufreq driver from Mikulas Patocka. - Fix for the imx6 cpufreq driver from Anson Huang. - cpuidle core and governor cleanups from Daniel Lezcano, Sandeep Tripathy and Mohammad Merajul Islam Molla. - Build fix for the big_little cpuidle driver from Sachin Kamat. - Configuration fix for the Operation Performance Points (OPP) framework from Mark Brown. - APM cleanup from Jean Delvare. - cpupower utility fixes and cleanups from Peter Senna Tschudin, Andrey Utkin, Himangi Saraogi, Rickard Strandqvist, Thomas Renninger. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJT4nhtAAoJEILEb/54YlRxtZEP/2rtVQFSFdAW8l0Xm1SeSsl4 EnZpSNT1TFn+NdG23vSIot5Jzdz1/dLfeoJEbXpoVt4DPC9/PK4HPlv5FEDQYfh5 srftvvGcAva969sXzSBRNUeR+M8Yd2RdoYCfmqTEUjzf8GJLL4jC0VAIwMtsQklt EbiQX8JaHQS7RIql7MDg1N2vaTo+zxkf39Kkcl56usmO/uATP7cAPjFreF/xQ3d8 OyBhz1cOXIhPw7bd9Dv9AgpJzA8WFpktDYEgy2sluBWMv+mLYjdZRCFkfpIRzmea pt+hJDeAy8ZL6/bjWCzz2x6wG7uJdDLblreI28sgnJx/VHR3Co6u4H1BqUBj18ct CHV6zQ55WFmx9/uJqBtwFy333HS2ysJziC5ucwmg8QjkvAn4RK8S0qHMfRvSSaHj F9ejnHGxyrc3zzfsngUf/VXIp67FReaavyKX3LYxjHjMPZDMw2xCtCWEpUs52l2o fAbkv8YFBbUalIv0RtELH5XnKQ2ggMP8UgvT74KyfXU6LaliH8lEV20FFjMgwrPI sMr2xk04eS8mNRNAXL8OMMwvh6DY/Qsmb7BVg58RIw6CdHeFJl834yztzcf7+j56 4oUmA16QYBCFA3udGQ3Tb07mi8XTfrMdTOGA0koQG9tjswKXuLUXUk9WAXZe4vml ItRpZKE86BCs3mLJMYre =ZODv -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "Again, ACPICA leads the pack (47 commits), followed by cpufreq (18 commits) and system suspend/hibernation (9 commits). From the new code perspective, the ACPICA update brings ACPI 5.1 to the table, including a new device configuration object called _DSD (Device Specific Data) that will hopefully help us to operate device properties like Device Trees do (at least to some extent) and changes related to supporting ACPI on ARM. Apart from that we have hibernation changes making it use radix trees to store memory bitmaps which should speed up some operations carried out by it quite significantly. We also have some power management changes related to suspend-to-idle (the "freeze" sleep state) support and more preliminary changes needed to support ACPI on ARM (outside of ACPICA). The rest is fixes and cleanups pretty much everywhere. Specifics: - ACPICA update to upstream version 20140724. That includes ACPI 5.1 material (support for the _CCA and _DSD predefined names, changes related to the DMAR and PCCT tables and ARM support among other things) and cleanups related to using ACPICA's header files. A major part of it is related to acpidump and the core code used by that utility. Changes from Bob Moore, David E Box, Lv Zheng, Sascha Wildner, Tomasz Nowicki, Hanjun Guo. - Radix trees for memory bitmaps used by the hibernation core from Joerg Roedel. - Support for waking up the system from suspend-to-idle (also known as the "freeze" sleep state) using ACPI-based PCI wakeup signaling (Rafael J Wysocki). - Fixes for issues related to ACPI button events (Rafael J Wysocki). - New device ID for an ACPI-enumerated device included into the Wildcat Point PCH from Jie Yang. - ACPI video updates related to backlight handling from Hans de Goede and Linus Torvalds. - Preliminary changes needed to support ACPI on ARM from Hanjun Guo and Graeme Gregory. - ACPI PNP core cleanups from Arjun Sreedharan and Zhang Rui. - Cleanups related to ACPI_COMPANION() and ACPI_HANDLE() macros (Rafael J Wysocki). - ACPI-based device hotplug cleanups from Wei Yongjun and Rafael J Wysocki. - Cleanups and improvements related to system suspend from Lan Tianyu, Randy Dunlap and Rafael J Wysocki. - ACPI battery cleanup from Wei Yongjun. - cpufreq core fixes from Viresh Kumar. - Elimination of a deadband effect from the cpufreq ondemand governor and intel_pstate driver cleanups from Stratos Karafotis. - 350MHz CPU support for the powernow-k6 cpufreq driver from Mikulas Patocka. - Fix for the imx6 cpufreq driver from Anson Huang. - cpuidle core and governor cleanups from Daniel Lezcano, Sandeep Tripathy and Mohammad Merajul Islam Molla. - Build fix for the big_little cpuidle driver from Sachin Kamat. - Configuration fix for the Operation Performance Points (OPP) framework from Mark Brown. - APM cleanup from Jean Delvare. - cpupower utility fixes and cleanups from Peter Senna Tschudin, Andrey Utkin, Himangi Saraogi, Rickard Strandqvist, Thomas Renninger" * tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (118 commits) ACPI / LPSS: add LPSS device for Wildcat Point PCH ACPI / PNP: Replace faulty is_hex_digit() by isxdigit() ACPICA: Update version to 20140724. ACPICA: ACPI 5.1: Update for PCCT table changes. ACPICA/ARM: ACPI 5.1: Update for GTDT table changes. ACPICA/ARM: ACPI 5.1: Update for MADT changes. ACPICA/ARM: ACPI 5.1: Update for FADT changes. ACPICA: ACPI 5.1: Support for the _CCA predifined name. ACPICA: ACPI 5.1: New notify value for System Affinity Update. ACPICA: ACPI 5.1: Support for the _DSD predefined name. ACPICA: Debug object: Add current value of Timer() to debug line prefix. ACPICA: acpihelp: Add UUID support, restructure some existing files. ACPICA: Utilities: Fix local printf issue. ACPICA: Tables: Update for DMAR table changes. ACPICA: Remove some extraneous printf arguments. ACPICA: Update for comments/formatting. No functional changes. ACPICA: Disassembler: Add support for the ToUUID opererator (macro). ACPICA: Remove a redundant cast to acpi_size for ACPI_OFFSET() macro. ACPICA: Work around an ancient GCC bug. ACPI / processor: Make it possible to get local x2apic id via _MAT ...	2014-08-06 20:34:19 -07:00
Linus Torvalds	930e0312bc	sound updates for 3.17-rc1 There've been many updates in ASoC side at this time, especially the framework enhancement for multiple CODECs on a single DAI and more componentization works. The only major change in ALSA core is the addition of timestamp type in sw_params field. This should behave in backward compatible way. Other than that, there are lots of small changes and new drivers in wide range, including a large code cut in HD-audio driver for deprecated static quirks. Some highlights are below: ALSA Core: - Add the new timestamp type field to sw_params to choose MONOTONIC_RAW type HD-audio: - Continued conversion to standard printk macros, generic code cleanups - Removal of obsoleted static quirk codes for Conexant and C-Media codecs - Fixups for HP Envy TS, Dell XPS 15, HP and Dell mute/mic LED, Gigabyte BXBT-2807 mobo - Intel Braswell support ASoC: - Support for multiple CODECs attached to a single DAI, enabling systems with for example multiple DAC/speaker drivers on a single link, contributed by Benoit Cousson based on work from Misael Lopez Cruz - Support for byte controls larger than 256 bytes based on the use of TLVs contributed by Omair Mohammed Abdullah - More componentisation work from Lars-Peter Clausen - The remainder of the conversions of CODEC drivers to params_width() by Mark Brown - Drivers for Cirrus Logic CS4265, Freescale i.MX ASRC blocks, Realtek RT286 and RT5670, Rockchip RK3xxx I2S controllers and Texas Instruments TAS2552 - Lots of updates and fixes, especially to the DaVinci, Intel, Freescale, Realtek, and rcar drivers -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJT4fj0AAoJEGwxgFQ9KSmkXQ0QAIiRmVg40aiJoEdOLGgzNZtq r/nXj69AuB6JSy0hKbFyyijjCcRpyCCGvjDYlogjT75M3c35Npz/m85oZHx2tajD SB5OA+QxO4EQ3C0GjITIRHJROm4MM8/rnbnNYTsWnEGRkobTFTl0rHbSkA85RGFt 0zZqqs1R0s/nO9PMQ+5PA5x9xVFiZs2COeCK0CFA9s2ACf/hbxJBRIqYpIFWOo78 9L41jBOFuC/hIb4qwjgmsCWbKe1KQysTAf+Wty0CKipJ6VhfCbPn1Qn1zXGeUOxc mj4eZ6LpJTrVMr/UN02c5vgPOiaBrQ7fWZo3dVHLlIjC6cEI1tUvNYAin7CMEzx8 DUsvo9p30OheA+ijc9wKaYFY6YmmJZRtpnnMd39i0oPG+bhvoV7vjXjJSB1sLJt1 o82xLpVL4Th8H+DMDVwA7UIBvvZGZBusw1qsNGfcOPrmExi4ScGhA0gSOO6W2y1z VQLRbiXB/HtJGxeqWL6RqJOcLBOlJNmsk4UZMOSCu2OZrWd5I8MuRrNWeHDqhX1H +VDEJVhFmM21vMpnobzEPxWsMgTVIAVf3Thh+WgaPxL4Krh0vkpZsgZk16VVmy/o OJJF3n41FND4n9zSjOe4MkuL8UCOUpKCaBdqj9K1s6UKwOEKuDNslyT/zqutRWK5 x1uApU5y+E4iQT/b7cmA =RL72 -----END PGP SIGNATURE----- Merge tag 'sound-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "There've been many updates in ASoC side at this time, especially the framework enhancement for multiple CODECs on a single DAI and more componentization works. The only major change in ALSA core is the addition of timestamp type in sw_params field. This should behave in backward compatible way. Other than that, there are lots of small changes and new drivers in wide range, including a large code cut in HD-audio driver for deprecated static quirks. Some highlights are below: ALSA Core: - Add the new timestamp type field to sw_params to choose MONOTONIC_RAW type HD-audio: - Continued conversion to standard printk macros, generic code cleanups - Removal of obsoleted static quirk codes for Conexant and C-Media codecs - Fixups for HP Envy TS, Dell XPS 15, HP and Dell mute/mic LED, Gigabyte BXBT-2807 mobo - Intel Braswell support ASoC: - Support for multiple CODECs attached to a single DAI, enabling systems with for example multiple DAC/speaker drivers on a single link, contributed by Benoit Cousson based on work from Misael Lopez Cruz - Support for byte controls larger than 256 bytes based on the use of TLVs contributed by Omair Mohammed Abdullah - More componentisation work from Lars-Peter Clausen - The remainder of the conversions of CODEC drivers to params_width() by Mark Brown - Drivers for Cirrus Logic CS4265, Freescale i.MX ASRC blocks, Realtek RT286 and RT5670, Rockchip RK3xxx I2S controllers and Texas Instruments TAS2552 - Lots of updates and fixes, especially to the DaVinci, Intel, Freescale, Realtek, and rcar drivers" * tag 'sound-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (402 commits) ALSA: usb-audio: Whitespace cleanups for sound/usb/midi.* ALSA: usb-audio: Respond to suspend and resume callbacks for MIDI input sound/oss/pss: Remove typedefs pss_mixerdata and pss_confdata sound/oss/opl3: Remove typedef opl_devinfo ALSA: fireworks: fix specifiers in format strings for propper output ASoC: imx-audmux: Use uintptr_t for port numbers ASoC: davinci: Enable menuconfig entry for McASP ASoC: fsl_asrc: Don't access members of config before checking it ASoC: fsl_sarc_dma: Check pair before using it ASoC: adau1977: Fix truncation warning on 64 bit architectures ALSA: virtuoso: add Xonar Essence STX II support ALSA: riptide: fix %d confusingly prefixed with 0x in format strings ALSA: fireworks: fix %d confusingly prefixed with 0x in format strings ALSA: hda - add codec ID for Braswell display audio codec ALSA: hda - add PCI IDs for Intel Braswell ALSA: usb-audio: Adjust Gamecom 780 volume level ALSA: usb-audio: improve dmesg source grepability ASoC: rt5670: Fix duplicate const warnings ASoC: rt5670: Staticise non-exported symbols ASoC: Intel: update stream only on stream IPC msgs ...	2014-08-06 20:07:24 -07:00
Linus Torvalds	98a96f2022	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "Further simplifications and improvements to the VDSO code, by Andy Lutomirski" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64/vsyscall: Fix warn_bad_vsyscall log output x86/vdso: Set VM_MAYREAD for the vvar vma x86, vdso: Get rid of the fake section mechanism x86, vdso: Move the vvar area before the vdso text	2014-08-04 17:27:47 -07:00
Linus Torvalds	5637a2a3e9	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 UV TLB update from Ingo Molnar: "UV TLB shootdown logic updates for version of the UV architecture" * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uv: Update the UV3 TLB shootdown logic	2014-08-04 17:24:56 -07:00
Linus Torvalds	8556d44fee	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "The main changes in this cycle are: - Intel SOC driver updates, by Aubrey Li. - TS5500 platform updates, by Vivien Didelot" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pmc_atom: Silence shift wrapping warnings in pmc_sleep_tmr_show() x86/pmc_atom: Expose PMC device state and platform sleep state x86/pmc_atom: Eisable a few S0ix wake up events for S0ix residency x86/platform: New Intel Atom SOC power management controller driver x86/platform/ts5500: Add support for TS-5400 boards x86/platform/ts5500: Add a 'name' sysfs attribute x86/platform/ts5500: Use the DEVICE_ATTR_RO() macro	2014-08-04 17:20:08 -07:00
Linus Torvalds	ce47479632	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "The main change in this cycle is the rework of the TLB range flushing code, to simplify, fix and consolidate the code. By Dave Hansen" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Set TLB flush tunable to sane value (33) x86/mm: New tunable for single vs full TLB flush x86/mm: Add tracepoints for TLB flushes x86/mm: Unify remote INVLPG code x86/mm: Fix missed global TLB flush stat x86/mm: Rip out complicated, out-of-date, buggy TLB flushing x86/mm: Clean up the TLB flushing code x86/smep: Be more informative when signalling an SMEP fault	2014-08-04 17:15:45 -07:00
Linus Torvalds	76f09aa464	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI changes from Ingo Molnar: "Main changes in this cycle are: - arm64 efi stub fixes, preservation of FP/SIMD registers across firmware calls, and conversion of the EFI stub code into a static library - Ard Biesheuvel - Xen EFI support - Daniel Kiper - Support for autoloading the efivars driver - Lee, Chun-Yi - Use the PE/COFF headers in the x86 EFI boot stub to request that the stub be loaded with CONFIG_PHYSICAL_ALIGN alignment - Michael Brown - Consolidate all the x86 EFI quirks into one file - Saurabh Tangri - Additional error logging in x86 EFI boot stub - Ulf Winkelvos - Support loading initrd above 4G in EFI boot stub - Yinghai Lu - EFI reboot patches for ACPI hardware reduced platforms" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) efi/arm64: Handle missing virtual mapping for UEFI System Table arch/x86/xen: Silence compiler warnings xen: Silence compiler warnings x86/efi: Request desired alignment via the PE/COFF headers x86/efi: Add better error logging to EFI boot stub efi: Autoload efivars efi: Update stale locking comment for struct efivars arch/x86: Remove efi_set_rtc_mmss() arch/x86: Replace plain strings with constants xen: Put EFI machinery in place xen: Define EFI related stuff arch/x86: Remove redundant set_bit(EFI_MEMMAP) call arch/x86: Remove redundant set_bit(EFI_SYSTEM_TABLES) call efi: Introduce EFI_PARAVIRT flag arch/x86: Do not access EFI memory map if it is not available efi: Use early_mem() instead of early_io() arch/ia64: Define early_memunmap() x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag efi/reboot: Allow powering off machines using EFI efi/reboot: Add generic wrapper around EfiResetSystem() ...	2014-08-04 17:13:50 -07:00
Linus Torvalds	e9c9eecaba	Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature updates from Ingo Molnar: "The main changes in this cycle were: - Continued cleanups of CPU bugs mis-marked as 'missing features', by Borislav Petkov. - Detect the xsaves/xrstors feature and releated cleanup, by Fenghua Yu" * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu: Kill cpu_has_mp x86, amd: Cleanup init_amd x86/cpufeature: Add bug flags to /proc/cpuinfo x86, cpufeature: Convert more "features" to bugs x86/xsaves: Detect xsaves/xrstors feature x86/cpufeature.h: Reformat x86 feature macros	2014-08-04 17:12:45 -07:00
Linus Torvalds	19d402c1e7	Merge branches 'x86-build-for-linus', 'x86-cleanups-for-linus' and 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build/cleanup/debug updates from Ingo Molnar: "Robustify the build process with a quirk to avoid GCC reordering related bugs. Two code cleanups. Simplify entry_64.S CFI annotations, by Jan Beulich" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, build: Change code16gcc.h from a C header to an assembly header * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Simplify __HAVE_ARCH_CMPXCHG tests x86/tsc: Get rid of custom DIV_ROUND() macro * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/debug: Drop several unnecessary CFI annotations	2014-08-04 16:56:16 -07:00
Linus Torvalds	ef35ad26f8	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Kernel side changes: - Consolidate the PMU interrupt-disabled code amongst architectures (Vince Weaver) - misc fixes Tooling changes (new features, user visible changes): - Add support for pagefault tracing in 'trace', please see multiple examples in the changeset messages (Stanislav Fomichev). - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add header for columns in 'top' and 'report' TUI browsers (Jiri Olsa) - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add IO mode into timechart command (Stanislav Fomichev) - Fallback to syscalls:* when raw_syscalls:* is not available in the perl and python perf scripts. (Daniel Bristot de Oliveira) - Add --repeat global option to 'perf bench' to be used in benchmarks such as the existing 'futex' one, that was modified to use it instead of a local option. (Davidlohr Bueso) - Fix fd -> pathname resolution in 'trace', be it using /proc or a vfs_getname probe point. (Arnaldo Carvalho de Melo) - Add suggestion of how to set perf_event_paranoid sysctl, to help non-root users trying tools like 'trace' to get a working environment. (Arnaldo Carvalho de Melo) - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup (Steven Rostedt, Jan Kiszka) - Support S/390 in 'perf kvm stat' (Alexander Yarygin) Tooling infrastructure changes: - Allow reserving a row for header purposes in the hists browser (Arnaldo Carvalho de Melo) - Various fixes and prep work related to supporting Intel PT (Adrian Hunter) - Introduce multiple debug variables control (Jiri Olsa) - Add callchain and additional sample information for python scripts (Joseph Schuchart) - More prep work to support Intel PT: (Adrian Hunter) - Polishing 'script' BTS output - 'inject' can specify --kallsym - VDSO is per machine, not a global var - Expose data addr lookup functions previously private to 'script' - Large mmap fixes in events processing - Include standard stringify macros in power pc code (Sukadev Bhattiprolu) Tooling cleanups: - Convert open coded equivalents to asprintf() (Andy Shevchenko) - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo) - Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de Melo) - No need to reimplement err() in 'perf bench sched-messaging', drop barf(). (Davidlohr Bueso). - Remove ev_name argument from perf_evsel__hists_browse, can be obtained from the other parameters. (Jiri Olsa) Tooling fixes: - Fix memory leak in the 'sched-messaging' perf bench test. (Davidlohr Bueso) - The -o and -n 'perf bench mem' options are mutually exclusive, emit error when both are specified. (Davidlohr Bueso) - Fix scrollbar refresh row index in the ui browser, problem exposed now that headers will be added and will be allowed to be switched on/off. (Jiri Olsa) - Handle the num array type in python properly (Sebastian Andrzej Siewior) - Fix wrong condition for allocation failure (Jiri Olsa) - Adjust callchain based on DWARF debug info on powerpc (Sukadev Bhattiprolu) - Fix a risk for doing free on uninitialized pointer in traceevent lib (Rickard Strandqvist) - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa) - Enable close-on-exec flag on perf file descriptor (Yann Droneaud) - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo) - Event ordering fixes (Jiri Olsa)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits) Revert "perf tools: Fix jump label always changing during tracing" perf tools: Fix perf usage string leftover perf: Check permission only for parent tracepoint event perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds perf record: Always force PERF_RECORD_FINISHED_ROUND event perf inject: Add --kallsyms parameter perf tools: Expose 'addr' functions so they can be reused perf session: Fix accounting of ordered samples queue perf powerpc: Include util/util.h and remove stringify macros perf tools: Fix build on gcc 4.4.7 perf tools: Add thread parameter to vdso__dso_findnew() perf tools: Add dso__type() perf tools: Separate the VDSO map name from the VDSO dso name perf tools: Add vdso__new() perf machine: Fix the lifetime of the VDSO temporary file perf tools: Group VDSO global variables into a structure perf session: Add ability to skip 4GiB or more perf session: Add ability to 'skip' a non-piped event stream perf tools: Pass machine to vdso__dso_findnew() perf tools: Add dso__data_size() ...	2014-08-04 16:09:53 -07:00
Linus Torvalds	8efb90cf1e	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - big rtmutex and futex cleanup and robustification from Thomas Gleixner - mutex optimizations and refinements from Jason Low - arch_mutex_cpu_relax() removal and related cleanups - smaller lockdep tweaks" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) arch, locking: Ciao arch_mutex_cpu_relax() locking/lockdep: Only ask for /proc/lock_stat output when available locking/mutexes: Optimize mutex trylock slowpath locking/mutexes: Try to acquire mutex only if it is unlocked locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro locking/mutexes: Correct documentation on mutex optimistic spinning rtmutex: Make the rtmutex tester depend on BROKEN futex: Simplify futex_lock_pi_atomic() and make it more robust futex: Split out the first waiter attachment from lookup_pi_state() futex: Split out the waiter check from lookup_pi_state() futex: Use futex_top_waiter() in lookup_pi_state() futex: Make unlock_pi more robust rtmutex: Avoid pointless requeueing in the deadlock detection chain walk rtmutex: Cleanup deadlock detector debug logic rtmutex: Confine deadlock logic to futex rtmutex: Simplify remove_waiter() rtmutex: Document pi chain walk rtmutex: Clarify the boost/deboost part rtmutex: No need to keep task ref for lock owner check rtmutex: Simplify and document try_to_take_rtmutex() ...	2014-08-04 16:09:06 -07:00
Linus Torvalds	8533ce7271	These are the x86, MIPS and s390 changes; PPC and ARM will come in a few days. MIPS and s390 have little going on this release; just bugfixes, some small, some larger. The highlights for x86 are nested VMX improvements (Jan Kiszka), optimizations for old processor (up to Nehalem, by me and Bandan Das), and a lot of x86 emulator bugfixes (Nadav Amit). Stephen Rothwell reported a trivial conflict with the tracing branch. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJT300XAAoJEBvWZb6bTYby3V8QAJz+XyajnhJ8wH55Vxczz22L i2gtUGmBLhEXsBcaVKO4BBfek88lLzg0SGLjfW5wCMQmKtxVlrwTCXNkBoPGjapd NwHtWkMKym44PDhRovn7zkSumkxC43uFIBR/ebrhP6Bvhh9s+MnkQUxfw9ILB+YV EeKyEG8sSgxFCciuHbp3mIXpDcO6r/ldy6I7009OdyhLoMY+Kvmk7kRe9wtAivdg CGJi60QvGOn2RGRPOCEtF6UWr8Ae8fe1t84o0hkXPv/j3jtabzAatXKJa4dYNbIs 7Mp4NQpxaGV6rq3WCYVeZRxGs+UReGDAS3Il4Z8C9eTOTooSfxdVr8acpM8PY6I8 UmLT6ECLGycc4ELXrETtR+QLmiXACyJqyVxz4aiLV3kWSWfamKD3hBeQK9NizNcE VoPDl+PyISvR1tW4KstBuzfUWAEXi+gO78cqqFr/VW6cl7HKpA1DFQaPfGkYKDae 2CPwcLwI5/M6RtSgkyXTkEqNZLc2BjldqSeM1lmWjhZVW56X2iqePUL46Vab3Yvt U+sELtwEE560NLN3hbaHUsLR1tcUix5w8vTzcXPxgoHQBszHCcAZTWd1XHulr64F rp/cangqtkPKcu5j1mNhQs38oLjHI1MUsbQrqFoD4tmHjQ75iXHRFzYGoIVKXyHG AnGbQzJzBcdAANhm3LW0 =UXxV -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM changes from Paolo Bonzini: "These are the x86, MIPS and s390 changes; PPC and ARM will come in a few days. MIPS and s390 have little going on this release; just bugfixes, some small, some larger. The highlights for x86 are nested VMX improvements (Jan Kiszka), optimizations for old processor (up to Nehalem, by me and Bandan Das), and a lot of x86 emulator bugfixes (Nadav Amit). Stephen Rothwell reported a trivial conflict with the tracing branch" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (104 commits) x86/kvm: Resolve shadow warnings in macro expansion KVM: s390: rework broken SIGP STOP interrupt handling KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table KVM: vmx: remove duplicate vmx_mpx_supported() prototype KVM: s390: Fix memory leak on busy SIGP stop x86/kvm: Resolve shadow warning from min macro kvm: Resolve missing-field-initializers warnings Replace NR_VMX_MSR with its definition KVM: x86: Assertions to check no overrun in MSR lists KVM: x86: set rflags.rf during fault injection KVM: x86: Setting rflags.rf during rep-string emulation KVM: x86: DR6/7.RTM cannot be written KVM: nVMX: clean up nested_release_vmcs12 and code around it KVM: nVMX: fix lifetime issues for vmcs02 KVM: x86: Defining missing x86 vectors KVM: x86: emulator injects #DB when RFLAGS.RF is set KVM: x86: Cleanup of rflags.rf cleaning KVM: x86: Clear rflags.rf on emulated instructions KVM: x86: popf emulation should not change RF KVM: x86: Clearing rflags.rf upon skipped emulated instruction ...	2014-08-04 12:16:46 -07:00
Linus Torvalds	b8c0aa46b3	This pull request has a lot of work done. The main thing is the changes to the ftrace function callback infrastructure. It's introducing a way to allow different functions to call directly different trampolines instead of all calling the same "mcount" one. The only user of this for now is the function graph tracer, which always had a different trampoline, but the function tracer trampoline was called and did basically nothing, and then the function graph tracer trampoline was called. The difference now, is that the function graph tracer trampoline can be called directly if a function is only being traced by the function graph trampoline. If function tracing is also happening on the same function, the old way is still done. The accounting for this takes up more memory when function graph tracing is activated, as it needs to keep track of which functions it uses. I have a new way that wont take as much memory, but it's not ready yet for this merge window, and will have to wait for the next one. Another big change was the removal of the ftrace_start/stop() calls that were used by the suspend/resume code that stopped function tracing when entering into suspend and resume paths. The stop of ftrace was done because there was some function that would crash the system if one called smp_processor_id()! The stop/start was a big hammer to solve the issue at the time, which was when ftrace was first introduced into Linux. Now ftrace has better infrastructure to debug such issues, and I found the problem function and labeled it with "notrace" and function tracing can now safely be activated all the way down into the guts of suspend and resume. Other changes include clean ups of uprobe code. Clean up of the trace_seq() code. And other various small fixes and clean ups to ftrace and tracing. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJT35zXAAoJEKQekfcNnQGuOz0H/38zqM0nLFhrgvz3EPk2UOjn xqpX8qyb2V7TJZL+IqeXU2a5cQZl5ba0D4WtBGpxbTae3CJYiuQ87iKUNFoH0om5 FDpn80igb368k8V3qRdRsziKVCCf0XBd/NkHJXc0ZkfXGyzB2Ga4bBxALxp2gj9y bnO+vKo6+tWYKG4hyQb4P3LRXUrK8/LWEsPr39cH2QH1Rdj69Lx9CgrCdUVJmwcb Bj8hEiLXL/RYCFNn79A3wNTUvW0rG/AOIf4SLqXtasSRZ0ToaU0ZyDnrNv+0Ol47 rX8tSk+LfXchL9hpIvjCf1vlAYq3pO02favteR/jip3lx/dTjEDE4RJ9qtJzZ4Q= =fwQY -----END PGP SIGNATURE----- Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This pull request has a lot of work done. The main thing is the changes to the ftrace function callback infrastructure. It's introducing a way to allow different functions to call directly different trampolines instead of all calling the same "mcount" one. The only user of this for now is the function graph tracer, which always had a different trampoline, but the function tracer trampoline was called and did basically nothing, and then the function graph tracer trampoline was called. The difference now, is that the function graph tracer trampoline can be called directly if a function is only being traced by the function graph trampoline. If function tracing is also happening on the same function, the old way is still done. The accounting for this takes up more memory when function graph tracing is activated, as it needs to keep track of which functions it uses. I have a new way that wont take as much memory, but it's not ready yet for this merge window, and will have to wait for the next one. Another big change was the removal of the ftrace_start/stop() calls that were used by the suspend/resume code that stopped function tracing when entering into suspend and resume paths. The stop of ftrace was done because there was some function that would crash the system if one called smp_processor_id()! The stop/start was a big hammer to solve the issue at the time, which was when ftrace was first introduced into Linux. Now ftrace has better infrastructure to debug such issues, and I found the problem function and labeled it with "notrace" and function tracing can now safely be activated all the way down into the guts of suspend and resume Other changes include clean ups of uprobe code, clean up of the trace_seq() code, and other various small fixes and clean ups to ftrace and tracing" * tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits) ftrace: Add warning if tramp hash does not match nr_trampolines ftrace: Fix trampoline hash update check on rec->flags ring-buffer: Use rb_page_size() instead of open coded head_page size ftrace: Rename ftrace_ops field from trampolines to nr_trampolines tracing: Convert local function_graph functions to static ftrace: Do not copy old hash when resetting tracing: let user specify tracing_thresh after selecting function_graph ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on() tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST s390/ftrace: remove check of obsolete variable function_trace_stop arm64, ftrace: Remove check of obsolete variable function_trace_stop Blackfin: ftrace: Remove check of obsolete variable function_trace_stop metag: ftrace: Remove check of obsolete variable function_trace_stop microblaze: ftrace: Remove check of obsolete variable function_trace_stop MIPS: ftrace: Remove check of obsolete variable function_trace_stop parisc: ftrace: Remove check of obsolete variable function_trace_stop sh: ftrace: Remove check of obsolete variable function_trace_stop sparc64,ftrace: Remove check of obsolete variable function_trace_stop tile: ftrace: Remove check of obsolete variable function_trace_stop ftrace: x86: Remove check of obsolete variable function_trace_stop ...	2014-08-04 11:50:00 -07:00
Linus Torvalds	f2a84170ed	Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: - Major reorganization of percpu header files which I think makes things a lot more readable and logical than before. - percpu-refcount is updated so that it requires explicit destruction and can be reinitialized if necessary. This was pulled into the block tree to replace the custom percpu refcnting implemented in blk-mq. - In the process, percpu and percpu-refcount got cleaned up a bit * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits) percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() percpu-refcount: require percpu_ref to be exited explicitly percpu-refcount: use unsigned long for pcpu_count pointer percpu-refcount: add helpers for ->percpu_count accesses percpu-refcount: one bit is enough for REF_STATUS percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc() workqueue: stronger test in process_one_work() workqueue: clear POOL_DISASSOCIATED in rebind_workers() percpu: Use ALIGN macro instead of hand coding alignment calculation percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations percpu: preffity percpu header files percpu: use raw_cpu_() to define __this_cpu_() percpu: reorder macros in percpu header files percpu: move {raw\|this}_cpu_() definitions to include/linux/percpu-defs.h percpu: move generic {raw\|this}_cpu__N() definitions to include/asm-generic/percpu.h percpu: only allow sized arch overrides for {raw\|this}_cpu_*() ops percpu: reorganize include/linux/percpu-defs.h percpu: move accessors from include/linux/percpu.h to percpu-defs.h percpu: include/asm-generic/percpu.h should contain only arch-overridable parts percpu: introduce arch_raw_cpu_ptr() ...	2014-08-04 10:09:27 -07:00
Linus Torvalds	f74ad8df4e	PCI changes for the v3.17 merge window: Resource management - Support BAR sizes up to 128GB (Yinghai Lu) - Keep original resource if we fail to expand it (Guo Chao) - Return conventional error values from pci_revert_fw_address() (Bjorn Helgaas) - Tidy resource assignment messages (Bjorn Helgaas) - Don't exclude low BIOS area for non-PCI cards (Christoph Schulz) PCI device hotplug - Prevent NULL dereference during pciehp probe (Andreas Noever) - Make pciehp pcie_wait_cmd() self-contained (Bjorn Helgaas) - Wait for pciehp hotplug command completion lazily (Bjorn Helgaas) - Compute pciehp timeout from hotplug command start time (Bjorn Helgaas) - Remove pciehp assumptions about which commands cause completion events (Bjorn Helgaas) - Clear pciehp Data Link Layer State Changed during init (Myron Stowe) - Remove pciehp struct controller.no_cmd_complete (Rajat Jain) - Remove cpqphp unnecessary null test (Fabian Frederick) - Remove "invalid IRQ" warning for hot-added PCIe ports (Jiang Liu) IOMMU - Add DMA alias quirk for Intel 82801 bridge (Alex Williamson) MSI - Add internal msix_clear_and_set_ctrl() (Yijing Wang) - Remove unused msi_enabled_mask() (Yijing Wang) - Cache Multiple Message Capable in struct msi_desc (Yijing Wang) - Add msi_setup_entry() to clean up initialization (Yijing Wang) - Remove unused msi_remove_pci_irq_vectors() (Yijing Wang) - Retrieve first MSI IRQ from msi_desc rather than pci_dev (Yijing Wang) - Remove unused list access in __pci_restore_msix_state() (Yijing Wang) - Use irq_get_msi_desc() to simplify code (Yijing Wang) Generic host bridge driver - Fix GPL v2 license string typo (Bjorn Helgaas) Marvell MVEBU - Fix GPL v2 license string typo (Thierry Reding) NVIDIA Tegra - Use correct initial HW settings (Phil Edworthy) - Remove rcar_pcie_setup_window() resource argument (Phil Edworthy) - Fix GPL v2 license string typo (Thierry Reding) Renesas R-Car - Remove redundant config accessor register checks (Sergei Shtylyov) - Fix GPL v2 license string typo (Bjorn Helgaas) Virtualization - Factor secondary bus reset logic (Gavin Shan) - Remove duplicate powerpc reset logic (Gavin Shan) Miscellaneous - Rework default VGA detection for EFI (Bruno Prémont) - Fix sysfs "acpi_index" and "label" errors for NIC renaming (Simone Gotti) - Configure ASPM at pci_enable_device()-time (Vidya Sagar) - Add include/linux/pci_ids.h include guard (Rasmus Villemoes) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJTzppBAAoJEFmIoMA60/r8mtQP/jgVWCSU+0ulHjoxVSRLu4Lc UGKQFjS03oWWflHdvW6wZFqN82Ynva9fYCLMtiKdPg7cgTosSRT3I4DjAIm80ZI/ kZvHxSmi6DBYmchZBsWzj60zxNiYZeEgd7CevzcJRHuwbKNMr2y12s6hjJbyl5lF ygaXWpDveKjsEDjyk9vKjUGwul/NJKynar253Yh178XaoypdGuiEIw3D1lQFMZZp ADcRijIi+CD2BENtDr6fbldbj+yQ93yyUSloEnaKtWZD+Ao5IsHngN0IyRu+l1Wl LFob0AsopeYVFKdw22Gn1KAq9Jj01acsSBRXjgrauU+tLY512Vkbp1lFYl85B/38 /Z0VNHncmIh29rq9Tl2xQwEeI3Ja27FfnMjC70dLM5YjWf8vsYnDEQZHyxAAe15D p3H3YuuDjmvHkoSrHY/68DLfDl9ubw3/BFUlCMqijL7444ZWLXathrnCV8ZJimmr PlF/m7GtXYF4wIw19m9KQqNBUPJJEsVHExKzICOY4v5/nMlvx4ZkBDR3tPNEH1sk 3AYKjLDw21Nle7yKcAlxDI/TYWZqxuph23UpevzlQd16tutq2i2FqpauiqI3DFm4 VfYVbOVQwfeUJt11VOCgxvE7RsTxCk5QefB+YKVAdVK6vMZHeZxsetYvrCDptnea cId/NfiEFnmr+u3mAyPM =U5Ip -----END PGP SIGNATURE----- Merge tag 'pci-v3.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "I'll be on vacation until Aug 11, and I suspect the merge window will open before then, so I'm sending this to you early. There are more things I'd like to get into v3.17, so I hope to send another pull request soon after I return. The most notable pieces here are: - Support BARs up to 128GB (up from 8GB) - Fix SR-IOV resource assignment when we fail to expand a resource - Rework pciehp to handle a common hardware erratum - Cleanup MSI - Fix NIC renaming issue - Fix VGA default device issue on EFI systems - Fix ASPM configuration (previously we didn't enable it as expected) Alex Williamson has graciously agreed to take care of any major issues with this if you take it before I return. Details: Resource management - Support BAR sizes up to 128GB (Yinghai Lu) - Keep original resource if we fail to expand it (Guo Chao) - Return conventional error values from pci_revert_fw_address() (Bjorn Helgaas) - Tidy resource assignment messages (Bjorn Helgaas) - Don't exclude low BIOS area for non-PCI cards (Christoph Schulz) PCI device hotplug - Prevent NULL dereference during pciehp probe (Andreas Noever) - Make pciehp pcie_wait_cmd() self-contained (Bjorn Helgaas) - Wait for pciehp hotplug command completion lazily (Bjorn Helgaas) - Compute pciehp timeout from hotplug command start time (Bjorn Helgaas) - Remove pciehp assumptions about which commands cause completion events (Bjorn Helgaas) - Clear pciehp Data Link Layer State Changed during init (Myron Stowe) - Remove pciehp struct controller.no_cmd_complete (Rajat Jain) - Remove cpqphp unnecessary null test (Fabian Frederick) - Remove "invalid IRQ" warning for hot-added PCIe ports (Jiang Liu) IOMMU - Add DMA alias quirk for Intel 82801 bridge (Alex Williamson) MSI - Add internal msix_clear_and_set_ctrl() (Yijing Wang) - Remove unused msi_enabled_mask() (Yijing Wang) - Cache Multiple Message Capable in struct msi_desc (Yijing Wang) - Add msi_setup_entry() to clean up initialization (Yijing Wang) - Remove unused msi_remove_pci_irq_vectors() (Yijing Wang) - Retrieve first MSI IRQ from msi_desc rather than pci_dev (Yijing Wang) - Remove unused list access in __pci_restore_msix_state() (Yijing Wang) - Use irq_get_msi_desc() to simplify code (Yijing Wang) Generic host bridge driver - Fix GPL v2 license string typo (Bjorn Helgaas) Marvell MVEBU - Fix GPL v2 license string typo (Thierry Reding) NVIDIA Tegra - Use correct initial HW settings (Phil Edworthy) - Remove rcar_pcie_setup_window() resource argument (Phil Edworthy) - Fix GPL v2 license string typo (Thierry Reding) Renesas R-Car - Remove redundant config accessor register checks (Sergei Shtylyov) - Fix GPL v2 license string typo (Bjorn Helgaas) Virtualization - Factor secondary bus reset logic (Gavin Shan) - Remove duplicate powerpc reset logic (Gavin Shan) Miscellaneous - Rework default VGA detection for EFI (Bruno Prémont) - Fix sysfs "acpi_index" and "label" errors for NIC renaming (Simone Gotti) - Configure ASPM at pci_enable_device()-time (Vidya Sagar) - Add include/linux/pci_ids.h include guard (Rasmus Villemoes)" * tag 'pci-v3.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (38 commits) PCI/MSI: Use irq_get_msi_desc() to simplify code PCI/MSI: Remove unused list access in __pci_restore_msix_state() PCI/MSI: Retrieve first MSI IRQ from msi_desc rather than pci_dev PCI/MSI: Remove unused function msi_remove_pci_irq_vectors() PCI/MSI: Add msi_setup_entry() to clean up MSI initialization PCI: Configure ASPM when enabling device x86: don't exclude low BIOS area when allocating address space for non-PCI cards PCI: generic: Fix GPL v2 license string typo PCI: rcar: Fix GPL v2 license string typo PCI: tegra: Fix GPL v2 license string typo PCI: mvebu: Fix GPL v2 license string typo PCI: Add include guard to include/linux/pci_ids.h x86, ia64: Move EFI_FB vga_default_device() initialization to pci_vga_fixup() PCI: Tidy resource assignment messages PCI: Return conventional error values from pci_revert_fw_address() PCI: Cleanup control flow PCI: Support BAR sizes up to 128GB PCI: cpqphp: Remove unnecessary null test before debugfs_remove() PCI: pciehp: Clear Data Link Layer State Changed during init PCI: Add bridge DMA alias quirk for Intel 82801 bridge ...	2014-08-04 09:29:37 -07:00
Mark Brown	a6ce305207	Merge remote-tracking branches 'asoc/topic/intel', 'asoc/topic/kirkwood', 'asoc/topic/max98090' and 'asoc/topic/mc13783' into asoc-next	2014-08-04 16:31:45 +01:00
Dave Hansen	d17d8f9ded	x86/mm: Add tracepoints for TLB flushes We don't have any good way to figure out what kinds of flushes are being attempted. Right now, we can try to use the vm counters, but those only tell us what we actually did with the hardware (one-by-one vs full) and don't tell us what was actually _requested_. This allows us to select out "interesting" TLB flushes that we might want to optimize (like the ranged ones) and ignore the ones that we have very little control over (the ones at context switch). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154059.4C96CBA5@viggo.jf.intel.com Acked-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-07-31 08:48:51 -07:00
Dave Hansen	e9f4e0a9fe	x86/mm: Rip out complicated, out-of-date, buggy TLB flushing I think the flush_tlb_mm_range() code that tries to tune the flush sizes based on the CPU needs to get ripped out for several reasons: 1. It is obviously buggy. It uses mm->total_vm to judge the task's footprint in the TLB. It should certainly be using some measure of RSS, NOT ->total_vm since only resident memory can populate the TLB. 2. Haswell, and several other CPUs are missing from the intel_tlb_flushall_shift_set() function. Thus, it has been demonstrated to bitrot quickly in practice. 3. It is plain wrong in my vm: [ 0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] tlb_flushall_shift: 6 Which leads to it to never use invlpg. 4. The assumptions about TLB refill costs are wrong: http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com (more on this in later patches) 5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59 I believe the sample times were too short. Running the benchmark in a loop yields times that vary quite a bit. Note that this leaves us with a static ceiling of 1 page. This is a conservative, dumb setting, and will be revised in a later patch. This also removes the code which attempts to predict whether we are flushing data or instructions. We expect instruction flushes to be relatively rare and not worth tuning for explicitly. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-07-31 08:48:50 -07:00
David Rientjes	2f078b9cb8	x86, apic: Remove enable_apic_mode callback The enable_apic_mode() apic callback is never called, so remove it. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302352320.17503@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-31 08:05:44 -07:00
David Rientjes	11a8318ef5	x86, apic: Remove setup_portio_remap callback Since commit `b5660ba76b` ("x86, platforms: Remove NUMAQ") removed NUMAQ, the setup_portio_remap() apic callback has been obsolete. Remove it. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302351480.17503@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-31 08:05:44 -07:00
David Rientjes	e76661ba09	x86, apic: Remove multi_timer_check callback Since commit `b5660ba76b` ("x86, platforms: Remove NUMAQ") removed NUMAQ, the multi_timer_check() apic callback has been obsolete. Remove it. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302351120.17503@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-31 08:05:43 -07:00
David Rientjes	658ffd7e6f	x86, apic: Remove check_apicid_present callback The check_apicid_present() apic callback is never called, so remove it and functions that implement it. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302350160.17503@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-31 08:05:42 -07:00
David Rientjes	c460b5d340	x86, apic: Remove mps_oem_check callback Since commit `b5660ba76b` ("x86, platforms: Remove NUMAQ") removed NUMAQ, the mps_oem_check() apic callback has been obsolete. Remove it. This allows generic_mps_oem_check() to be removed as well. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302349390.17503@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-31 08:05:42 -07:00
David Rientjes	300eddf967	x86, apic: Remove smp_callin_clear_local_apic callback Since commit `b5660ba76b` ("x86, platforms: Remove NUMAQ") removed NUMAQ, the smp_callin_clear_local_apic() apic callback has been obsolete. Remove it. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302349040.17503@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-31 08:05:41 -07:00
David Rientjes	6ab1b27c84	x86, apic: Replace trampoline physical addresses with defaults The trampoline_phys_{high,low} members of struct apic are always initialized to DEFAULT_TRAMPOLINE_PHYS_HIGH and TRAMPOLINE_PHYS_LOW, respectively. Hardwire the constants and remove the unneeded members. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302348330.17503@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-31 08:05:41 -07:00
David Rientjes	80a2670379	x86, apic: Remove x86_32_numa_cpu_node callback Since commit `b5660ba76b` ("x86, platforms: Remove NUMAQ") removed NUMAQ, the x86_32_numa_cpu_node() apic callback has been obsolete. Remove it. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302348060.17503@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-31 08:05:40 -07:00
Andy Lutomirski	7209a75d20	x86_64/entry/xen: Do not invoke espfix64 on Xen This moves the espfix64 logic into native_iret. To make this work, it gets rid of the native patch for INTERRUPT_RETURN: INTERRUPT_RETURN on native kernels is now 'jmp native_iret'. This changes the 16-bit SS behavior on Xen from OOPSing to leaking some bits of the Xen hypervisor's RSP (I think). [ hpa: this is a nonzero cost on native, but probably not enough to measure. Xen needs to fix this in their own code, probably doing something equivalent to espfix64. ] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>	2014-07-28 15:25:40 -07:00
Henrique de Moraes Holschuh	ebc14ddcc9	x86, microcode, intel: Fix total_size computation According to the Intel SDM vol 3A (order code 253668-051US, June 2014), on section 9.11.1, page 9-28: "For microcode updates with a data size field equal to 00000000H, the size of the microcode update is 2048 bytes. The first 48 bytes contain the microcode update header. The remaining 2000 bytes contain encrypted data." "For microcode updates with a data size not equal to 00000000H, the total size field specifies the size of the microcode update." Up to 2002/2003, Intel used an "old format" for the microcode update containers that was always 2048 bytes in size. That old format did not have Data Size and Total Size fields, the quadwords at those positions in the microcode container header were "reserved". The microcode header of the "old format" microcode container has a hrdver of 0x01. You can hunt down an old copy of the Intel SDM to validate this through its order number (#243192). I found one from 1999 through a Google search. Sometime in 2002/2003 (AFAICT, for the Prescott processors), Intel documented a new format for the microcode containers and contributed in 2003 some code to the Linux kernel microcode driver implementing support for the new format. This new format has Data Size and Total Size fields, as well as the optional extended signature table. However, it reuses the same hrdver as the old format (0x01), and it can only be told apart from the old format by a non-zero Data Size field. In fact, the only reason we can even trust a Data Size of zero to mean that the microcode container is in the old format, is because Intel reatroatively promised that the old format would always have a zero there when they wrote the documentation for the _new_ format. This is a very old bug, dating back to 2003. It has been dormant ever since, as Intel seems to set all reserved fields to zero on the microcode updates they distribute: I could not find a public microcode update that would trigger this bug. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Link: http://lkml.kernel.org/r/1406146251-8540-1-git-send-email-hmh@hmh.eng.br Signed-off-by: Borislav Petkov <bp@suse.de>	2014-07-28 16:08:02 +02:00
Ingo Molnar	5030c69755	Linux 3.16-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJT1VYNAAoJEHm+PkMAQRiGQJwIAKSYp1Uqz5O/e5r0V1TlZKT4 1B4Njopl57PwSrJQWcGEuH2yHyM896vfPO4L6BJIOfyWzh8kwpQqclDt6uhXoF/v OsO1zb/7/j+n/pDZsePqP9AyIgErsHEBgUbhecDqzjN++ITPcZjQ6TIMPglZaumN jFAdAZuAaEwqAk8jqN2wlm689Fh9MuUEarHXbXLCqu5RgLrWhFGhp/cTWY62aqnZ XfEeQ9KtpRZmlR/IYjerbb1eRH7ZdJsZ88WngLX9dj/JdNxHWBkWQBXGAusXk5Fk y6LsIV3TjyBdrRKJ1Ifyg/2EIXHNBs8HxTFGXpjtp2HPuMLDxZOWOWikb9URtNg= =Fjf4 -----END PGP SIGNATURE----- Merge tag 'v3.16-rc7' into perf/core, to merge in the latest fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-28 10:00:33 +02:00
Rafael J. Wysocki	8989e1cc35	Merge branch 'acpi-config' * acpi-config: ACPI / processor: Introduce ARCH_MIGHT_HAVE_ACPI_PDC ACPI: Don't use acpi_lapic in ACPI core code ACPI: add config for BIOS table scan	2014-07-27 23:52:48 +02:00
Li, Aubrey	f855911c1f	x86/pmc_atom: Expose PMC device state and platform sleep state Add the following interfaces to exposes PMC device state and sleep state residency via debugfs: /sys/kernel/debugfs/pmc_atom/dev_state /sys/kernel/debugfs/pmc_atom/sleep_state Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Link: http://lkml.kernel.org/r/53B0FF59.8000600@linux.intel.com Signed-off-by: Kasagar, Srinidhi <srinidhi.kasagar@intel.com> Reviewed-by: Rudramuni, Vishwesh M <vishwesh.m.rudramuni@intel.com> Reviewed-by: Joe Perches <joe@perches.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-07-25 14:12:14 -07:00
Li, Aubrey	b00055cade	x86/pmc_atom: Eisable a few S0ix wake up events for S0ix residency Disable PMC S0IX_WAKE_EN events coming from LPC block(unused) and also from GPIO_SUS ored dedicated IRQs (must be disabled as per PMC programming rule), GPIOSCORE ored dedicated IRQs (must be disabled as per PMC programming rule), GPIO_SUS shared IRQ (not necessary since the IOAPIC_DS wake event will still work), GPIO_SCORE shared IRQ (not necessary since the IOAPIC_DS wake event will still work). Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Link: http://lkml.kernel.org/r/53B0FF22.5080403@linux.intel.com Signed-off-by: Olivier Leveque <olivier.leveque@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-07-25 14:11:58 -07:00
Li, Aubrey	93e5eadd1f	x86/platform: New Intel Atom SOC power management controller driver The Power Management Controller (PMC) controls many of the power management features present in the Atom SoC. This driver provides a native power off function via PMC PCI IO port. On some ACPI hardware-reduced platforms(e.g. ASUS-T100), ACPI sleep registers are not valid so that (*pm_power_off)() is not hooked by acpi_power_off(). The power off function in this driver is installed only when pm_power_off is NULL. Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Link: http://lkml.kernel.org/r/53B0FEEA.3010805@linux.intel.com Signed-off-by: Lejun Zhu <lejun.zhu@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-07-25 14:11:29 -07:00
Lv Zheng	d334c823b2	ACPICA: Linux: Add support to exclude <asm/acenv.h> inclusion. The forthcoming patch will make <acpi/acpi.h> to be visible to all kernel source code. Thus for the architectures that do not support ACPI and haven't implemented <asm/acenv.h>, we need to make it excluded. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-07-23 01:10:44 +02:00
Nadav Amit	6f43ed01e8	KVM: x86: DR6/7.RTM cannot be written Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the current KVM implementation. This bug is apparent only if the MOV-DR instruction is emulated or the host also debugs the guest. This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and set respectively. It also sets DR6.RTM upon every debug exception. Obviously, it is not a complete fix, as debugging of RTM is still unsupported. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-21 17:17:52 +02:00
Nadav Amit	c9cdd085bb	KVM: x86: Defining missing x86 vectors Defining XE, XM and VE vector numbers. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-21 14:18:51 +02:00
Graeme Gregory	b50154d53e	ACPI: Don't use acpi_lapic in ACPI core code Now ARM64 support is being added to ACPI so architecture specific values can not be used in core ACPI code. Following on the patch "ACPI / processor: Check if LAPIC is present during initialization" which uses acpi_lapic in acpi_processor.c, on ARM64 platform, GIC is used instead of local APIC, so acpi_lapic is not a suitable value for ARM64. What is actually important at this point is if there is/are CPU entry/entries (Local APIC/SAPIC, GICC) in MADT, so introduce acpi_has_cpu_in_madt() to be arch specific and generic. Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-07-21 13:50:58 +02:00
Matt Fleming	44be28e9dd	x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag It appears that the BayTrail-T class of hardware requires EFI in order to powerdown and reboot and no other reliable method exists. This quirk is generally applicable to all hardware that has the ACPI Hardware Reduced bit set, since usually ACPI would be the preferred method. Cc: Len Brown <len.brown@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-07-18 21:23:52 +01:00
Steven Rostedt (Red Hat)	1026ff9b8e	ftrace/x86: Have function graph tracer use its own trampoline The function graph trampoline is called from the function trampoline and both do a save and restore of registers. The save of registers done by the function trampoline when only the function graph tracer is running is a waste of CPU cycles. As the function graph tracer trampoline in x86 is dependent from the function trampoline, we can call it directly when a function is only being traced by the function graph trampoline. Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-17 09:44:37 -04:00
Davidlohr Bueso	3a6bfbc91d	arch, locking: Ciao arch_mutex_cpu_relax() The arch_mutex_cpu_relax() function, introduced by `34b133f`, is hacky and ugly. It was added a few years ago to address the fact that common cpu_relax() calls include yielding on s390, and thus impact the optimistic spinning functionality of mutexes. Nowadays we use this function well beyond mutexes: rwsem, qrwlock, mcs and lockref. Since the macro that defines the call is in the mutex header, any users must include mutex.h and the naming is misleading as well. This patch (i) renames the call to cpu_relax_lowlatency ("relax, but only if you can do it with very low latency") and (ii) defines it in each arch's asm/processor.h local header, just like for regular cpu_relax functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax, and thus we can take it out of mutex.h. While this can seem redundant, I believe it is a good choice as it allows us to move out arch specific logic from generic locking primitives and enables future(?) archs to transparently define it, similarly to System Z. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Anton Blanchard <anton@samba.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bharat Bhushan <r65777@freescale.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Joe Perches <joe@perches.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Joseph Myers <joseph@codesourcery.com> Cc: Kees Cook <keescook@chromium.org> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Qais Yousef <qais.yousef@imgtec.com> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Miao <realmz6@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Stratos Karafotis <stratosk@semaphore.gr> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Kulikov <segoon@openwall.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: adi-buildroot-devel@lists.sourceforge.net Cc: linux390@de.ibm.com Cc: linux-alpha@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-cris-kernel@axis.com Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux@lists.openrisc.net Cc: linux-m32r-ja@ml.linux-m32r.org Cc: linux-m32r@ml.linux-m32r.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-metag@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: sparclinux@vger.kernel.org Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-17 12:32:47 +02:00
Ingo Molnar	b5e4111f02	Merge branch 'locking/urgent' into locking/core, before applying larger changes and to refresh the branch with fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-17 11:45:29 +02:00
Alexander Yarygin	44b3802122	perf kvm: Use defines of kvm events Currently perf-kvm uses string literals for kvm event names, but it works only for x86, because other architectures may have other names for those events. To reduce dependence on architecture, we add <asm/kvm_perf.h> file with defines for: - kvm_entry and kvm_exit events, - exit reason field name in kvm_exit event, - length of exit reasons strings, - vcpu_id field name in kvm trace events, and replace literals in perf-kvm. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by David Ahern <dsahern@gmail.com> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1404397747-20939-2-git-send-email-yarygin@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-07-16 17:57:32 -03:00
Borislav Petkov	af0fa6f6b5	x86, cpu: Kill cpu_has_mp It was used only for checking for some K7s which didn't have MP support, see http://www.hardwaresecrets.com/article/How-to-Transform-an-Athlon-XP-into-an-Athlon-MP/24 and it was unconditionally set on 64-bit for no reason. Kill it. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1403609105-8332-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-14 12:21:40 -07:00
Borislav Petkov	80a208bd39	x86/cpufeature: Add bug flags to /proc/cpuinfo Dump the flags which denote we have detected and/or have applied bug workarounds to the CPU we're executing on, in a similar manner to the feature flags. The advantage is that those are not accumulating over time like the CPU features. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1403609105-8332-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-14 12:21:39 -07:00
Oren Twaig	411cf9ee29	x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box() When a vSMP Foundation box is detected, the function apic_cluster_num() counts the number of APIC clusters found. If more than one found, a multi board configuration is assumed, and TSC marked as unstable. This behavior is incorrect as vSMP Foundation may use processors from single node only, attached to memory of other nodes - and such node may have more than one APIC cluster (typically any recent intel box has more than single APIC_CLUSTERID(x)). To fix this, we simply remove the code which detects a vSMP Foundation box and affects apic_is_clusted_box() return value. This can be done because later the kernel checks by itself if the TSC is stable using the check_tsc_sync_[source\|target]() functions and marks TSC as unstable if needed. Acked-by: Shai Fultheim <shai@scalemp.com> Signed-off-by: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1404036068-11674-1-git-send-email-oren@scalemp.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-07-13 17:48:03 -07:00
Borislav Petkov	b08ee5f7e4	x86: Simplify __HAVE_ARCH_CMPXCHG tests Both the 32-bit and 64-bit cmpxchg.h header define __HAVE_ARCH_CMPXCHG and there's ifdeffery which checks it. But since both bitness define it, we can just as well move it up to the main cmpxchg header and simpify a bit of code in doing that. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20140711104338.GB17083@pd.tnic Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-07-11 17:28:51 -07:00
Andy Lutomirski	e6577a7ce9	x86, vdso: Move the vvar area before the vdso text Putting the vvar area after the vdso text is rather complicated: it only works of the total length of the vdso text mapping is known at vdso link time, and the linker doesn't allow symbol addresses to depend on the sizes of non-allocatable data after the PT_LOAD segment. Moving the vvar area before the vdso text will allow is to safely map non-allocatable data after the vdso text, which is a nice simplification. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/156c78c0d93144ff1055a66493783b9e56813983.1405040914.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-07-11 16:57:51 -07:00
Paolo Bonzini	17052f16a5	KVM: emulate: put pointers in the fetch_cache This simplifies the code a bit, especially the overflow checks. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:14:03 +02:00
Bandan Das	41061cdb98	KVM: emulate: do not initialize memopp rip_relative is only set if decode_modrm runs, and if you have ModRM you will also have a memopp. We can then access memopp unconditionally. Note that rip_relative cannot be hoisted up to decode_modrm, or you break "mov $0, xyz(%rip)". Also, move typecast on "out of range value" of mem.ea to decode_modrm. Together, all these optimizations save about 50 cycles on each emulated instructions (4-6%). Signed-off-by: Bandan Das <bsd@redhat.com> [Fix immediate operands with rip-relative addressing. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:14:01 +02:00
Bandan Das	573e80fe04	KVM: emulate: rework seg_override x86_decode_insn already sets a default for seg_override, so remove it from the zeroed area. Also replace set/get functions with direct access to the field. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:14:01 +02:00
Bandan Das	c44b4c6ab8	KVM: emulate: clean up initializations in init_decode_cache A lot of initializations are unnecessary as they get set to appropriate values before actually being used. Optimize placement of fields in x86_emulate_ctxt Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:14:00 +02:00
Bandan Das	1498507a47	KVM: emulate: move init_decode_cache to emulate.c Core emulator functions all belong in emulator.c, x86 should have no knowledge of emulator internals Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:13:59 +02:00
Paolo Bonzini	54cfdb3e95	KVM: emulate: speed up emulated moves We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst. write_register_operand will take care of writing only the lower bytes. Avoiding a call to memcpy (the compiler optimizes it out) gains about 200 cycles on kvm-unit-tests for register-to-register moves, and makes them about as fast as arithmetic instructions. We could perhaps get a larger speedup by moving all instructions _except_ moves out of x86_emulate_insn, removing opcode_len, and replacing the switch statement with an inlined em_mov. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:13:58 +02:00
Paolo Bonzini	37ccdcbe07	KVM: x86: return all bits from get_interrupt_shadow For the next patch we will need to know the full state of the interrupt shadow; we will then set KVM_REQ_EVENT when one bit is cleared. However, right now get_interrupt_shadow only returns the one corresponding to the emulated instruction, or an unconditional 0 if the emulated instruction does not have an interrupt shadow. This is confusing and does not allow us to check for cleared bits as mentioned above. Clean the callback up, and modify toggle_interruptibility to match the comment above the call. As a small result, the call to set_interrupt_shadow will be skipped in the common case where int_shadow == 0 && mask == 0. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:13:56 +02:00
Bruno Prémont	20cde69402	x86, ia64: Move EFI_FB vga_default_device() initialization to pci_vga_fixup() Commit `b4aa016305` ("efifb: Implement vga_default_device() (v2)") added efifb vga_default_device() so EFI systems that do not load shadow VBIOS or setup VGA get proper value for boot_vga PCI sysfs attribute on the corresponding PCI device. Xorg doesn't detect devices when boot_vga=0, e.g., on some EFI systems such as MacBookAir2,1. Xorg detects the GPU and finds the DRI device but then bails out with "no devices detected". Note: When vga_default_device() is set boot_vga PCI sysfs attribute reflects its state. When unset this attribute is 1 whenever IORESOURCE_ROM_SHADOW flag is set. With introduction of sysfb/simplefb/simpledrm efifb is getting obsolete while having native drivers for the GPU also makes selecting sysfb/efifb optional. Remove the efifb implementation of vga_default_device() and initialize vgaarb's vga_default_device() with the PCI GPU that matches boot screen_info in pci_fixup_video(). [bhelgaas: remove unused "dev" in efifb_setup()] Fixes: `b4aa016305` ("efifb: Implement vga_default_device() (v2)") Tested-by: Anibal Francisco Martinez Cortina <linuxkid.zeuz@gmail.com> Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Matthew Garrett <matthew.garrett@nebula.com> CC: stable@vger.kernel.org # v3.5+	2014-07-10 16:48:48 -06:00
Tomasz Grabiec	0d3da0d26e	KVM: x86: fix TSC matching I've observed kvmclock being marked as unstable on a modern single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0. The culprit was failure in TSC matching because of overflow of kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes in a single synchronization cycle. Turns out that qemu does multiple TSC writes during init, below is the evidence of that (qemu-2.0.0): The first one: 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm] 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm] The second one: 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel] 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm] 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm] 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm] 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm] #0 kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780 #1 kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270 #2 kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909 #3 kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641 #4 cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330 #5 cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521 #6 main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390 The third one: 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel] 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm] 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm] 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm] 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm] #0 kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780 #1 kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270 #2 kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909 #3 kvm_cpu_synchronize_post_reset at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635 #4 cpu_synchronize_post_reset at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323 #5 cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512 #6 main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482 The fix is to count each vCPU only once when matched, so that nr_vcpus_matched_tsc holds the size of the matched set. This is achieved by reusing generation counters. Every vCPU with this_tsc_generation == cur_tsc_generation is in the matched set. The match set is cleared by setting cur_tsc_generation to a value which no other vCPU is set to (by incrementing it). I needed to bump up the counter size form u8 to u64 to ensure it never overflows. Otherwise in cases TSC is not written the same number of times on each vCPU the counter could overflow and incorrectly indicate some vCPUs as being in the matched set. This scenario seems unlikely but I'm not sure if it can be disregarded. Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-09 18:09:57 +02:00
Jan Kiszka	6cbc5f5a80	KVM: nSVM: Set correct port for IOIO interception evaluation Obtaining the port number from DX is bogus as a) there are immediate port accesses and b) user space may have changed the register content while processing the PIO access. Forward the correct value from the instruction emulator instead. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-09 18:09:56 +02:00
Ard Biesheuvel	f23cf8bd5c	efi/x86: efistub: Move shared dependencies to <asm/efi.h> This moves definitions depended upon both by code under arch/x86/boot and under drivers/firmware/efi to <asm/efi.h>. This is in preparation of turning the stub code under drivers/firmware/efi into a static library. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-07-07 20:29:46 +01:00
Tejun Heo	b9cd18de4d	ptrace,x86: force IRET path after a ptrace_stop() The 'sysret' fastpath does not correctly restore even all regular registers, much less any segment registers or reflags values. That is very much part of why it's faster than 'iret'. Normally that isn't a problem, because the normal ptrace() interface catches the process using the signal handler infrastructure, which always returns with an iret. However, some paths can get caught using ptrace_event() instead of the signal path, and for those we need to make sure that we aren't going to return to user space using 'sysret'. Otherwise the modifications that may have been done to the register set by the tracer wouldn't necessarily take effect. Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from arch_ptrace_stop_needed() which is invoked from ptrace_stop(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-07-03 17:27:23 -07:00
Linus Torvalds	4f23174981	A bunch of one-liners (except the s390 one). The two more serious bugs ("KVM: SVM: Fix CPL export via SS.DPL" and "KVM: s390: add sie.h uapi header file to Kbuild and remove header dependency") were introduced in the 3.16 merge window. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJTso6OAAoJEBvWZb6bTYbyB5IP/j/1d0hKsVjOGMco+dJ3uwjh X2gYBVxT3Nm9fyjAjSM6OjbFF2mj9zFNEGu0NvEaDnIlWtifvtXckFB1asp/o3/M UTbeaaN5US9Ou9W87KpJQLp7Wp9ENMgXeFsywpf9qMNyV04OHSP5cCwIspShCkNH r21oEwgvrnc4Trh4oBVUaykuPuU4mzAMBSiXxbQWVXkkPYBVBjGYNzdas7K3EfS9 YmY1NgS1HDrUvRuM0b3guHqEizA717xFxpVpXAYhuxRqb1fRWMDuIy7q21hEoXxE RtuFjztfpAWIgK9O2j4mTuqR32nedzsieVMzF486oMPXRrUs+oQ16/AYV7K5eOZF Q1QJ5zx7890ncfxjXYMdUTI7d5sDFCc7F3DmRwtWh1jYrsNh8p+VRDTbNdmiNuIa 1wXkoNpTsFHidXA6Uhl2pfI+o9OOWleEP3bB746RanS4bk54cDrx6UK2gJK6MMHl bekjzQrXRlh3qN1mqS+nMShq/vd2G6cCG0Y9ez8/aHrJoU5DPIOQ7IcOt2IZGtwZ MiBZAoWgHuYpEV4tXzqjHQy8IAddvGnM3RqWfNc0XLlVwcKosdI4fusg8wKnR02Z sLYRfe5BTnHG/ieIlx/iQmRXN04hhIUJiggFZrLizGemGYi16SaN/ixwb4YoA3nH ksZxlGKUi9SUftnv0Ph6 =Rjb4 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "A bunch of one-liners (except the s390 one). The two more serious bugs ("KVM: SVM: Fix CPL export via SS.DPL" and "KVM: s390: add sie.h uapi header file to Kbuild and remove header dependency") were introduced in the 3.16 merge window" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Fix CPL export via SS.DPL KVM: s390: add sie.h uapi header file to Kbuild and remove header dependency MIPS: KVM: Fix memory leak on VCPU KVM: x86: preserve the high 32-bits of the PAT register kvm: fix wrong address when writing Hyper-V tsc page KVM: x86: Increase the number of fixed MTRR regs to 10	2014-07-01 09:27:34 -07:00
Aaron Tomlin	f3aca3d095	nmi: provide the option to issue an NMI back trace to every cpu but current Sometimes it is preferred not to use the trigger_all_cpu_backtrace() routine when one wants to avoid capturing a back trace for current. For instance if one was previously captured recently. This patch provides a new routine namely trigger_allbutself_cpu_backtrace() which offers the flexibility to issue an NMI to every cpu but current and capture a back trace accordingly. Patch x86 and sparc to support new routine. [dzickus@redhat.com: add stub in #else clause] [dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion] [sfr@canb.auug.org.au: undo C99ism] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:44 -07:00
Vinod Koul	61b165caa6	ASoC: Intel: add mrfld pipelines Merrifield DSP used various pipelines to identify the streams and processing modules. Add these defination in the pcm driver and also add a table for device entries to firmware pipeline id conversion Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Mark Brown <broonie@linaro.org>	2014-06-23 12:24:27 +01:00
Jiang Liu	df334bead7	x86, irq: Introduce helper functions to release IOAPIC pin Introduce function mp_unmap_irq() to release IOAPIC IRQ when IRQ is not used any more, which will typically called by pcibios_disabled_irq. And function mp_irqdomain_unmap() is a common implementation of irq_domain_ops.unmap for IOAPIC. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1402302011-23642-38-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:44 +02:00
Jiang Liu	9f354b0252	x86, irq: Clean up unused IOAPIC interface Now we have converted all x86 platforms to use the common irqdomain map interface. There's no caller of io_apic_set_pci_routing(), setup_IO_APIC_irq_extra() and io_apic_setup_irq_pin_once() any more, so kill them. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1402302011-23642-35-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:44 +02:00
Jiang Liu	15a3c7cc91	x86, irq: Introduce two helper functions to support irqdomain map operation Currently there are multiple entries to program IOAPIC pins, such as io_apic_setup_irq_pin_once(), io_apic_set_pci_routing() and setup_IO_APIC_irq_extra() etc. This patch introduces two functions to help consolidate the code to program IOAPIC pins. Function mp_set_pin_attr() is used to optionally set trigger, polarity and NUMA node property for an IOAPIC pin. If mp_set_pin_attr() is not invoked for a pin, the default configuration from BIOS will be used. Function mp_irqdomain_map() is an common implementation of irqdomain map() operation. It figures out attribures for pin and then actually programs the IOAPIC pin. We hope this will be the only entrance for programming IOAPIC pin. And the flow will: 1) caller such as xxx_pci_irq_enable figures out pin attributes. 2) Invoke mp_set_pin_attr() to set attributes for a pin. If the pin has already bin programmed, mp_set_pin_attr() will aslo detects attribute confictions. 3) Invoke mp_map_pin_to_irq() 3.1) If IRQ has already been assigned, return irq_find_mapping() 3.2) Else irq_create_mapping() ->irq_domain_associate() ->mp_irqdomain_map() ->io_apic_setup_irq_pin() So every pin will only programmed once by mp_irqdomain_map(), so we could kill io_apic_setup_irq_pin_once(), io_apic_set_pci_routing() and setup_IO_APIC_irq_extra() etc. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1402302011-23642-30-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:43 +02:00
Jiang Liu	facd8fdb25	x86, devicetree, irq: Use common mechanism to support irqdomain Now the ioapic driver provides a common interface to create irqdomain, so replace the private implementation. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Tony Lindgren <tony@atomide.com> Link: http://lkml.kernel.org/r/1402302011-23642-29-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:43 +02:00
Jiang Liu	44767bfaae	x86, irq: Enhance mp_register_ioapic() to support irqdomain Enhance function mp_register_ioapic() to support irqdomain. When registering IOAPIC, caller may provide callbacks and parameters for creating irqdomain. The IOAPIC core will create irqdomain later if caller has passed in corresponding parameters. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: sfi-devel@simplefirmware.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Tony Lindgren <tony@atomide.com> Link: http://lkml.kernel.org/r/1402302011-23642-25-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:42 +02:00
Jiang Liu	d7f3d47818	x86, irq: Introduce mechanisms to support dynamically allocate IRQ for IOAPIC Currently x86 support identity mapping between GSI(IOAPIC pin) and IRQ number, so continous IRQs at low end are statically allocated to IOAPICs at boot time. This design causes trouble to support IOAPIC hotplug. This patch implements basic mechanism to dynamically allocate IRQ on demand for IOAPIC pins by using irqdomain framework. It first adds several fields into struct ioapic to support irqdomain. Then it implements an algorithm to dynamically allocate IRQ number for IOAPIC pins on demand. Currently it supports three types of irqdomain: 1) LEGACY: used to support IOAPIC hosting legacy IRQs and building identity mapping for legacy IRQs. A speical case, we dynamically allocate IRQ number for IOAPIC pin which has GSI number below nr_legacy_irqs() but isn't legacy IRQ. This is for backward compatibility and avoid regression. 2) STRICT: build identity mapping between GSI and IRQ nubmer. 3) DYNAMIC: dynamically allocate IRQ number for IOAPIC pin on demand. Legacy(ISA) IRQs is not managed by irqdomain because there may be multiple pins sharing the same IRQ number and current irqdomain only supports 1:1 mapping between pins and IRQ. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Link: http://lkml.kernel.org/r/1402302011-23642-24-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:42 +02:00
Jiang Liu	6b9fb70824	x86, ACPI, irq: Consolidate algorithm of mapping (ioapic, pin) to IRQ number Currently ACPI and ioapic both implement algorithms to map (ioapic, pin) to IRQ number. So consolidate the common part into one place, which is also preparing for irqdomain support. It introduces mp_map_gsi_to_irq(), which will be used to allocate IRQ number IOAPIC pins when irqdomain is enabled. Also rename gsi_to_irq() to map_gsi_to_irq(), later we will introduce unmap_gsi_to_irq() when enabling IOAPIC hotplug. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Link: http://lkml.kernel.org/r/1402380812-32446-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:42 +02:00
Jiang Liu	95d76acc75	x86, irq: Count legacy IRQs by legacy_pic->nr_legacy_irqs instead of NR_IRQS_LEGACY Some platforms, such as Intel MID and mshypv, do not support legacy interrupt controllers. So count legacy IRQs by legacy_pic->nr_legacy_irqs instead of hard-coded NR_IRQS_LEGACY. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: xen-devel@lists.xenproject.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Tony Lindgren <tony@atomide.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Link: http://lkml.kernel.org/r/1402302011-23642-20-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:42 +02:00
Jiang Liu	18e4855186	x86, irq: Introduce some helper utilities to improve readability It also fixes an off by one bug in if ((ioapic_idx > 0) && (irq > NR_IRQS_LEGACY)) It should be if ((ioapic_idx > 0) && (irq >= NR_IRQS_LEGACY)) Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1402302011-23642-17-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:41 +02:00
Jiang Liu	4035ed0134	x86, ioapic: Kill unused global variable timer_through_8259 Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1402302011-23642-12-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:41 +02:00
Jiang Liu	3eb2be5f49	x86, irq, trivial: Minor improvements of IRQ related code 1) Kill unused MAX_HARDIRQS_PER_CPU. 2) Improve function prototype declararions. 3) Simple typo fix, change "gsit" to "gsi". 4) Use macro VECTOR_UNDEFINED instead of hard-coded -1. 5) Kill redundant comments. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jiri Kosina <trivial@kernel.org> Link: http://lkml.kernel.org/r/1402302011-23642-11-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:41 +02:00
Jiang Liu	a491cc902c	x86, mpparse: Simplify arch/x86/include/asm/mpspec.h Simplify arch/x86/include/asm/mpspec.h by 1) Change max_physical_apicid to static as it's only used in apic.c. 2) Kill declaration of mpc_default_type, it's never defined. 3) Delete default_acpi_madt_oem_check(), it has already been declared in apic.h. 4) Make default_acpi_madt_oem_check() depends on CONFIG_X86_LOCAL_APIC instead of CONFIG_X86_64 to support i386. 5) Change mp_override_legacy_irq(), mp_config_acpi_legacy_irqs() and mp_register_gsi() as static because they are only used in acpi/boot.c. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Richard Weinberger <richard@nod.at> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1402302011-23642-4-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-06-21 23:05:40 +02:00
Paolo Bonzini	7cb060a91c	KVM: x86: preserve the high 32-bits of the PAT register KVM does not really do much with the PAT, so this went unnoticed for a long time. It is exposed however if you try to do rdmsr on the PAT register. Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-19 13:43:44 +02:00
Jan Kiszka	560b7ee12c	KVM: nVMX: Fix returned value of MSR_IA32_VMX_PROCBASED_CTLS SDM says bits 1, 4-6, 8, 13-16, and 26 have to be set. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-19 12:52:12 +02:00
Jan Kiszka	3dbcd8da7b	KVM: nVMX: Advertise support for MSR_IA32_VMX_TRUE_*_CTLS We already implemented them but failed to advertise them. Currently they all return the identical values to the capability MSRs they are augmenting. So there is no change in exposed features yet. Drop related comments at this chance that are partially incorrect and redundant anyway. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-19 12:52:11 +02:00
Jan Kiszka	e4aa5288ff	KVM: x86: Fix constant value of VM_{EXIT_SAVE,ENTRY_LOAD}_DEBUG_CONTROLS The spec says those controls are at bit position 2 - makes 4 as value. The impact of this mistake is effectively zero as we only use them to ensure that these features are set at position 2 (or, previously, 1) in MSR_IA32_VMX_{EXIT,ENTRY}_CTLS - which is and will be always true according to the spec. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-19 12:52:11 +02:00
Saurabh Tangri	eeb9db09f7	x86/efi: Move all workarounds to a separate file quirks.c Currently, it's difficult to find all the workarounds that are applied when running on EFI, because they're littered throughout various code paths. This change moves all of them into a separate file with the hope that it will be come the single location for all our well documented quirks. Signed-off-by: Saurabh Tangri <saurabh.tangri@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-06-19 11:14:33 +01:00
Nadav Amit	682367c494	KVM: x86: Increase the number of fixed MTRR regs to 10 Recent Intel CPUs have 10 variable range MTRRs. Since operating systems sometime make assumptions on CPUs while they ignore capability MSRs, it is better for KVM to be consistent with recent CPUs. Reporting more MTRRs than actually supported has no functional implications. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-19 11:41:14 +02:00
Borislav Petkov	9b13a93df2	x86, cpufeature: Convert more "features" to bugs X86_FEATURE_FXSAVE_LEAK, X86_FEATURE_11AP and X86_FEATURE_CLFLUSH_MONITOR are not really features but synthetic bits we use for applying different bug workarounds. Call them what they really are, and make sure they get the proper cross-CPU behavior (OR rather than AND). Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1403042783-23278-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-06-18 15:27:04 -07:00
H. Peter Anvin	03ab3da3b2	Linux 3.16-rc1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTnmhkAAoJEHm+PkMAQRiGYyoIAJ+dCxQjx0Jmu5VLs48yvUkx AVbnJeEq0ScgFPBAlfrGnnXczOyZGD+wveNRwb3bN4f6hGkWHPncfExAeTU35QQ+ 92kCLScPU6kn4tnTqjac4ZrUhBCfgg5hFFamXOPidVDG4f5wYwql3IvP4O3eMOcZ IOx7kgweqoPm4A/LjXQ3L21kghs88NXySWMwwfWFdXaV++ey07slCWLzGF5rMDh/ xfBDfgTS4gdrbGeE+1z/qkoWyHwKnCad8Uh2Fu5CcprElwCjLLhrLPccYSRKO2IR 2ZSj/mcMb1FhH7AOyXBYMVbjhOH5MCIlHvJYYp7kwHfs66UTmnkczOJxq1ynKP0= =Whty -----END PGP SIGNATURE----- Merge tag 'v3.16-rc1' into x86/cpufeature Linux 3.16-rc1 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-06-18 15:26:19 -07:00
Borislav Petkov	d0f2dd1861	x86, xsave: Add forgotten inline annotation Add a missing inline annotation on a static function, in order to shut up a bunch of warnings like: In file included from arch/x86/crypto/camellia_aesni_avx_glue.c:23:0: ./arch/x86/include/asm/xsave.h:73:12: warning: ‘xsave_state_booting’ defined but not used [-Wunused-function] static int xsave_state_booting(struct xsave_struct fx, u64 mask) ^ In file included from arch/x86/crypto/camellia_aesni_avx2_glue.c:23:0: ./arch/x86/include/asm/xsave.h:73:12: warning: ‘xsave_state_booting’ defined but not used [-Wunused-function] static int xsave_state_booting(struct xsave_struct fx, u64 mask) ^ ... Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1403000468-30094-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-06-18 15:19:59 -07:00
Peter Zijlstra	4f3aaf2c2b	x86, locking: Use no more OOSTORE nonsense Paul reported that X86_OOSTORE is dead, yay! Update a comment and remove a newly added reference. Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20140611090509.GC3588@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-18 18:41:22 +02:00
Nadav Amit	67f4d4288c	KVM: x86: rdpmc emulation checks the counter incorrectly The rdpmc emulation checks that the counter (ECX) is not higher than 2, without taking into considerations bits 30:31 role (e.g., bit 30 marks whether the counter is fixed). The fix uses the pmu information for checking the validity of the pmu counter. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-18 17:46:18 +02:00
Tejun Heo	6fbc07bbe2	percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations __verify_pcpu_ptr() is used to verify that a specified parameter is actually an percpu pointer by percpu accessor and operation implementations. Currently, where it's called isn't clearly defined and we just ensure that it's invoked at least once for all accessors and operations. The lack of clarity on when it should be called isn't nice and given that this is a completely generic issue, there's no reason to make archs worry about it. This patch updates __verify_pcpu_ptr() invocations such that it's always invoked from the final generic wrapper once per access or operation. As this is already the case for {raw\|this}_cpu_() definitions through __pcpu_size_(), only the {raw\|per\|this}_cpu_ptr() accessors need to be updated. This change makes it unnecessary for archs to worry about __verify_pcpu_ptr(). x86's arch_raw_cpu_ptr() is updated accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com>	2014-06-17 19:12:40 -04:00
Tejun Heo	bbc344e1e3	percpu: introduce arch_raw_cpu_ptr() Currently, archs can override raw_cpu_ptr() directly; however, we wanna build a layer of indirection in the generic part of percpu so that we can implement generic features there without affecting archs. Introduce arch_raw_cpu_ptr() which is used to define raw_cpu_ptr() by generic percpu code. The two are identical for now. x86 is currently the only arch which overrides raw_cpu_ptr() and is converted to define arch_raw_cpu_ptr() instead. This doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com>	2014-06-17 19:12:34 -04:00
Linus Torvalds	3737a12761	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Ingo Molnar: "A second round of perf updates: - wide reaching kprobes sanitization and robustization, with the hope of fixing all 'probe this function crashes the kernel' bugs, by Masami Hiramatsu. - uprobes updates from Oleg Nesterov: tmpfs support, corner case fixes and robustization work. - perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo et al: * Add support to accumulate hist periods (Namhyung Kim) * various fixes, refactorings and enhancements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits) perf: Differentiate exec() and non-exec() comm events perf: Fix perf_event_comm() vs. exec() assumption uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates perf/documentation: Add description for conditional branch filter perf/x86: Add conditional branch filtering support perf/tool: Add conditional branch filter 'cond' to perf record perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND' uprobes: Teach copy_insn() to support tmpfs uprobes: Shift ->readpage check from __copy_insn() to uprobe_register() perf/x86: Use common PMU interrupt disabled code perf/ARM: Use common PMU interrupt disabled code perf: Disable sampled events if no PMU interrupt perf: Fix use after free in perf_remove_from_context() perf tools: Fix 'make help' message error perf record: Fix poll return value propagation perf tools: Move elide bool into perf_hpp_fmt struct perf tools: Remove elide setup for SORT_MODE__MEMORY mode perf tools: Fix "==" into "=" in ui_browser__warning assignment perf tools: Allow overriding sysfs and proc finding with env var perf tools: Consider header files outside perf directory in tags target ...	2014-06-12 19:18:49 -07:00
Linus Torvalds	c29deef32e	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more locking changes from Ingo Molnar: "This is the second round of locking tree updates for v3.16, offering large system scalability improvements: - optimistic spinning for rwsems, from Davidlohr Bueso. - 'qrwlocks' core code and x86 enablement, from Waiman Long and PeterZ" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, locking/rwlocks: Enable qrwlocks on x86 locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks locking/mutexes: Documentation update/rewrite locking/rwsem: Fix checkpatch.pl warnings locking/rwsem: Fix warnings for CONFIG_RWSEM_GENERIC_SPINLOCK locking/rwsem: Support optimistic spinning	2014-06-12 18:48:15 -07:00
Linus Torvalds	f9da455b93	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...	2014-06-12 14:27:40 -07:00
Oleg Nesterov	36fac0a214	signals: kill sigfindinword() It has no users and it doesn't look useful. I do not know why/when it was introduced, I can't even find any user in the git history. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Weinberger <richard@nod.at> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-06 16:08:11 -07:00
Waiman Long	bd01ec1a13	x86, locking/rwlocks: Enable qrwlocks on x86 Make x86 use the fair rwlock_t. Implement the custom queue_write_unlock() for best performance. Signed-off-by: Waiman Long <Waiman.Long@hp.com> [peterz: near complete rewrite] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Dave Jones <davej@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com> Cc: linux-kernel@vger.kernel.org Cc: x86@kernel.org Link: http://lkml.kernel.org/n/tip-r1xuzmdysvuhl3h86n5fbxi7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-06 07:58:29 +02:00
Ingo Molnar	ec00010972	Merge branch 'perf/urgent' into perf/core, to resolve conflict and to prepare for new patches Conflicts: arch/x86/kernel/traps.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-06 07:55:06 +02:00
Linus Torvalds	046f153343	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull x86 EFI updates from Peter Anvin: "A collection of EFI changes. The perhaps most important one is to fully save and restore the FPU state around each invocation of EFI runtime, and to not choke on non-ASCII characters in the boot stub" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efivars: Add compatibility code for compat tasks efivars: Refactor sanity checking code into separate function efivars: Stop passing a struct argument to efivar_validate() efivars: Check size of user object efivars: Use local variables instead of a pointer dereference x86/efi: Save and restore FPU context around efi_calls (i386) x86/efi: Save and restore FPU context around efi_calls (x86_64) x86/efi: Implement a __efi_call_virt macro x86, fpu: Extend the use of static_cpu_has_safe x86/efi: Delete most of the efi_call* macros efi: x86: Handle arbitrary Unicode characters efi: Add get_dram_base() helper function efi: Add shared printk wrapper for consistent prefixing efi: create memory map iteration helper efi: efi-stub-helper cleanup	2014-06-05 08:16:29 -07:00
Linus Torvalds	a0abcf2e8f	Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull x86 cdso updates from Peter Anvin: "Vdso cleanups and improvements largely from Andy Lutomirski. This makes the vdso a lot less ''special''" * 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso, build: Make LE access macros clearer, host-safe x86/vdso, build: Fix cross-compilation from big-endian architectures x86/vdso, build: When vdso2c fails, unlink the output x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls x86, mm: Improve _install_special_mapping and fix x86 vdso naming mm, fs: Add vm_ops->name as an alternative to arch_vma_name x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO x86, vdso: Move the 32-bit vdso special pages after the text x86, vdso: Reimplement vdso.so preparation in build-time C x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c x86, vdso: Clean up 32-bit vs 64-bit vdso params x86, mm: Ensure correct alignment of the fixmap	2014-06-05 08:05:29 -07:00
Linus Torvalds	2071b3e34f	Merge branch 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull x86-64 espfix changes from Peter Anvin: "This is the espfix64 code, which fixes the IRET information leak as well as the associated functionality problem. With this code applied, 16-bit stack segments finally work as intended even on a 64-bit kernel. Consequently, this patchset also removes the runtime option that we added as an interim measure. To help the people working on Linux kernels for very small systems, this patchset also makes these compile-time configurable features" * 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" x86, espfix: Make it possible to disable 16-bit support x86, espfix: Make espfix64 a Kconfig option, fix UML x86, espfix: Fix broken header guard x86, espfix: Move espfix definitions into a separate header file x86-32, espfix: Remove filter for espfix32 due to race x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack	2014-06-05 07:46:15 -07:00
Ingo Molnar	8c6e549a44	Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core Pull uprobes tmpfs support patches from Oleg Nesterov. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-05 16:35:30 +02:00
Oleg Nesterov	5cdb76d6f0	uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates Purely cosmetic, no changes in .o, 1. As Jim pointed out arch_uprobe->def looks ambiguous, rename it to ->defparam. 2. Add the comment into default_post_xol_op() to explain "regs->sp +=". 3. Remove the stale part of the comment in arch_uprobe_analyze_insn(). Suggested-by: Jim Keniston <jkenisto@us.ibm.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2014-06-05 16:21:57 +02:00
Cliff Wickman	a26fd71953	x86/uv: Update the UV3 TLB shootdown logic Update of TLB shootdown code for UV3. Kernel function native_flush_tlb_others() calls uv_flush_tlb_others() on UV to invalidate tlb page definitions on remote cpus. The UV systems have a hardware 'broadcast assist unit' which can be used to broadcast shootdown messages to all cpu's of selected nodes. The behavior of the BAU has changed only slightly with UV3: - UV3 is recognized with is_uv3_hub(). - UV2 functions and structures (uv2_xxx) are in most cases simply renamed to uv2_3_xxx. - Some UV2 error workarounds are not needed for UV3. (see uv_bau_message_interrupt and enable_timeouts) Signed-off-by: Cliff Wickman <cpw@sgi.com> Link: http://lkml.kernel.org/r/E1WkgWh-0001yJ-3K@eag09.americas.sgi.com [ Removed a few linebreak uglies. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-05 14:17:20 +02:00
Ingo Molnar	10b0256496	Merge branch 'perf/kprobes' into perf/core Conflicts: arch/x86/kernel/traps.c The kprobes enhancements are fully cooked, ship them upstream. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-05 12:26:50 +02:00
Linus Torvalds	00170fdd08	Merge branch 'akpm' (patchbomb from Andrew) into next Merge misc updates from Andrew Morton: - a few fixes for 3.16. Cc'ed to stable so they'll get there somehow. - various misc fixes and cleanups - most of the ocfs2 queue. Review is slow... - most of MM. The MM queue is pretty huge this time, but not much in the way of feature work. - some tweaks under kernel/ - printk maintenance work - updates to lib/ - checkpatch updates - tweaks to init/ * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (276 commits) fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_init fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul init/main.c: remove an ifdef kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND init/main.c: add initcall_blacklist kernel parameter init/main.c: don't use pr_debug() fs/binfmt_flat.c: make old_reloc() static fs/binfmt_elf.c: fix bool assignements fs/efs: convert printk(KERN_DEBUG to pr_debug fs/efs: add pr_fmt / use __func__ fs/efs: convert printk to pr_foo() scripts/checkpatch.pl: device_initcall is not the only __initcall substitute checkpatch: check stable email address checkpatch: warn on unnecessary void function return statements checkpatch: prefer kstrto<foo> to sscanf(buf, "%<lhuidx>", &bar); checkpatch: add warning for kmalloc/kzalloc with multiply checkpatch: warn on #defines ending in semicolon checkpatch: make --strict a default for files in drivers/net and net/ checkpatch: always warn on missing blank line after variable declaration block checkpatch: fix wildcard DT compatible string checking ...	2014-06-04 16:55:13 -07:00
Fabian Frederick	f6187769da	sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL sys_sgetmask and sys_ssetmask are obsolete system calls no longer supported in libc. This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert mode configuration.That option is enabled by default for those architectures. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Steven Miao <realmz6@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:54:14 -07:00
Chen Yucong	65eb71823b	hwpoison: remove unused global variable in do_machine_check() Remove an unused global variable mce_entry and relative operations in do_machine_check(). Signed-off-by: Chen Yucong <slaoub@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:54:11 -07:00
Cyrill Gorcunov	2bf01f9f0c	mm: x86 pgtable: require X86_64 for soft-dirty tracker Tracking dirty status on 2 level pages requires very ugly macros and taking into account how old the machines who can operate without PAE mode only are, lets drop soft dirty tracker from them for code simplicity (note I can't drop all the macros from 2 level pages by now since _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE are still used even without tracker). Linus proposed to completely rip off softdirty support on x86-32 (even with PAE) and since for CRIU we're not planning to support native x86-32 mode, lets do that. (Softdirty tracker is relatively new feature which is mostly used by CRIU so I don't expect if such API change would cause problems for userspace). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:54:05 -07:00
Cyrill Gorcunov	2373eaecff	mm: x86 pgtable: drop unneeded preprocessor ifdef _PAGE_BIT_FILE (bit 6) is always less than _PAGE_BIT_PROTNONE (bit 8), so drop redundant #ifdef. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:54:05 -07:00
Akinobu Mita	9c5a362142	x86: enable DMA CMA with swiotlb The DMA Contiguous Memory Allocator support on x86 is disabled when swiotlb config option is enabled. So DMA CMA is always disabled on x86_64 because swiotlb is always enabled. This attempts to support for DMA CMA with enabling swiotlb config option. The contiguous memory allocator on x86 is integrated in the function dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops for dma_alloc_coherent(). x86_swiotlb_alloc_coherent() which is .alloc callback in swiotlb_dma_ops tries to allocate with dma_generic_alloc_coherent() firstly and then swiotlb_alloc_coherent() is called as a fallback. The main part of supporting DMA CMA with swiotlb is that changing x86_swiotlb_free_coherent() which is .free callback in swiotlb_dma_ops for dma_free_coherent() so that it can distinguish memory allocated by dma_generic_alloc_coherent() from one allocated by swiotlb_alloc_coherent() and release it with dma_generic_free_coherent() which can handle contiguous memory. This change requires making is_swiotlb_buffer() global function. This also needs to change .free callback in the dma_map_ops for amd_gart and sta2x11, because these dma_ops are also using dma_generic_alloc_coherent(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:53:57 -07:00
Mel Gorman	c46a7c817e	x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting faults on x86. Care is taken such that _PAGE_NUMA is used only in situations where the VMA flags distinguish between NUMA hinting faults and prot_none faults. This decision was x86-specific and conceptually it is difficult requiring special casing to distinguish between PROTNONE and NUMA ptes based on context. Fundamentally, we only need the _PAGE_NUMA bit to tell the difference between an entry that is really unmapped and a page that is protected for NUMA hinting faults as if the PTE is not present then a fault will be trapped. Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset. This patch shrinks the maximum possible swap size and uses the bit to uniquely distinguish between NUMA hinting ptes and swap ptes. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:53:55 -07:00
Linus Torvalds	d09cc3659d	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core irq updates from Thomas Gleixner: "The irq department delivers: - Another tree wide update to get rid of the horrible create_irq interface along with its even more horrible variants. That also gets rid of the last leftovers of the initial sparse irq hackery. arch/driver specific changes have been either acked or ignored. - A fix for the spurious interrupt detection logic with threaded interrupts. - A new ARM SoC interrupt controller - The usual pile of fixes and improvements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) Documentation: brcmstb-l2: Add Broadcom STB Level-2 interrupt controller binding irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller genirq: Improve documentation to match current implementation ARM: iop13xx: fix msi support with sparse IRQ genirq: Provide !SMP stub for irq_set_affinity_notifier() irqchip: armada-370-xp: Move the devicetree binding documentation irqchip: gic: Use mask field in GICC_IAR genirq: Remove dynamic_irq mess ia64: Use irq_init_desc genirq: Replace dynamic_irq_init/cleanup genirq: Remove irq_reserve_irq[s] genirq: Replace reserve_irqs in core code s390: Avoid call to irq_reserve_irqs() s390: Remove pointless arch_show_interrupts() s390: pci: Check return value of alloc_irq_desc() proper sh: intc: Remove pointless irq_reserve_irqs() invocation x86, irq: Remove pointless irq_reserve_irqs() call genirq: Make create/destroy_irq() ia64 private tile: Use SPARSE_IRQ tile: pci: Use irq_alloc/free_hwirq() ...	2014-06-04 15:59:13 -07:00
Linus Torvalds	4dc4226f99	ACPI and power management updates for 3.16-rc1 - ACPICA update to upstream version 20140424. That includes a number of fixes and improvements related to things like GPE handling, table loading, headers, memory mapping and unmapping, DSDT/SSDT overriding, and the Unload() operator. The acpidump utility from upstream ACPICA is included too. From Bob Moore, Lv Zheng, David Box, David Binderman, and Colin Ian King. - Fixes and cleanups related to ACPI video and backlight interfaces from Hans de Goede. That includes blacklist entries for some new machines and using native backlight by default. - ACPI device enumeration changes to create platform devices rather than PNP devices for ACPI device objects with _HID by default. PNP devices will still be created for the ACPI device object with device IDs corresponding to real PNP devices, so that change should not break things left and right, and we're expecting to see more and more ACPI-enumerated platform devices in the future. From Zhang Rui and Rafael J Wysocki. - Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing it to handle system suspend/resume on Asus T100 correctly. From Heikki Krogerus and Rafael J Wysocki. - PM core update introducing a mechanism to allow runtime-suspended devices to stay suspended over system suspend/resume transitions if certain additional conditions related to coordination within device hierarchy are met. Related PM documentation update and ACPI PM domain support for the new feature. From Rafael J Wysocki. - Fixes and improvements related to the "freeze" sleep state. They affect several places including cpuidle, PM core, ACPI core, and the ACPI battery driver. From Rafael J Wysocki and Zhang Rui. - Miscellaneous fixes and updates of the ACPI core from Aaron Lu, Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki. - Fixes and cleanups for the ACPI processor and ACPI PAD (Processor Aggregator Device) drivers from Baoquan He, Manuel Schölling, Tony Camuso, and Toshi Kani. - System suspend/resume optimization in the ACPI battery driver from Lan Tianyu. - OPP (Operating Performance Points) subsystem updates from Chander Kashyap, Mark Brown, and Nishanth Menon. - cpufreq core fixes, updates and cleanups from Srivatsa S Bhat, Stratos Karafotis, and Viresh Kumar. - Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q, s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris, Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and Viresh Kumar. - intel_pstate driver fixes and cleanups from Dirk Brandewie, Doug Smythies, and Stratos Karafotis. - Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown. - Fix for the cpuidle menu governor from Chander Kashyap. - New ARM clps711x cpuidle driver from Alexander Shiyan. - Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter, Fabian Frederick, Pali Rohár, and Sebastian Capella. - Intel RAPL (Running Average Power Limit) driver updates from Jacob Pan. - PNP subsystem updates from Bjorn Helgaas and Fabian Frederick. - devfreq core updates from Chanwoo Choi and Paul Bolle. - devfreq updates for exynos4 and exynos5 from Chanwoo Choi and Bartlomiej Zolnierkiewicz. - turbostat tool fix from Jean Delvare. - cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra and Thomas Renninger. - New ACPI ec_access.c tool for poking at the EC in a safe way from Thomas Renninger. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTjl16AAoJEILEb/54YlRxeKgP/RRQSV7lFtf582Dw/5M/iWOg qYeNtuYFLArEmJ7SpxHdKsU1ZRm3CahAS1j7grvQMQasUxTzoavMcSBNZefeaoNK d01LVNqcyKCZs3+izRezk5N1IY+AjdrOcqCdIk8rfgFnc6kOttYUrVcIzKuIKAvJ MsJ5s/uqP8G69FsAA3Ttdtr0HKiQhN4skSt424wntQRDeJNZPBs74mPKBGh8bxlO Zr/VCDibKQ2Z8jS7x+TzwZrOxgE1/9x0Cub6GAdTvAfS8A+utPwSkneUyopNqpQ+ tJ5rz5R+HpmPMerizBuU+5s+tvjDPtH4/OZvOPSpYraQSFLOwx3hAm+a5k7fOGmc XWjXnXWT0i0V3iQkwrspTNjX1RgywbsHbmXrcWn192HResvMQ9zk2gH2ch6m8JhN yTV5V51dOZicpPuaTCvIkJpsV33p6vRz+EdPBiXoEdua5KKtOg8EnQ470dNaMR92 3ZtWmIvSgGlyPyHlSHLfGXbPUwTYvDNV3aheIoXp9E6WY3WJN9J3WXm4EHKBNVaI H83kwuk1s92cgqh22H5Pcb0CmDcrbkUdP6hhsPS/aL80/EJMljRP2AYW1Y+l1LAf pzMLmekHFqQEDjFQltwGvFV/EjFeMHnqOgQONx9ygMaayCGGTYSDx3FbRDesf8t9 qhoFcTPSxoo0XjrGrR6b =tpdF -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm into next Pull ACPI and power management updates from Rafael Wysocki: "ACPICA is the leader this time (63 commits), followed by cpufreq (28 commits), devfreq (15 commits), system suspend/hibernation (12 commits), ACPI video and ACPI device enumeration (10 commits each). We have no major new features this time, but there are a few significant changes of how things work. The most visible one will probably be that we are now going to create platform devices rather than PNP devices by default for ACPI device objects with _HID. That was long overdue and will be really necessary to be able to use the same drivers for the same hardware blocks on ACPI and DT-based systems going forward. We're not expecting fallout from this one (as usual), but it's something to watch nevertheless. The second change having a chance to be visible is that ACPI video will now default to using native backlight rather than the ACPI backlight interface which should generally help systems with broken Win8 BIOSes. We're hoping that all problems with the native backlight handling that we had previously have been addressed and we are in a good enough shape to flip the default, but this change should be easy enough to revert if need be. In addition to that, the system suspend core has a new mechanism to allow runtime-suspended devices to stay suspended throughout system suspend/resume transitions if some extra conditions are met (generally, they are related to coordination within device hierarchy). However, enabling this feature requires cooperation from the bus type layer and for now it has only been implemented for the ACPI PM domain (used by ACPI-enumerated platform devices mostly today). Also, the acpidump utility that was previously shipped as a separate tool will now be provided by the upstream ACPICA along with the rest of ACPICA code, which will allow it to be more up to date and better supported, and we have one new cpuidle driver (ARM clps711x). The rest is improvements related to certain specific use cases, cleanups and fixes all over the place. Specifics: - ACPICA update to upstream version 20140424. That includes a number of fixes and improvements related to things like GPE handling, table loading, headers, memory mapping and unmapping, DSDT/SSDT overriding, and the Unload() operator. The acpidump utility from upstream ACPICA is included too. From Bob Moore, Lv Zheng, David Box, David Binderman, and Colin Ian King. - Fixes and cleanups related to ACPI video and backlight interfaces from Hans de Goede. That includes blacklist entries for some new machines and using native backlight by default. - ACPI device enumeration changes to create platform devices rather than PNP devices for ACPI device objects with _HID by default. PNP devices will still be created for the ACPI device object with device IDs corresponding to real PNP devices, so that change should not break things left and right, and we're expecting to see more and more ACPI-enumerated platform devices in the future. From Zhang Rui and Rafael J Wysocki. - Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing it to handle system suspend/resume on Asus T100 correctly. From Heikki Krogerus and Rafael J Wysocki. - PM core update introducing a mechanism to allow runtime-suspended devices to stay suspended over system suspend/resume transitions if certain additional conditions related to coordination within device hierarchy are met. Related PM documentation update and ACPI PM domain support for the new feature. From Rafael J Wysocki. - Fixes and improvements related to the "freeze" sleep state. They affect several places including cpuidle, PM core, ACPI core, and the ACPI battery driver. From Rafael J Wysocki and Zhang Rui. - Miscellaneous fixes and updates of the ACPI core from Aaron Lu, Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki. - Fixes and cleanups for the ACPI processor and ACPI PAD (Processor Aggregator Device) drivers from Baoquan He, Manuel Schölling, Tony Camuso, and Toshi Kani. - System suspend/resume optimization in the ACPI battery driver from Lan Tianyu. - OPP (Operating Performance Points) subsystem updates from Chander Kashyap, Mark Brown, and Nishanth Menon. - cpufreq core fixes, updates and cleanups from Srivatsa S Bhat, Stratos Karafotis, and Viresh Kumar. - Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q, s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris, Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and Viresh Kumar. - intel_pstate driver fixes and cleanups from Dirk Brandewie, Doug Smythies, and Stratos Karafotis. - Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown. - Fix for the cpuidle menu governor from Chander Kashyap. - New ARM clps711x cpuidle driver from Alexander Shiyan. - Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter, Fabian Frederick, Pali Rohár, and Sebastian Capella. - Intel RAPL (Running Average Power Limit) driver updates from Jacob Pan. - PNP subsystem updates from Bjorn Helgaas and Fabian Frederick. - devfreq core updates from Chanwoo Choi and Paul Bolle. - devfreq updates for exynos4 and exynos5 from Chanwoo Choi and Bartlomiej Zolnierkiewicz. - turbostat tool fix from Jean Delvare. - cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra and Thomas Renninger. - New ACPI ec_access.c tool for poking at the EC in a safe way from Thomas Renninger" * tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (187 commits) ACPICA: Namespace: Remove _PRP method support. intel_pstate: Improve initial busy calculation intel_pstate: add sample time scaling intel_pstate: Correct rounding in busy calculation intel_pstate: Remove C0 tracking PM / hibernate: fixed typo in comment ACPI: Fix x86 regression related to early mapping size limitation ACPICA: Tables: Add mechanism to control early table checksum verification. ACPI / scan: use platform bus type by default for _HID enumeration ACPI / scan: always register ACPI LPSS scan handler ACPI / scan: always register memory hotplug scan handler ACPI / scan: always register container scan handler ACPI / scan: Change the meaning of missing .attach() in scan handlers ACPI / scan: introduce platform_id device PNP type flag ACPI / scan: drop unsupported serial IDs from PNP ACPI scan handler ID list ACPI / scan: drop IDs that do not comply with the ACPI PNP ID rule ACPI / PNP: use device ID list for PNPACPI device enumeration ACPI / scan: .match() callback for ACPI scan handlers ACPI / battery: wakeup the system only when necessary power_supply: allow power supply devices registered w/o wakeup source ...	2014-06-04 08:57:16 -07:00
Linus Torvalds	b05d59dfce	At over 200 commits, covering almost all supported architectures, this was a pretty active cycle for KVM. Changes include: - a lot of s390 changes: optimizations, support for migration, GDB support and more - ARM changes are pretty small: support for the PSCI 0.2 hypercall interface on both the guest and the host (the latter acked by Catalin) - initial POWER8 and little-endian host support - support for running u-boot on embedded POWER targets - pretty large changes to MIPS too, completing the userspace interface and improving the handling of virtualized timer hardware - for x86, a larger set of changes is scheduled for 3.17. Still, we have a few emulator bugfixes and support for running nested fully-virtualized Xen guests (para-virtualized Xen guests have always worked). And some optimizations too. The only missing architecture here is ia64. It's not a coincidence that support for KVM on ia64 is scheduled for removal in 3.17. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1 j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8 W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ 0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl 3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz bAknaZpPiOrW11Et1htY =j5Od -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next Pull KVM updates from Paolo Bonzini: "At over 200 commits, covering almost all supported architectures, this was a pretty active cycle for KVM. Changes include: - a lot of s390 changes: optimizations, support for migration, GDB support and more - ARM changes are pretty small: support for the PSCI 0.2 hypercall interface on both the guest and the host (the latter acked by Catalin) - initial POWER8 and little-endian host support - support for running u-boot on embedded POWER targets - pretty large changes to MIPS too, completing the userspace interface and improving the handling of virtualized timer hardware - for x86, a larger set of changes is scheduled for 3.17. Still, we have a few emulator bugfixes and support for running nested fully-virtualized Xen guests (para-virtualized Xen guests have always worked). And some optimizations too. The only missing architecture here is ia64. It's not a coincidence that support for KVM on ia64 is scheduled for removal in 3.17" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits) KVM: add missing cleanup_srcu_struct KVM: PPC: Book3S PR: Rework SLB switching code KVM: PPC: Book3S PR: Use SLB entry 0 KVM: PPC: Book3S HV: Fix machine check delivery to guest KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs KVM: PPC: Book3S HV: Make sure we don't miss dirty pages KVM: PPC: Book3S HV: Fix dirty map for hugepages KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number KVM: PPC: Book3S: Add ONE_REG register names that were missed KVM: PPC: Add CAP to indicate hcall fixes KVM: PPC: MPIC: Reset IRQ source private members KVM: PPC: Graciously fail broken LE hypercalls PPC: ePAPR: Fix hypercall on LE guest KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler KVM: PPC: BOOK3S: Always use the saved DAR value PPC: KVM: Make NX bit available with magic page KVM: PPC: Disable NX for old magic page using guests KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest ...	2014-06-04 08:47:12 -07:00
David S. Miller	c99f7abf0e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-03 23:32:12 -07:00
Linus Torvalds	3de0ef8d0d	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull x86/UV changes from Ingo Molnar: "Continued updates for SGI UV 3 hardware support" * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/UV: Fix conditional in gru_exit() x86/UV: Set n_lshift based on GAM_GR_CONFIG MMR for UV3	2014-06-03 15:48:23 -07:00
Linus Torvalds	4aef77b2fe	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull x86 IOSF platform updates from Ingo Molnar: "IOSF (Intel OnChip System Fabric) updates: - generalize the IOSF interface to allow mixed mode drivers: non-IOSF drivers to utilize of IOSF features on IOSF platforms. - add 'Quark X1000' IOSF/MBI support - clean up BayTrail and Quark PCI ID enumeration" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, iosf: Add PCI ID macros for better readability x86, iosf: Add Quark X1000 PCI ID x86, iosf: Added Quark MBI identifiers x86, iosf: Make IOSF driver modular and usable by more drivers	2014-06-03 15:46:38 -07:00
Linus Torvalds	c33c40549e	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull x86 microcode changes from Ingo Molnar: "A microcode-debugging boot flag plus related refactoring" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Add a disable chicken bit x86, boot: Carve out early cmdline parsing function	2014-06-03 15:44:50 -07:00
Linus Torvalds	e30c631be6	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull x86 irq cleanup from Ingo Molnar: "A single, trivial cleanup" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Clean up VECTOR_UNDEFINED and VECTOR_RETRIGGERED definition	2014-06-03 15:43:38 -07:00
Rafael J. Wysocki	0e36d43c9c	Merge branch 'acpica' * acpica: (63 commits) ACPICA: Namespace: Remove _PRP method support. ACPI: Fix x86 regression related to early mapping size limitation ACPICA: Tables: Add mechanism to control early table checksum verification. ACPICA: acpidump: Fix repetitive table dump in -n mode. ACPI: Clean up acpi_os_map/unmap_memory() to eliminate __iomem. ACPICA: Clean up redudant definitions already defined elsewhere ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of <asm/acpi.h> ACPICA: Linux headers: Add <acpi/platform/aclinuxex.h> ACPICA: Linux headers: Remove ACPI_PREEMPTION_POINT() due to no usages. ACPICA: Update version to 20140424. ACPICA: Comment/format update, no functional change. ACPICA: Events: Update GPE handling and initialization code. ACPICA: Remove extraneous error message for large number of GPEs. ACPICA: Tables: Remove old mechanism to validate if XSDT contains NULL entries. ACPICA: Tables: Add new mechanism to skip NULL entries in RSDT and XSDT. ACPICA: acpidump: Add support to force using RSDT. ACPICA: Back port of improvements on exception code. ACPICA: Back port of _PRP update. ACPICA: acpidump: Fix truncated RSDP signature validation. ACPICA: Linux header: Add support for stubbed externals. ...	2014-06-03 23:12:27 +02:00
Linus Torvalds	c84a1e32ee	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull scheduler updates from Ingo Molnar: "The main scheduling related changes in this cycle were: - various sched/numa updates, for better performance - tree wide cleanup of open coded nice levels - nohz fix related to rq->nr_running use - cpuidle changes and continued consolidation to improve the kernel/sched/idle.c high level idle scheduling logic. As part of this effort I pulled cpuidle driver changes from Rafael as well. - standardized idle polling amongst architectures - continued work on preparing better power/energy aware scheduling - sched/rt updates - misc fixlets and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) sched/numa: Decay ->wakee_flips instead of zeroing sched/numa: Update migrate_improves/degrades_locality() sched/numa: Allow task switch if load imbalance improves sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice() sched: Initialize rq->age_stamp on processor start sched, nohz: Change rq->nr_running to always use wrappers sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance() sched: Use clamp() and clamp_val() to make sys_nice() more readable sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups() sched/numa: Fix initialization of sched_domain_topology for NUMA sched: Call select_idle_sibling() when not affine_sd sched: Simplify return logic in sched_read_attr() sched: Simplify return logic in sched_copy_attr() sched: Fix exec_start/task_hot on migrated tasks arm64: Remove TIF_POLLING_NRFLAG metag: Remove TIF_POLLING_NRFLAG sched/idle: Make cpuidle_idle_call() void sched/idle: Reflow cpuidle_idle_call() sched/idle: Delay clearing the polling bit ...	2014-06-03 14:00:15 -07:00
Linus Torvalds	3d521f9151	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull perf updates from Ingo Molnar: "The tooling changes maintained by Jiri Olsa until Arnaldo is on vacation: User visible changes: - Add -F option for specifying output fields (Namhyung Kim) - Propagate exit status of a command line workload for record command (Namhyung Kim) - Use tid for finding thread (Namhyung Kim) - Clarify the output of perf sched map plus small sched command fixes (Dongsheng Yang) - Wire up perf_regs and unwind support for ARM64 (Jean Pihet) - Factor hists statistics counts processing which in turn also fixes several bugs in TUI report command (Namhyung Kim) - Add --percentage option to control absolute/relative percentage output (Namhyung Kim) - Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by completion scripts (Ramkumar Ramachandra) Development/infrastructure changes and fixes: - Android related fixes for pager and map dso resolving (Michael Lentine) - Add libdw DWARF post unwind support for ARM (Jean Pihet) - Consolidate types.h for ARM and ARM64 (Jean Pihet) - Fix possible null pointer dereference in session.c (Masanari Iida) - Cleanup, remove unused variables in map_switch_event() (Dongsheng Yang) - Remove nr_state_machine_bugs in perf latency (Dongsheng Yang) - Remove usage of trace_sched_wakeup(.success) (Peter Zijlstra) - Cleanups for perf.h header (Jiri Olsa) - Consolidate types.h and export.h within tools (Borislav Petkov) - Move u64_swap union to its single user's header, evsel.h (Borislav Petkov) - Fix for s390 to properly parse tracepoints plus test code (Alexander Yarygin) - Handle EINTR error for readn/writen (Namhyung Kim) - Add a test case for hists filtering (Namhyung Kim) - Share map_groups among threads of the same group (Arnaldo Carvalho de Melo, Jiri Olsa) - Making some code (cpu node map and report parse callchain callback) global to be usable by upcomming changes (Don Zickus) - Fix pmu object compilation error (Jiri Olsa) Kernel side changes: - intrusive uprobes fixes from Oleg Nesterov. Since the interface is admin-only, and the bug only affects user-space ("any probed jmp/call can kill the application"), we queued these fixes via the development tree, as a special exception. - more fuzzer motivated race fixes and related refactoring and robustization. - allow PMU drivers to be built as modules. (No actual module yet, because the x86 Intel uncore module wasn't ready in time for this)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) perf tools: Add automatic remapping of Android libraries perf tools: Add cat as fallback pager perf tests: Add a testcase for histogram output sorting perf tests: Factor out print_hists_*() perf tools: Introduce reset_output_field() perf tools: Get rid of obsolete hist_entry__sort_list perf hists: Reset width of output fields with header length perf tools: Skip elided sort entries perf top: Add --fields option to specify output fields perf report/tui: Fix a bug when --fields/sort is given perf tools: Add ->sort() member to struct sort_entry perf report: Add -F option to specify output fields perf tools: Call perf_hpp__init() before setting up GUI browsers perf tools: Consolidate management of default sort orders perf tools: Allow hpp fields to be sort keys perf ui: Get rid of callback from __hpp__fmt() perf tools: Consolidate output field handling to hpp format routines perf tools: Use hpp formats to sort final output perf tools: Support event grouping in hpp ->sort() perf tools: Use hpp formats to sort hist entries ...	2014-06-03 13:18:00 -07:00
Linus Torvalds	776edb5931	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__() arch,doc: Convert smp_mb__() arch,xtensa: Convert smp_mb__() arch,x86: Convert smp_mb__() arch,tile: Convert smp_mb__() arch,sparc: Convert smp_mb__() arch,sh: Convert smp_mb__() arch,score: Convert smp_mb__() arch,s390: Convert smp_mb__() arch,powerpc: Convert smp_mb__() arch,parisc: Convert smp_mb__() arch,openrisc: Convert smp_mb__() arch,mn10300: Convert smp_mb__() arch,mips: Convert smp_mb__() arch,metag: Convert smp_mb__() arch,m68k: Convert smp_mb__() arch,m32r: Convert smp_mb__() arch,ia64: Convert smp_mb__() ...	2014-06-03 12:57:53 -07:00
Linus Torvalds	425553209b	PCI changes for the v3.16 merge window: Enumeration - Notify driver before and after device reset (Keith Busch) - Use reset notification in NVMe (Keith Busch) NUMA - Warn if we have to guess host bridge node information (Myron Stowe) - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee Suthikulpanit) - Clean up and mark early_root_info_init() as deprecated (Suravee Suthikulpanit) Driver binding - Add "driver_override" for force specific binding (Alex Williamson) - Fail "new_id" addition for devices we already know about (Bandan Das) Resource management - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox) - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas) - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas) - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas) - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas) - Don't convert BAR address to resource if dma_addr_t is too small (Bjorn Helgaas) - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas) - Don't print anything while decoding is disabled (Bjorn Helgaas) - Don't add disabled subtractive decode bus resources (Bjorn Helgaas) - Add resource allocation comments (Bjorn Helgaas) - Restrict 64-bit prefetchable bridge windows to 64-bit resources (Yinghai Lu) - Assign i82875p_edac PCI resources before adding device (Yinghai Lu) PCI device hotplug - Remove unnecessary "dev->bus" test (Bjorn Helgaas) - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas) - Fix rphahp endianess issues (Laurent Dufour) - Acknowledge spurious "cmd completed" event (Rajat Jain) - Allow hotplug service drivers to operate in polling mode (Rajat Jain) - Fix cpqphp possible NULL dereference (Rickard Strandqvist) MSI - Replace pci_enable_msi_block() by pci_enable_msi_exact() (Alexander Gordeev) - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev) - Simplify populate_msi_sysfs() (Jan Beulich) Virtualization - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson) - Mark RTL8110SC INTx masking as broken (Alex Williamson) Generic host bridge driver - Add generic PCI host controller driver (Will Deacon) Freescale i.MX6 - Use new clock names (Lucas Stach) - Drop old IRQ mapping (Lucas Stach) - Remove optional (and unused) IRQs (Lucas Stach) - Add support for MSI (Lucas Stach) - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat) Renesas R-Car - Add gen2 device tree support (Ben Dooks) - Use new OF interrupt mapping when possible (Lucas Stach) - Add PCIe driver (Phil Edworthy) - Add PCIe MSI support (Phil Edworthy) - Add PCIe device tree bindings (Phil Edworthy) Samsung Exynos - Remove unnecessary OOM messages (Jingoo Han) - Fix add_pcie_port() section mismatch warning (Sachin Kamat) Synopsys DesignWare - Make MSI ISR shared IRQ aware (Lucas Stach) Miscellaneous - Check for broken config space aliasing (Alex Williamson) - Update email address (Ben Hutchings) - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas) - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas) - Remove unnecessary __ref annotations (Bjorn Helgaas) - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns (Bjorn Helgaas) - Fix use of uninitialized MPS value (Bjorn Helgaas) - Tidy x86/gart messages (Bjorn Helgaas) - Fix return value from pci_user_{read,write}_config_() (Gavin Shan) - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo) - Remove unused serial device IDs (Jean Delvare) - Use designated initialization in PCI_VDEVICE (Mark Rustad) - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu) - Configure MPS on ARM (Murali Karicheri) - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker) - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott) - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott) - Remove pcibios_add_platform_entries() (Sebastian Ott) - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch) - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang) - Add and use new pci_is_bridge() interface (Yijing Wang) - Make pci_bus_add_device() void (Yijing Wang) DMA API - Clarify physical/bus address distinction in docs (Bjorn Helgaas) - Fix typos in docs (Emilio López) - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim) - Change dma_declare_coherent_memory() CPU address to phys_addr_t (Bjorn Helgaas) - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory() (Bjorn Helgaas) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJTjMaQAAoJEFmIoMA60/r8XncQAKX7cD6btXCZnrcYo7inseyp 3rwOlrsNkWyHqSj/RqqzE1NY6L1h5G2uliI6xg1SKenuHPDcosm5d8FYO0ORKiUs xrqBkmZJHXN7fck//tJwsTXiYh5u42RO8QWbvZVr5UqXe40LyaMHMh9Y7VarrU/o sM2ADzFKagv1qMQ13nmYxqT+Zl+CqpimyLP+ep6Nfqxi6ils+KJ6b9SKYqrpqE6t Mcq2K5ShqU5SaYub1JIXLcQ9XylID+t1M9+cwixcs7a87HbJiktfkGqQvQJoUIuK Q5U+abcIGk4vfOnDCctSnoRyrcbTAZ/vqfo0vpX22TokESjwrD8hFOX5HPOFtD+4 wIDbYurW/8oBhLRaJ0uTPzSH8bXjXTynAwxHZgIrEur5908eECKQ/WiFCxyrovvv r4ThAN0FaobllEr0XOFESOzDNSt/ME00WWI7+puAJ/KJkFEtcXt9othLmLmvLz8H 2GWXrm/aOR0WUO7foGUxI3bXYlDN6NbSKpfuZsLAi2VAyJJ6L6yVSo/fT0X07e3z qRy9LOohuiwIKv/I4F2SEq2REfGGsnkrJBoeQi/oBZDcBy1Lsi7P9LWIERhLQEM+ Hm+30lC/f326nI3hoyThj2k2xxZOQzCIvrt658xP4qd9Zfe1bvCH58FF8K62CoOd p8XAf7Sl6v6YUodUrT/t =km55 -----END PGP SIGNATURE----- Merge tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into next Pull PCI changes from Bjorn Helgaas: "Enumeration - Notify driver before and after device reset (Keith Busch) - Use reset notification in NVMe (Keith Busch) NUMA - Warn if we have to guess host bridge node information (Myron Stowe) - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee Suthikulpanit) - Clean up and mark early_root_info_init() as deprecated (Suravee Suthikulpanit) Driver binding - Add "driver_override" for force specific binding (Alex Williamson) - Fail "new_id" addition for devices we already know about (Bandan Das) Resource management - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox) - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas) - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas) - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas) - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas) - Don't convert BAR address to resource if dma_addr_t is too small (Bjorn Helgaas) - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas) - Don't print anything while decoding is disabled (Bjorn Helgaas) - Don't add disabled subtractive decode bus resources (Bjorn Helgaas) - Add resource allocation comments (Bjorn Helgaas) - Restrict 64-bit prefetchable bridge windows to 64-bit resources (Yinghai Lu) - Assign i82875p_edac PCI resources before adding device (Yinghai Lu) PCI device hotplug - Remove unnecessary "dev->bus" test (Bjorn Helgaas) - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas) - Fix rphahp endianess issues (Laurent Dufour) - Acknowledge spurious "cmd completed" event (Rajat Jain) - Allow hotplug service drivers to operate in polling mode (Rajat Jain) - Fix cpqphp possible NULL dereference (Rickard Strandqvist) MSI - Replace pci_enable_msi_block() by pci_enable_msi_exact() (Alexander Gordeev) - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev) - Simplify populate_msi_sysfs() (Jan Beulich) Virtualization - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson) - Mark RTL8110SC INTx masking as broken (Alex Williamson) Generic host bridge driver - Add generic PCI host controller driver (Will Deacon) Freescale i.MX6 - Use new clock names (Lucas Stach) - Drop old IRQ mapping (Lucas Stach) - Remove optional (and unused) IRQs (Lucas Stach) - Add support for MSI (Lucas Stach) - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat) Renesas R-Car - Add gen2 device tree support (Ben Dooks) - Use new OF interrupt mapping when possible (Lucas Stach) - Add PCIe driver (Phil Edworthy) - Add PCIe MSI support (Phil Edworthy) - Add PCIe device tree bindings (Phil Edworthy) Samsung Exynos - Remove unnecessary OOM messages (Jingoo Han) - Fix add_pcie_port() section mismatch warning (Sachin Kamat) Synopsys DesignWare - Make MSI ISR shared IRQ aware (Lucas Stach) Miscellaneous - Check for broken config space aliasing (Alex Williamson) - Update email address (Ben Hutchings) - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas) - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas) - Remove unnecessary __ref annotations (Bjorn Helgaas) - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns (Bjorn Helgaas) - Fix use of uninitialized MPS value (Bjorn Helgaas) - Tidy x86/gart messages (Bjorn Helgaas) - Fix return value from pci_user_{read,write}_config_() (Gavin Shan) - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo) - Remove unused serial device IDs (Jean Delvare) - Use designated initialization in PCI_VDEVICE (Mark Rustad) - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu) - Configure MPS on ARM (Murali Karicheri) - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker) - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott) - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott) - Remove pcibios_add_platform_entries() (Sebastian Ott) - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch) - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang) - Add and use new pci_is_bridge() interface (Yijing Wang) - Make pci_bus_add_device() void (Yijing Wang) DMA API - Clarify physical/bus address distinction in docs (Bjorn Helgaas) - Fix typos in docs (Emilio López) - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim) - Change dma_declare_coherent_memory() CPU address to phys_addr_t (Bjorn Helgaas) - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory() (Bjorn Helgaas)" * tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits) MAINTAINERS: Add generic PCI host controller driver PCI: generic: Add generic PCI host controller driver PCI: imx6: Add support for MSI PCI: designware: Make MSI ISR shared IRQ aware PCI: imx6: Remove optional (and unused) IRQs PCI: imx6: Drop old IRQ mapping PCI: imx6: Use new clock names i82875p_edac: Assign PCI resources before adding device ARM/PCI: Call pcie_bus_configure_settings() to set MPS PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning PCI: Make pci_bus_add_device() void PCI: exynos: Fix add_pcie_port() section mismatch warning PCI: Introduce new device binding path using pci_dev.driver_override PCI: rcar: Add gen2 device tree support PCI: cpqphp: Fix possible null pointer dereference PCI: rcar: Add R-Car PCIe device tree bindings PCI: rcar: Add MSI support for PCIe PCI: rcar: Add Renesas R-Car PCIe driver PCI: Fix return value from pci_user_{read,write}_config_*() PCI: exynos: Remove unnecessary OOM messages ...	2014-06-02 12:15:19 -07:00
Linus Torvalds	9f888b3a10	xen: features and fixes for 3.16-rc0 - Support foreign mappings in PVH domains (needed when dom0 is PVH) - Fix mapping high MMIO regions in x86 PV guests (this is also the first half of removing the PAGE_IOMAP PTE flag). - ARM suspend/resume support. - ARM multicall support. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJTjE5MAAoJEFxbo/MsZsTRtl8H/2lfS9w05e60vRxjolPV0vRc 5k9DcYFeJ+k2cz/2T3mNlIvKdfBTesSfgVquH+28GhQz+uKFQ1OrJpYNDTougSw5 Wv0Ae8e+7eLABvJ9XMiZdDsPzsICw2wqWOvqrnQi2qR3SIimBc5tBigR4+Rccv+e btuBLlYT4WPQ8qgNyCBPgxzuyxteu5wK/0XryX6NcbrxeEbAzQAeDKkmvCD4fSvx KxrwTO3mwV4Lefmf/WS4Z9fDcPujQOUqKEtUWanw/2JalO1BzDPo+1wvYs0LduLC QI/YJN4SL3UeGOmbX2tyIaRgMsAcQVVrYkTm1cp8eD7vcRuvXaqy6dxuX05+V4g= =cxfG -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.16-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into next Pull Xen updates from David Vrabel: "xen: features and fixes for 3.16-rc0 - support foreign mappings in PVH domains (needed when dom0 is PVH) - fix mapping high MMIO regions in x86 PV guests (this is also the first half of removing the PAGE_IOMAP PTE flag). - ARM suspend/resume support. - ARM multicall support" * tag 'stable/for-linus-3.16-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: map foreign pfns for autotranslated guests xen-acpi-processor: Don't display errors when we get -ENOSYS xen/pciback: Document the entry points for 'pcistub_put_pci_dev' xen/pciback: Document when the 'unbind' and 'bind' functions are called. xen-pciback: Document when we FLR an PCI device. xen-pciback: First reset, then free. xen-pciback: Cleanup up pcistub_put_pci_dev x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range() x86/xen: set regions above the end of RAM as 1:1 x86/xen: only warn once if bad MFNs are found during setup x86/xen: compactly store large identity ranges in the p2m x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFN x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle() xen/x86: set panic notifier priority to minimum arm,arm64/xen: introduce HYPERVISOR_suspend() xen: refactor suspend pre/post hooks arm: xen: export HYPERVISOR_multicall to modules. arm64: introduce virt_to_pfn arm/xen: Remove definiition of virt_to_pfn in asm/xen/page.h arm: xen: implement multicall hypercall support.	2014-06-02 08:24:12 -07:00
Minchan Kim	6538b8ea88	x86_64: expand kernel stack to 16K While I play inhouse patches with much memory pressure on qemu-kvm, 3.14 kernel was randomly crashed. The reason was kernel stack overflow. When I investigated the problem, the callstack was a little bit deeper by involve with reclaim functions but not direct reclaim path. I tried to diet stack size of some functions related with alloc/reclaim so did a hundred of byte but overflow was't disappeard so that I encounter overflow by another deeper callstack on reclaim/allocator path. Of course, we might sweep every sites we have found for reducing stack usage but I'm not sure how long it saves the world(surely, lots of developer start to add nice features which will use stack agains) and if we consider another more complex feature in I/O layer and/or reclaim path, it might be better to increase stack size( meanwhile, stack usage on 64bit machine was doubled compared to 32bit while it have sticked to 8K. Hmm, it's not a fair to me and arm64 already expaned to 16K. ) So, my stupid idea is just let's expand stack size and keep an eye toward stack consumption on each kernel functions via stacktrace of ftrace. For example, we can have a bar like that each funcion shouldn't exceed 200K and emit the warning when some function consumes more in runtime. Of course, it could make false positive but at least, it could make a chance to think over it. I guess this topic was discussed several time so there might be strong reason not to increase kernel stack size on x86_64, for me not knowing so Ccing x86_64 maintainers, other MM guys and virtio maintainers. Here's an example call trace using up the kernel stack: Depth Size Location (51 entries) ----- ---- -------- 0) 7696 16 lookup_address 1) 7680 16 _lookup_address_cpa.isra.3 2) 7664 24 __change_page_attr_set_clr 3) 7640 392 kernel_map_pages 4) 7248 256 get_page_from_freelist 5) 6992 352 __alloc_pages_nodemask 6) 6640 8 alloc_pages_current 7) 6632 168 new_slab 8) 6464 8 __slab_alloc 9) 6456 80 __kmalloc 10) 6376 376 vring_add_indirect 11) 6000 144 virtqueue_add_sgs 12) 5856 288 __virtblk_add_req 13) 5568 96 virtio_queue_rq 14) 5472 128 __blk_mq_run_hw_queue 15) 5344 16 blk_mq_run_hw_queue 16) 5328 96 blk_mq_insert_requests 17) 5232 112 blk_mq_flush_plug_list 18) 5120 112 blk_flush_plug_list 19) 5008 64 io_schedule_timeout 20) 4944 128 mempool_alloc 21) 4816 96 bio_alloc_bioset 22) 4720 48 get_swap_bio 23) 4672 160 __swap_writepage 24) 4512 32 swap_writepage 25) 4480 320 shrink_page_list 26) 4160 208 shrink_inactive_list 27) 3952 304 shrink_lruvec 28) 3648 80 shrink_zone 29) 3568 128 do_try_to_free_pages 30) 3440 208 try_to_free_pages 31) 3232 352 __alloc_pages_nodemask 32) 2880 8 alloc_pages_current 33) 2872 200 __page_cache_alloc 34) 2672 80 find_or_create_page 35) 2592 80 ext4_mb_load_buddy 36) 2512 176 ext4_mb_regular_allocator 37) 2336 128 ext4_mb_new_blocks 38) 2208 256 ext4_ext_map_blocks 39) 1952 160 ext4_map_blocks 40) 1792 384 ext4_writepages 41) 1408 16 do_writepages 42) 1392 96 __writeback_single_inode 43) 1296 176 writeback_sb_inodes 44) 1120 80 __writeback_inodes_wb 45) 1040 160 wb_writeback 46) 880 208 bdi_writeback_workfn 47) 672 144 process_one_work 48) 528 112 worker_thread 49) 416 240 kthread 50) 176 176 ret_from_fork [ Note: the problem is exacerbated by certain gcc versions that seem to generate much bigger stack frames due to apparently bad coalescing of temporaries and generating too many spills. Rusty saw gcc-4.6.4 using 35% more stack on the virtio path than 4.8.2 does, for example. Minchan not only uses such a bad gcc version (4.6.3 in his case), but some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is what causes that kernel_map_pages() frame, for example). But we're clearly getting too close. The VM code also seems to have excessive stack frames partly for the same compiler reason, triggered by excessive inlining and lots of function arguments. We need to improve on our stack use, but in the meantime let's do this simple stack increase too. Unlike most earlier reports, there is nothing simple that stands out as being really horribly wrong here, apart from the fact that the stack frames are just bigger than they should need to be. - Linus ] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Jones <davej@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S Tsirkin <mst@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-30 11:52:51 -07:00
H. Peter Anvin	c9e5a5a703	x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi) The XSAVE instruction family takes a memory argment. The macros use (%edi)/(%rdi) as that memory argument - make that clear to the reader. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-7-git-send-email-fenghua.yu@intel.com	2014-05-30 08:19:21 -07:00
Fenghua Yu	7496d6458f	Define kernel API to get address of each state in xsave area In standard form, each state is saved in the xsave area in fixed offset. But in compacted form, offset of each saved state only can be calculated during run time because some xstates may not be enabled and saved. We define kernel API get_xsave_addr() returns address of a given state saved in a xsave area. It can be called in kernel to get address of each xstate in xsave area in either standard format or compacted format. It's useful when kernel wants to directly access each state in xsave area. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-17-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:33:09 -07:00
Fenghua Yu	f41d830fa8	x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time __save_fpu() can be called during early booting time when cpu caps are not enabled and alternative can not be used yet. Therefore, it calls xsave_state_booting() during booting time to save xstate to task's xsave area. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-14-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:33:04 -07:00
Fenghua Yu	adb9d526e9	x86/xsaves: Add xsaves and xrstors support for booting time Since boot_cpu_data and cpu capabilities are not enabled yet during early booting time, alternative can not be used in some functions to access xsave area. Therefore, we define two new functions xrstor_state_booting() and xsave_state_booting() to access xsave area just during early booting time. xrstor_state_booting restores xstate from xsave area during early booting time. xsave_state_booting saves xstate to xsave area during early booting time. The two functions are similar to xrstor_state and xsave_state respectively. But the two functions don't use alternatives because alternatives are not enabled when they are called in such early booting time. xrstor_state_booting is called only by functions defined as __init. So it's defined as __init and will be removed from memory after booting time. There is no extra memory cost caused by this function during running time. But because xsave_state_booting can be called by run-time function __save_fpu(), it's not defined as __init and will stay in memory during running time although it will not be called anymore during running time. It is not ideal to have this function stay in memory during running time. But it's a pretty small function and the memory cost will be small. By doing in this way, we can avoid to change a lot of code to just remove this small function and save a bit memory for running time. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-13-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:33:02 -07:00
Fenghua Yu	facbf4d91a	x86/xsaves: Use xsave/xrstor for saving and restoring user space context We use legacy xsave/xrstor to save and restore standard form of xsave area in user space context. No xsaveopt or xsaves is used here for two reasons. First, we don't want to use modified optimization which is implemented in xsaveopt and xsaves because xrstor/xrstors might track a wrong user space application. Secondly, we don't use compacted format of xsave area for backward compatibility because legacy user space applications only don't understand the compacted format of the xsave area. Using standard form of the xsave area may allocate more memory for user context than compacted form, but preserves compatibility with legacy applications. Furthermore, even with holes, the relevant cache lines don't get touched and thus the performance impact is limited. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-11-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:32:57 -07:00
Fenghua Yu	f9de314b34	x86/xsaves: Use xsaves/xrstors for context switch If xsaves is eanbled, use xsaves/xrstors for context switch to support compacted format xsave area to occupy less memory and modified optimization to improve saving performance. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-10-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:31:25 -07:00
Fenghua Yu	f31a9f7c71	x86/xsaves: Use xsaves/xrstors to save and restore xsave area If xsaves is eanbled, use xsaves/xrstors instrucitons to save and restore xstate. xsaves and xrstors support compacted format, init optimization, modified optimization, and supervisor states. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-9-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:31:21 -07:00
Fenghua Yu	b84e70552e	x86/xsaves: Define a macro for handling xsave/xrstor instruction fault Define a macro to handle fault generated by xsave, xsaveopt, xsaves, xrstor, and xrstors instructions. It is used in functions like xsave_state() etc. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-8-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:31:18 -07:00
Fenghua Yu	200b08a970	x86/xsaves: Define macros for xsave instructions Define macros for xsave, xsaveopt, xsaves, xrstor, and xrstors inline instructions. The instructions will be used for saving and restoring xstate. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-7-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:31:16 -07:00
Fenghua Yu	0b29643a58	x86/xsaves: Change compacted format xsave area header The XSAVE area header is changed to support both compacted format and standard format of xsave area. The XSAVE header of an xsave area comprises the 64 bytes starting at offset 512 from the area base address: - Bytes 7:0 of the xsave header is a state-component bitmap called xstate_bv. It identifies the state components in the xsave area. - Bytes 15:8 of the xsave header is a state-component bitmap called xcomp_bv. It is used as follows: - xcomp_bv[63] indicates the format of the extended region of the xsave area. If it is clear, the standard format is used. If it is set, the compacted format is used. - xcomp_bv[62:0] indicate which features (starting at feature 2) have space allocated for them in the compacted format. - Bytes 63:16 of the xsave header are reserved. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-6-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:31:10 -07:00
Fenghua Yu	5b3e83f46a	x86/alternative: Add alternative_input_2 to support alternative with two features and input alternative_input_2() replaces old instruction with new instructions with input based on two features. In alternative_input_2(oldinstr, newinstr1, feature1, newinstr2, feature2, input...), feature2 has higher priority to replace oldinstr than feature1. If CPU has feature2, newinstr2 replaces oldinstr and newinstr2 is executed during run time. If CPU doesn't have feature2, but it has feature1, newinstr1 replaces oldinstr and newinstr1 is executed during run time. If CPU doesn't have feature2 and feature1, oldinstr is executed during run time. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-5-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:24:53 -07:00
Fenghua Yu	6229ad278c	x86/xsaves: Detect xsaves/xrstors feature Detect the xsaveopt, xsavec, xgetbv, and xsaves features in processor extended state enumberation sub-leaf (eax=0x0d, ecx=1): Bit 00: XSAVEOPT is available Bit 01: Supports XSAVEC and the compacted form of XRSTOR if set Bit 02: Supports XGETBV with ECX = 1 if set Bit 03: Supports XSAVES/XRSTORS and IA32_XSS if set The above features are defined in the new word 10 in cpu features. The IA32_XSS MSR (index DA0H) contains a state-component bitmap that specifies the state components that software has enabled xsaves and xrstors to manage. If the bit corresponding to a state component is clear in XCR0 \| IA32_XSS, xsaves and xrstors will not operate on that state component, regardless of the value of the instruction mask. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-3-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 14:24:28 -07:00
Fenghua Yu	446fd806f5	x86/cpufeature.h: Reformat x86 feature macros In each X86 feature macro definition, add one space in front of the word number which is a one-digit number currently. The purpose of reformatting the macros is to align one-digit and two-digit word numbers. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-2-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-29 12:37:10 -07:00
Hanjun Guo	a43ae58c84	PCI: Turn pcibios_penalize_isa_irq() into a weak function pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA is not used by some architectures. Make pcibios_penalize_isa_irq() a __weak function to simplify the code. This removes the need for new platforms to add stub implementations of pcibios_penalize_isa_irq(). [bhelgaas: changelog, comments] Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Arnd Bergmann <arnd@arndb.de>	2014-05-27 16:23:58 -06:00
Lv Zheng	92985ef1db	ACPICA: Clean up redudant definitions already defined elsewhere Since mis-order issues have been solved, we can cleanup redundant definitions that already have defaults in <acpi/platform/acenv.h>. This patch removes redudant environments for __KERNEL__ surrounded code. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-05-27 18:13:08 +02:00
Lv Zheng	07d8391433	ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of <asm/acpi.h> There is a mis-order inclusion for <asm/acpi.h>. As we will enforce including <linux/acpi.h> for all Linux ACPI users, we can find the inclusion order is as follows: <linux/acpi.h> <acpi/acpi.h> <acpi/platform/acenv.h> (acenv.h before including aclinux.h) <acpi/platform/aclinux.h> ........................................................................... (aclinux.h before including asm/acpi.h) <asm/acpi.h> @Redundant@ (ACPICA specific stuff) ........................................................................... ........................................................................... (Linux ACPI specific stuff) ? - - - - - - - - - - - - + (aclinux.h after including asm/acpi.h) @Invisible@ \| (acenv.h after including aclinux.h) @Invisible@ \| other ACPICA headers @Invisible@ \| ............................................................\|.............. <acpi/acpi_bus.h> \| <acpi/acpi_drivers.h> \| <asm/acpi.h> (Excluded) \| (Linux ACPI specific stuff) ! <- - - - - - - - - - - - - + NOTE that, in ACPICA, <acpi/platform/acenv.h> is more like Kconfig generated <generated/autoconf.h> for Linux, it is meant to be included before including any ACPICA code. In the above figure, there is a question mark for "Linux ACPI specific stuff" in <asm/acpi.h> which should be included after including all other ACPICA header files. Thus they really need to be moved to the position marked with exclaimation mark or the definitions in the blocks marked with "@Invisible@" will be invisible to such architecture specific "Linux ACPI specific stuff" header blocks. This leaves 2 issues: 1. All environmental definitions in these blocks should have a copy in the area marked with "@Redundant@" if they are required by the "Linux ACPI specific stuff". 2. We cannot use any ACPICA defined types in <asm/acpi.h>. This patch splits architecture specific ACPICA stuff from <asm/acpi.h> to fix this issue. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-05-27 18:13:07 +02:00
David S. Miller	54e5c4def0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-24 00:32:30 -04:00
Dave Hansen	65a7f03f6b	x86: fix page fault tracing when KVM guest support enabled I noticed on some of my systems that page fault tracing doesn't work: cd /sys/kernel/debug/tracing echo 1 > events/exceptions/enable cat trace; # nothing shows up I eventually traced it down to CONFIG_KVM_GUEST. At least in a KVM VM, enabling that option breaks page fault tracing, and disabling fixes it. I tried on some old kernels and this does not appear to be a regression: it never worked. There are two page-fault entry functions today. One when tracing is on and another when it is off. The KVM code calls do_page_fault() directly instead of calling the traced version: > dotraplinkage void __kprobes > do_async_page_fault(struct pt_regs *regs, unsigned long > error_code) > { > enum ctx_state prev_state; > > switch (kvm_read_and_reset_pf_reason()) { > default: > do_page_fault(regs, error_code); > break; > case KVM_PV_REASON_PAGE_NOT_PRESENT: I'm also having problems with the page fault tracing on bare metal (same symptom of no trace output). I'm unsure if it's related. Steven had an alternative to this which has zero overhead when tracing is off where this includes the standard noops even when tracing is disabled. I'm unconvinced that the extra complexity of his apporach: http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home is worth it, expecially considering that the KVM code is already making page fault entry slower here. This solution is dirt-simple. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gleb Natapov <gleb@redhat.com> Cc: kvm@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:47:17 +02:00
Paolo Bonzini	ae9fedc793	KVM: x86: get CPL from SS.DPL CS.RPL is not equal to the CPL in the few instructions between setting CR0.PE and reloading CS. And CS.DPL is also not equal to the CPL for conforming code segments. However, SS.DPL is always equal to the CPL except for the weird case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the value in the STAR MSR, but force CPL=3 (Intel instead forces SS.DPL=SS.RPL=CPL=3). So this patch: - modifies SVM to update the CPL from SS.DPL rather than CS.RPL; the above case with SYSRET is not broken further, and the way to fix it would be to pass the CPL to userspace and back - modifies VMX to always return the CPL from SS.DPL (except forcing it to 0 if we are emulating real mode via vm86 mode; in vm86 mode all DPLs have to be 3, but real mode does allow privileged instructions). It also removes the CPL cache, which becomes a duplicate of the SS access rights cache. This fixes doing KVM_IOCTL_SET_SREGS exactly after setting CR0.PE=1 but before CS has been reloaded. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:47:17 +02:00
Paolo Bonzini	fb5e336b97	KVM: x86: drop set_rflags callback Not needed anymore now that the CPL is computed directly during task switch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:47:16 +02:00
Ingo Molnar	65c2ce7004	Linux 3.15-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTfR2zAAoJEHm+PkMAQRiG3noH/2s+KUge3qO2M+AmxttUo74B +npAMdbqYR3MdEiwxYZfsHcMu4Ye/IKLcrh4pydB5hI2mdjtGkH1bnmia0f1ve/c Z/a0256+W8gWp7mcUBqSNztqLPAWa7wKOqNdLjj5idr1BSj6u8im+fQ9FBh2woki 1fyYAuq/60lq4CMOKJvkA95V1Ome/jO+8tS4PguOgsCETQxCVFGurZcBbG3Mx5Y3 v+ioCqeRc6GvxPFR6YngnTZCrsLxSRT3tnO2Qy5zX7dxjIQkCEbvIckpBQv01Y3R wNUaX+2Jae207igxrEv8CjmCFnmZFuUI15aWWCy6fOS/j8bjuk6ThYJO8N4ZBM0= =2ShG -----END PGP SIGNATURE----- Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-22 10:28:56 +02:00
H. Peter Anvin	03c1b4e8e5	Merge remote-tracking branch 'origin/x86/espfix' into x86/vdso Merge x86/espfix into x86/vdso, due to changes in the vdso setup code that otherwise cause conflicts. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-21 17:36:33 -07:00
H. Peter Anvin	e6ab9a20e7	Merge commit '7ed6fb9b5a5510e4ef78ab27419184741169978a' into x86/espfix Merge in Linus' tree with: `fa81511bb0` x86-64, modify_ldt: Make support for 16-bit segments a runtime option ... reverted, to avoid a conflict. This commit is no longer necessary with the proper fix in place. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-21 15:23:19 -07:00
Borislav Petkov	65cef1311d	x86, microcode: Add a disable chicken bit Add a cmdline param which disables the microcode loader. This is useful mostly in debugging situations where we want to turn off microcode loading, both early from the initrd and late, as a means to be able to rule out its influence on the machine. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1400525957-11525-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-20 20:21:27 -07:00
Borislav Petkov	1b1ded57a4	x86, boot: Carve out early cmdline parsing function Carve out early cmdline parsing function into .../lib/cmdline.c so it can be used by early code in the kernel proper as well. Adapted from arch/x86/boot/cmdline.c. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1400525957-11525-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-20 20:21:24 -07:00
Andy Lutomirski	a62c34bd2a	x86, mm: Improve _install_special_mapping and fix x86 vdso naming Using arch_vma_name to give special mappings a name is awkward. x86 currently implements it by comparing the start address of the vma to the expected address of the vdso. This requires tracking the start address of special mappings and is probably buggy if a special vma is split or moved. Improve _install_special_mapping to just name the vma directly. Use it to give the x86 vvar area a name, which should make CRIU's life easier. As a side effect, the vvar area will show up in core dumps. This could be considered weird and is fixable. [hpa: I say we accept this as-is but be prepared to deal with knocking out the vvars from core dumps if this becomes a problem.] Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-20 11:38:42 -07:00
Thomas Gleixner	a553b142b8	iommu: dmar: Provide arch specific irq allocation ia64 and x86 share this driver. x86 is moving to a different irq allocation and ia64 keeps its private irq_create/destroy stuff. Use macros to redirect to one or the other. Yes, macros to avoid include hell. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Grant Likely <grant.likely@linaro.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Joerg Roedel <joro@8bytes.org> Cc: x86@kernel.org Cc: linux-ia64@vger.kernel.org Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20140507154336.372289825@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-05-16 14:05:19 +02:00
Thomas Gleixner	d07c9f1875	x86: Get rid of get_nr_irqs_gsi() No need to expose this outside of the ioapic code. The dynamic allocations are guaranteed not to happen in the gsi space. See commit `62a08ae2a`. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Grant Likely <grant.likely@linaro.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20140507154335.959870037@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-05-16 14:05:19 +02:00
Oleg Nesterov	5e1b05beec	x86/traps: Make math_error() static Trivial, make math_error() static. Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2014-05-14 13:57:26 +02:00
Denys Vlasenko	50204c6f6d	uprobes/x86: Simplify rip-relative handling It is possible to replace rip-relative addressing mode with addressing mode of the same length: (reg+disp32). This eliminates the need to fix up immediate and correct for changing instruction length. And we can kill arch_uprobe->def.riprel_target. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2014-05-14 13:57:25 +02:00
Anthony Iliopoulos	9844f54623	x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow() The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp Suggested-by: Shay Goikhman <shay.goikhman@huawei.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v2.6.16+ (!)	2014-05-13 16:34:09 -07:00
Ong Boon Leong	7ef1def800	x86, iosf: Added Quark MBI identifiers Added all the MBI units below and their associated read/write opcodes: - Host Bridge Arbiter - Host Bridge - Remote Management Unit - Memory Manager & eSRAM - SoC Unit Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-3-git-send-email-david.e.box@linux.intel.com Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-09 14:57:08 -07:00
David E. Box	6b8f0c8780	x86, iosf: Make IOSF driver modular and usable by more drivers Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF driver on SOC's without selecting it which forces an unnecessary and limiting dependency. Provides dummy functions to allow these modules to conditionally use the driver on IOSF equipped platforms without impacting their ability to compile and load on non-IOSF platforms. Build default m to ensure availability on x86 SOC's. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-09 14:56:15 -07:00
Andres Freund	c45f77364b	x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro The spuriously added semicolon didn't have any effect because the macro isn't currently in use. `c0a639ad0b` Signed-off-by: Andres Freund <andres@anarazel.de> Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de Cc: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-09 08:42:47 -07:00
Peter Zijlstra	f80c5b39b8	sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word. This will allow us, using fetch_or(), to both set NEED_RESCHED and check for POLLING_NRFLAG in a single operation and avoid pointless wakeups. Changing from the non-atomic thread_info::status flags to the atomic thread_info::flags shouldn't be a big issue since most polling state changes were followed/preceded by a full memory barrier anyway. Also, fix up the apm_32 idle function, clearly that was forgotten in the last conversion. The default idle state is !POLLING so just kill the lot. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Link: http://lkml.kernel.org/n/tip-7yksmqtlv4nfowmlqr1rifoi@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-08 09:16:56 +02:00
Feng Tang	f10f383d84	x86/hpet: Make boot_hpet_disable extern HPET on some platform has accuracy problem. Making "boot_hpet_disable" extern so that we can runtime disable the HPET timer by using quirk to check the platform. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-08 08:15:34 +02:00
Andy Lutomirski	f40c330091	x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO This makes the 64-bit and x32 vdsos use the same mechanism as the 32-bit vdso. Most of the churn is deleting all the old fixmap code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-05 13:19:01 -07:00
Andy Lutomirski	18d0a6fd22	x86, vdso: Move the 32-bit vdso special pages after the text This unifies the vdso mapping code and teaches it how to map special pages at addresses corresponding to symbols in the vdso image. The new code is used for all vdso variants, but so far only the 32-bit variants use the new vvar page position. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/b6d7858ad7b5ac3fd3c29cab6d6d769bc45d195e.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-05 13:18:56 -07:00
Andy Lutomirski	6f121e548f	x86, vdso: Reimplement vdso.so preparation in build-time C Currently, vdso.so files are prepared and analyzed by a combination of objcopy, nm, some linker script tricks, and some simple ELF parsers in the kernel. Replace all of that with plain C code that runs at build time. All five vdso images now generate .c files that are compiled and linked in to the kernel image. This should cause only one userspace-visible change: the loaded vDSO images are stripped more heavily than they used to be. Everything outside the loadable segment is dropped. In particular, this causes the section table and section name strings to be missing. This should be fine: real dynamic loaders don't load or inspect these tables anyway. The result is roughly equivalent to eu-strip's --strip-sections option. The purpose of this change is to enable the vvar and hpet mappings to be moved to the page following the vDSO load segment. Currently, it is possible for the section table to extend into the page after the load segment, so, if we map it, it risks overlapping the vvar or hpet page. This happens whenever the load segment is just under a multiple of PAGE_SIZE. The only real subtlety here is that the old code had a C file with inline assembler that did 'call VDSO32_vsyscall' and a linker script that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most likely worked by accident: the linker script entry defines a symbol associated with an address as opposed to an alias for the real dynamic symbol __kernel_vsyscall. That caused ld to relocate the reference at link time instead of leaving an interposable dynamic relocation. Since the VDSO32_vsyscall hack is no longer needed, I now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it work. vdso2c will generate an error and abort the build if the resulting image contains any dynamic relocations, so we won't silently generate bad vdso images. (Dynamic relocations are a problem because nothing will even attempt to relocate the vdso.) Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-05 13:18:51 -07:00
Andy Lutomirski	cfda7bb9ec	x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c This code is used during CPU setup, and it isn't strictly speaking related to the 32-bit vdso. It's easier to understand how this works when the code is closer to its callers. This also lets syscall32_cpu_init be static, which might save some trivial amount of kernel text. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/4e466987204e232d7b55a53ff6b9739f12237461.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-05 13:18:47 -07:00
Andy Lutomirski	3d7ee969bf	x86, vdso: Clean up 32-bit vs 64-bit vdso params Rather than using 'vdso_enabled' and an awful #define, just call the parameters vdso32_enabled and vdso64_enabled. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/87913de56bdcbae3d93917938302fc369b05caee.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-05 13:18:40 -07:00
Tom Herbert	4405b4d635	net: Change x86_64 add32_with_carry to allow memory operand Note add32_with_carry(a, b) is suboptimal, as it forces a and b in registers. b could be a memory or a register operand. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 15:26:29 -04:00
Tom Herbert	a278534406	x86_64: csum_add for x86_64 Add csum_add function for x86_64. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 15:26:29 -04:00
H. Peter Anvin	20b68535cd	x86, espfix: Fix broken header guard Header guard is #ifndef, not #ifdef... Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-02 11:34:17 -07:00
H. Peter Anvin	e1fe9ed8d2	x86, espfix: Move espfix definitions into a separate header file Sparse warns that the percpu variables aren't declared before they are defined. Rather than hacking around it, move espfix definitions into a proper header file. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-01 14:16:15 -07:00
H. Peter Anvin	3891a04aaf	x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: `b3b42ac2cb` x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # consider after upstream merge	2014-04-30 14:14:28 -07:00
Oleg Nesterov	1dc76e6eac	uprobes/x86: Kill adjust_ret_addr(), simplify UPROBE_FIX_CALL logic The only insn which could have both UPROBE_FIX_IP and UPROBE_FIX_CALL was 0xe8 "call relative", and now it is handled by branch_xol_ops. So we can change default_post_xol_op(UPROBE_FIX_CALL) to simply push the address of next insn == utask->vaddr + insn.length, just we need to record insn.length into the new auprobe->def.ilen member. Note: if/when we teach branch_xol_ops to support jcxz/loopz we can remove the "correction" logic, UPROBE_FIX_IP can use the same address. Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2014-04-30 19:10:39 +02:00
Oleg Nesterov	78d9af4cd3	uprobes/x86: Cleanup the usage of arch_uprobe->def.fixups, make it u8 handle_riprel_insn() assumes that nobody else could modify ->fixups before. This is correct but fragile, change it to use "\|=". Also make ->fixups u8, we are going to add the new members into the union. It is not clear why UPROBE_FIX_RIP_.X lived in the upper byte, redefine them so that they can fit into u8. Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2014-04-30 19:10:38 +02:00
Oleg Nesterov	97aa5cddbe	uprobes/x86: Move default_xol_ops's data into arch_uprobe->def Finally we can move arch_uprobe->fixups/rip_rela_target_address into the new "def" struct and place this struct in the union, they are only used by default_xol_ops paths. The patch also renames rip_rela_target_address to riprel_target just to make this name shorter. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>	2014-04-30 19:10:37 +02:00
Ian Campbell	5e40704ed2	arm: xen: implement multicall hypercall support. As part of this make the usual change to xen_ulong_t in place of unsigned long. This change has no impact on x86. The Linux definition of struct multicall_entry.result differs from the Xen definition, I think for good reasons, and used a long rather than an unsigned long. Therefore introduce a xen_long_t, which is a long on x86 architectures and a signed 64-bit integer on ARM. Use uint32_t nr_calls on x86 for consistency with the ARM definition. Build tested on amd64 and i386 builds. Runtime tested on ARM. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-04-24 13:09:46 +01:00
Masami Hiramatsu	9326638cbe	kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation Use NOKPROBE_SYMBOL macro for protecting functions from kprobes instead of __kprobes annotation under arch/x86. This applies nokprobe_inline annotation for some cases, because NOKPROBE_SYMBOL() will inhibit inlining by referring the symbol address. This just folds a bunch of previous NOKPROBE_SYMBOL() cleanup patches for x86 to one patch. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp> Cc: Gleb Natapov <gleb@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-24 10:26:38 +02:00
Masami Hiramatsu	6f6343f53d	kprobes/x86: Call exception handlers directly from do_int3/do_debug To avoid a kernel crash by probing on lockdep code, call kprobe_int3_handler() and kprobe_debug_handler()(which was formerly called post_kprobe_handler()) directly from do_int3 and do_debug. Currently kprobes uses notify_die() to hook the int3/debug exceptoins. Since there is a locking code in notify_die, the lockdep code can be invoked. And because the lockdep involves printk() related things, theoretically, we need to prohibit probing on such code, which means much longer blacklist we'll have. Instead, hooking the int3/debug for kprobes before notify_die() can avoid this problem. Anyway, most of the int3 handlers in the kernel are already called from do_int3 directly, e.g. ftrace_int3_handler, poke_int3_handler, kgdb_ll_trap. Actually only kprobe_exceptions_notify is on the notifier_call_chain. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081733.26341.24423.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-24 10:02:59 +02:00
Masami Hiramatsu	376e242429	kprobes: Introduce NOKPROBE_SYMBOL() macro to maintain kprobes blacklist Introduce NOKPROBE_SYMBOL() macro which builds a kprobes blacklist at kernel build time. The usage of this macro is similar to EXPORT_SYMBOL(), placed after the function definition: NOKPROBE_SYMBOL(function); Since this macro will inhibit inlining of static/inline functions, this patch also introduces a nokprobe_inline macro for static/inline functions. In this case, we must use NOKPROBE_SYMBOL() for the inline function caller. When CONFIG_KPROBES=y, the macro stores the given function address in the "_kprobe_blacklist" section. Since the data structures are not fully initialized by the macro (because there is no "size" information), those are re-initialized at boot time by using kallsyms. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20140417081705.26341.96719.stgit@ltc230.yrl.intra.hitachi.co.jp Cc: Alok Kataria <akataria@vmware.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christopher Li <sparse@chrisli.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: David S. Miller <davem@davemloft.net> Cc: Jan-Simon Möller <dl9pf@gmx.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-sparse@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-24 10:02:56 +02:00
Nadav Amit	346874c950	KVM: x86: Fix CR3 reserved bits According to Intel specifications, PAE and non-PAE does not have any reserved bits. In long-mode, regardless to PCIDE, only the high bits (above the physical address) are reserved. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:46:57 -03:00
Peter Zijlstra	d00a569284	arch,x86: Convert smp_mb__*() x86 is strongly ordered and all its atomic ops imply a full barrier. Implement the two new primitives as the old ones were. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-knswsr5mldkr0w1lrdxvc81w@git.kernel.org Cc: Dave Jones <davej@redhat.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 14:20:46 +02:00
Ingo Molnar	1111b680d3	Merge branch 'perf/urgent' into perf/core, to pick up PMU driver fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 12:14:55 +02:00
Oleg Nesterov	8e89c0be17	uprobes/x86: Emulate relative call's See the previous "Emulate unconditional relative jmp's" which explains why we can not execute "jmp" out-of-line, the same applies to "call". Emulating of rip-relative call is trivial, we only need to additionally push the ret-address. If this fails, we execute this instruction out of line and this should trigger the trap, the probed application should die or the same insn will be restarted if a signal handler expands the stack. We do not even need ->post_xol() for this case. But there is a corner (and almost theoretical) case: another thread can expand the stack right before we execute this insn out of line. In this case it hit the same problem we are trying to solve. So we simply turn the probed insn into "call 1f; 1:" and add ->post_xol() which restores ->sp and restarts. Many thanks to Jonathan who finally found the standalone reproducer, otherwise I would never resolve the "random SIGSEGV's under systemtap" bug-report. Now that the problem is clear we can write the simplified test-case: void probe_func(void), callee(void); int failed = 1; asm ( ".text\n" ".align 4096\n" ".globl probe_func\n" "probe_func:\n" "call callee\n" "ret" ); /* * This assumes that: * * - &probe_func = 0x401000 + a_bit, aligned = 0x402000 * * - xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000 * as xol_add_vma() asks; the 1st slot = 0x7fffffffe080 * * so we can target the non-canonical address from xol_vma using * the simple math below, 100 * 4096 is just the random offset / asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1 + 100 4096\n"); void callee(void) { failed = 0; } int main(void) { probe_func(); return failed; } It SIGSEGV's if you probe "probe_func" (although this is not very reliable, randomize_va_space/etc can change the placement of xol area). Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8) differently. This patch relies on lib/insn.c and thus implements the intel's behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use this insn, so we postpone the fix until we decide what should we do; emulate or not, support or not, etc. Reported-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>	2014-04-17 21:58:23 +02:00
Oleg Nesterov	7ba6db2d68	uprobes/x86: Emulate unconditional relative jmp's Currently we always execute all insns out-of-line, including relative jmp's and call's. This assumes that even if regs->ip points to nowhere after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will update it correctly. However, this doesn't work if this regs->ip == xol_vaddr + insn_offset is not canonical. In this case CPU generates #GP and general_protection() kills the task which tries to execute this insn out-of-line. Now that we have uprobe_xol_ops we can teach uprobes to emulate these insns and solve the problem. This patch adds branch_xol_ops which has a single branch_emulate_op() hook, so far it can only handle rel8/32 relative jmp's. TODO: move ->fixup into the union along with rip_rela_target_address. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jonathan Lebon <jlebon@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>	2014-04-17 21:58:22 +02:00
Oleg Nesterov	8ad8e9d3fd	uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops", move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default set of methods and change arch_uprobe_pre/post_xol() accordingly. This way we can add the new uprobe_xol_ops's to handle the insns which need the special processing (rip-relative jmp/call at least). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>	2014-04-17 21:58:19 +02:00
Ricardo Neri	b738c6ea49	x86/efi: Save and restore FPU context around efi_calls (i386) Do a complete FPU context save/restore around the EFI calls. This required as runtime EFI firmware may potentially use the FPU. This change covers only the i386 configuration. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-04-17 13:26:33 +01:00
Ricardo Neri	de05764e0b	x86/efi: Save and restore FPU context around efi_calls (x86_64) Do a complete FPU context save/restore around the EFI calls. This required as runtime EFI firmware may potentially use the FPU. This change covers only the x86_64 configuration. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-04-17 13:26:32 +01:00
Ricardo Neri	982e239cd2	x86/efi: Implement a __efi_call_virt macro For i386, all the EFI system runtime services functions return efi_status_t except efi_reset_system_system. Therefore, not all functions can be covered by the same macro in case the macro needs to do more than calling the function (i.e., return a value). The purpose of the __efi_call_virt macro is to be used when no return value is expected. For x86_64, this macro would not be needed as all the runtime services return u64. However, the same code is used for both x86_64 and i386. Thus, the macro __efi_call_virt is also defined to not break compilation. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-04-17 13:26:32 +01:00
Matt Fleming	c6b4069192	x86, fpu: Extend the use of static_cpu_has_safe It may be necessary to save and restore the FPU context during EFI runtime system services calls. However, this may happen during boot and before alternatives have run. Thus, we need to use static_cpu_has_safe instead. The rationale behind the use of static_cpu_has_safe is the same as in commit `5f8c421814` ("x86, fpu: Use static_cpu_has_safe before alternatives") by Borislav Petkov. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Borislav Petkov <bp@suse.de>	2014-04-17 13:26:31 +01:00
Matt Fleming	62fa6e69a4	x86/efi: Delete most of the efi_call* macros We really only need one phys and one virt function call, and then only one assembly function to make firmware calls. Since we are not using the C type system anyway, we're not really losing much by deleting the macros apart from no longer having a check that we are passing the correct number of parameters. The lack of duplicated code seems like a worthwhile trade-off. Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-04-17 13:26:30 +01:00
Linus Torvalds	55101e2d6c	Merge git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Marcelo Tosatti: - Fix for guest triggerable BUG_ON (CVE-2014-0155) - CR4.SMAP support - Spurious WARN_ON() fix * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: remove WARN_ON from get_kernel_ns() KVM: Rename variable smep to cr4_smep KVM: expose SMAP feature to guest KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode KVM: Add SMAP support when setting CR4 KVM: Remove SMAP bit from CR4_RESERVED_BITS KVM: ioapic: try to recover if pending_eoi goes out of range KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)	2014-04-14 16:21:28 -07:00
Feng Wu	56d6efc2de	KVM: Remove SMAP bit from CR4_RESERVED_BITS This patch removes SMAP bit from CR4_RESERVED_BITS. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-14 17:50:33 -03:00
Prarit Bhargava	79a51b25ba	x86/irq: Clean up VECTOR_UNDEFINED and VECTOR_RETRIGGERED definition During another patch review, David Rientjes noted that VECTOR_UNDEFINED and VECTOR_RETRIGGERED should be defined with ()s so that they are not erroneously used in an arithmetic operation. Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Yang Zhang <yang.z.zhang@Intel.com> Link: http://lkml.kernel.org/r/1396440827-18352-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-14 13:42:05 +02:00
Linus Torvalds	0b747172dc	Merge git://git.infradead.org/users/eparis/audit Pull audit updates from Eric Paris. * git://git.infradead.org/users/eparis/audit: (28 commits) AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range audit: do not cast audit_rule_data pointers pointlesly AUDIT: Allow login in non-init namespaces audit: define audit_is_compat in kernel internal header kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c sched: declare pid_alive as inline audit: use uapi/linux/audit.h for AUDIT_ARCH declarations syscall_get_arch: remove useless function arguments audit: remove stray newline from audit_log_execve_info() audit_panic() call audit: remove stray newlines from audit_log_lost messages audit: include subject in login records audit: remove superfluous new- prefix in AUDIT_LOGIN messages audit: allow user processes to log from another PID namespace audit: anchor all pid references in the initial pid namespace audit: convert PPIDs to the inital PID namespace. pid: get pid_t ppid of task in init_pid_ns audit: rename the misleading audit_get_context() to audit_take_context() audit: Add generic compat syscall support audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL ...	2014-04-12 12:38:53 -07:00
Mark Salter	5b7c73e009	x86: use generic early_ioremap Move x86 over to the generic early ioremap implementation. Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-07 16:36:15 -07:00
Dave Young	6b550f6f20	x86/mm: sparse warning fix for early_memremap This patch series takes the common bits from the x86 early ioremap implementation and creates a generic implementation which may be used by other architectures. The early ioremap interfaces are intended for situations where boot code needs to make temporary virtual mappings before the normal ioremap interfaces are available. Typically, this means before paging_init() has run. This patch (of 6): There's a lot of sparse warnings for code like below: void a = early_memremap(phys_addr, size); early_memremap intend to map kernel memory with ioremap facility, the return pointer should be a kernel ram pointer instead of iomem one. For making the function clearer and supressing sparse warnings this patch do below two things: 1. cast to (__force void ) for the return value of early_memremap 2. add early_memunmap function and pass (__force void __iomem *) to iounmap From Boris: "Ingo told me yesterday, it makes sense too. I'd guess we can try it. FWIW, all callers of early_memremap use the memory they get remapped as normal memory so we should be safe" Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-07 16:36:14 -07:00
Christoph Lameter	b3ca1c10d7	percpu: add raw_cpu_ops The kernel has never been audited to ensure that this_cpu operations are consistently used throughout the kernel. The code generated in many places can be improved through the use of this_cpu operations (which uses a segment register for relocation of per cpu offsets instead of performing address calculations). The patch set also addresses various consistency issues in general with the per cpu macros. A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only because checks are skipped. This is typically shown through a raw_ prefix. So this patch set changes the places where __this_cpu_ptr() is used to raw_cpu_ptr(). B. There has been the long term wish by some that __this_cpu operations would check for preemption. However, there are cases where preemption checks need to be skipped. This patch set adds raw_cpu operations that do not check for preemption and then adds preemption checks to the __this_cpu operations. C. The use of __get_cpu_var is always a reference to a percpu variable that can also be handled via a this_cpu operation. This patch set replaces all uses of __get_cpu_var with this_cpu operations. D. We can then use this_cpu RMW operations in various places replacing sequences of instructions by a single one. E. The use of this_cpu operations throughout will allow other arches than x86 to implement optimized references and RMV operations to work with per cpu local data. F. The use of this_cpu operations opens up the possibility to further optimize code that relies on synchronization through per cpu data. The patch set works in a couple of stages: I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr(). Also converts the existing __this_cpu_xx_# primitive in the x86 code to raw_cpu_xx_#. II. Patch 2-4 use the raw_cpu operations in places that would give us false positives once they are enabled. III. Patch 5 adds preemption checks to __this_cpu operations to allow checking if preemption is properly disabled when these functions are used. IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var with this_cpu_ptr. They do not depend on any changes to the percpu code. No preemption tests are skipped if they are applied. V. Patches 21-46 are conversion patches that use this_cpu operations in various kernel subsystems/drivers or arch code. VI. Patches 47/48 (not included in this series) remove no longer used functions (__this_cpu_ptr and __get_cpu_var). These should only be applied after all the conversion patches have made it and after we have done additional passes through the kernel to ensure that none of the uses of these functions remain. This patch (of 46): The patches following this one will add preemption checks to __this_cpu ops so we need to have an alternative way to use this_cpu operations without preemption checks. raw_cpu_ops will be the basis for all other ops since these will be the operations that do not implement any checks. Primitive operations are renamed by this patch from __this_cpu_xxx to raw_cpu_xxxx. Also change the uses of the x86 percpu primitives in preempt.h. These depend directly on asm/percpu.h (header #include nesting issue). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Alex Shi <alex.shi@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bryan Wu <cooloney@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Hedi Berriche <hedi@sgi.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Stultz <john.stultz@linaro.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mike Travis <travis@sgi.com> Cc: Neil Brown <neilb@suse.de> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Robert Richter <rric@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-07 16:36:13 -07:00
Josh Triplett	b06dd879f5	x86: always define BUG() and HAVE_ARCH_BUG, even with !CONFIG_BUG This ensures that BUG() always has a definition that causes a trap (via an undefined instruction), and that the compiler still recognizes the code following BUG() as unreachable, avoiding warnings that would otherwise appear (such as on non-void functions that don't return a value after BUG()). In addition to saving a few bytes over the generic infinite-loop implementation, this implementation traps rather than looping, which potentially allows for better error-recovery behavior (such as by rebooting). Signed-off-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-07 16:36:10 -07:00
Linus Torvalds	d1d9cfc330	A number of cleanups plus support for the RDSEED instruction, which will be showing up in Intel Broadwell CPU's. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTPaDyAAoJENNvdpvBGATwIVkP/Ao77ATdIBKr7kJJ+7q/jroo hfg2EOBzYwFSTZ4EUCwTwCNOqcAnr8xBYRWDDTxoiT0ORQN+4QkIXlukTBZbeCUg 5ZCvRYhNcrieCBX69X/0aQ7yAkMR5/KSjccww3Tr6bLEItEBS7/gkeGv0H/E1DOg /1C3vbDDRPchM2qGM/TS/2Wd3WU5jWJk4cURJUlweBWHGXItOk6yHIY/+h3t4Bj1 eJfxwG21BLJ1t159uvMj5Sw9Ketwnzgc4hsU6Ai/o37+Q12LYJ6xRCHsAatr6aA8 dsObrfnD15T3gH/elVzB0tpBkor9pESkJAvUxWdMNMNGVU01ONuBz6rbSUqLdKu4 GXA+9iHq4Ti43f4M886PweSweFr1+lFIiQrdAQaWZdrooLCMJCgjMNM+7wnlu8ho Eit65rq4k6y5Fb5GTB8Am/TCb3HNYhXnrMERA0meKx+/K8yaMygzHtPoT+DHdn8z GWDwNiMYrtTeiJ5nduljO3M2lYJDIWLkCD9SwupXe5HEj3qSHS/aiT5ng6YP+oOK mmUiWWgpVv3unO6pJw49pbzuAWXnbHVWMV2VDILVxn7mZZ0NAOL3OGAmK7US+Z/J qww9s89A2oba8vTbdrW7JuiP9ESlujpfrLQQgnW6WInMHY2dfo5hSo84ULLeLMXQ SqT1yywl4U888jqBYioB =RtMT -----END PGP SIGNATURE----- Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull /dev/random changes from Ted Ts'o: "A number of cleanups plus support for the RDSEED instruction, which will be showing up in Intel Broadwell CPU's" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: Add arch_has_random[_seed]() random: If we have arch_get_random_seed(), try it before blocking random: Use arch_get_random_seed() at init time and once a second x86, random: Enable the RDSEED instruction random: use the architectural HWRNG for the SHA's IV in extract_buf() random: clarify bits/bytes in wakeup thresholds random: entropy_bytes is actually bits random: simplify accounting code random: tighten bound on random_read_wakeup_thresh random: forget lock in lockless accounting random: simplify accounting logic random: fix comment on "account" random: simplify loop in random_read random: fix description of get_random_bytes random: fix comment on proc_do_uuid random: fix typos / spelling errors in comments	2014-04-04 21:25:28 -07:00
Linus Torvalds	a372c967a3	Support PCI devices with multiple MSIs, performance improvement for kernel-based backends (by not populated m2p overrides when mapping), and assorted minor bug fixes and cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQEcBAABAgAGBQJTPTGyAAoJEFxbo/MsZsTRnjgH/10j5CbOK1RFvIyCSslGTf4G slhK8P8dhhplGAxwXXji322lWNYEx9Jd+V0Bhxnvr4drSlsP/qkWuBWf+u1LBvRq AVPM99tk0XHCVAuvMMNo/lc62dTIR9IpQvnY6WhHSHnSlfqyVcdnbaGk8/LRuxWJ u2F0MXzDNH00b/kt6hDBt3F7CkHfjwsEn43LCkkxyHPp5MJGD7bGDIe+bKtnjv9u D9VJtCWQkrjWQ6jNpjdP833JCNCGQrXtVO3DeTAGs3T1tGmiEsqp6kT6Gp5zCFnh oaQk9jfQL2S+IVnVhHVMW9nTwNPPrnIrD69FlgTrK301mcYW1mKoFotTogzHu+0= =2IG+ -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen features and fixes from David Vrabel: "Support PCI devices with multiple MSIs, performance improvement for kernel-based backends (by not populated m2p overrides when mapping), and assorted minor bug fixes and cleanups" * tag 'stable/for-linus-3.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/acpi-processor: fix enabling interrupts on syscore_resume xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_override xen: remove XEN_PRIVILEGED_GUEST xen: add support for MSI message groups xen-pciback: Use pci_enable_msix_exact() instead of pci_enable_msix() xen/xenbus: remove unused xenbus_bind_evtchn() xen/events: remove unnecessary call to bind_evtchn_to_cpu() xen/events: remove the unused resend_irq_on_evtchn() drivers:xen-selfballoon:reset 'frontswap_inertia_counter' after frontswap_shrink drivers: xen: Include appropriate header file in pcpu.c drivers: xen: Mark function as static in platform-pci.c	2014-04-03 14:01:37 -07:00
Linus Torvalds	7cbb39d4d4	Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "PPC and ARM do not have much going on this time. Most of the cool stuff, instead, is in s390 and (after a few releases) x86. ARM has some caching fixes and PPC has transactional memory support in guests. MIPS has some fixes, with more probably coming in 3.16 as QEMU will soon get support for MIPS KVM. For x86 there are optimizations for debug registers, which trigger on some Windows games, and other important fixes for Windows guests. We now expose to the guest Broadwell instruction set extensions and also Intel MPX. There's also a fix/workaround for OS X guests, nested virtualization features (preemption timer), and a couple kvmclock refinements. For s390, the main news is asynchronous page faults, together with improvements to IRQs (floating irqs and adapter irqs) that speed up virtio devices" * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits) KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8 KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode KVM: PPC: Book3S HV: Return ENODEV error rather than EIO KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state KVM: PPC: Book3S HV: Add transactional memory support KVM: Specify byte order for KVM_EXIT_MMIO KVM: vmx: fix MPX detection KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write KVM: s390: clear local interrupts at cpu initial reset KVM: s390: Fix possible memory leak in SIGP functions KVM: s390: fix calculation of idle_mask array size KVM: s390: randomize sca address KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP KVM: Bump KVM_MAX_IRQ_ROUTES for s390 KVM: s390: irq routing for adapter interrupts. KVM: s390: adapter interrupt sources ...	2014-04-02 14:50:10 -07:00
Linus Torvalds	467cbd207a	Merge branch 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 old platform removal from Peter Anvin: "This patchset removes support for several completely obsolete platforms, where the maintainers either have completely vanished or acked the removal. For some of them it is questionable if there even exists functional specimens of the hardware" Geert Uytterhoeven apparently thought this was a April Fool's pull request ;) * 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, platforms: Remove NUMAQ x86, platforms: Remove SGI Visual Workstation x86, apic: Remove support for IBM Summit/EXA chipset x86, apic: Remove support for ia32-based Unisys ES7000	2014-04-02 13:15:58 -07:00
Linus Torvalds	c6f21243ce	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso changes from Peter Anvin: "This is the revamp of the 32-bit vdso and the associated cleanups. This adds timekeeping support to the 32-bit vdso that we already have in the 64-bit vdso. Although 32-bit x86 is legacy, it is likely to remain in the embedded space for a very long time to come. This removes the traditional COMPAT_VDSO support; the configuration variable is reused for simply removing the 32-bit vdso, which will produce correct results but obviously suffer a performance penalty. Only one beta version of glibc was affected, but that version was unfortunately included in one OpenSUSE release. This is not the end of the vdso cleanups. Stefani and Andy have agreed to continue work for the next kernel cycle; in fact Andy has already produced another set of cleanups that came too late for this cycle. An incidental, but arguably important, change is that this ensures that unused space in the VVAR page is properly zeroed. It wasn't before, and would contain whatever garbage was left in memory by BIOS or the bootloader. Since the VVAR page is accessible to user space this had the potential of information leaks" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86, vdso: Fix the symbol versions on the 32-bit vDSO x86, vdso, build: Don't rebuild 32-bit vdsos on every make x86, vdso: Actually discard the .discard sections x86, vdso: Fix size of get_unmapped_area() x86, vdso: Finish removing VDSO32_PRELINK x86, vdso: Move more vdso definitions into vdso.h x86: Load the 32-bit vdso in place, just like the 64-bit vdsos x86, vdso32: handle 32 bit vDSO larger one page x86, vdso32: Disable stack protector, adjust optimizations x86, vdso: Zero-pad the VVAR page x86, vdso: Add 32 bit VDSO time support for 64 bit kernel x86, vdso: Add 32 bit VDSO time support for 32 bit kernel x86, vdso: Patch alternatives in the 32-bit VDSO x86, vdso: Introduce VVAR marco for vdso32 x86, vdso: Cleanup __vdso_gettimeofday() x86, vdso: Replace VVAR(vsyscall_gtod_data) by gtod macro x86, vdso: __vdso_clock_gettime() cleanup x86, vdso: Revamp vclock_gettime.c mm: Add new func _install_special_mapping() to mmap.c x86, vdso: Make vsyscall_gtod_data handling x86 generic ...	2014-04-02 12:26:43 -07:00
Linus Torvalds	158e0d3621	Driver core / sysfs patches for 3.15-rc1 Here's the big driver core / sysfs update for 3.15-rc1. Lots of kernfs updates to make it useful for other subsystems, and a few other tiny driver core patches. All have been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iEYEABECAAYFAlM7A0wACgkQMUfUDdst+ynJNACfZlY+KNKIhNFt1OOW8rQfSZzy 1PYAnjYuOoly01JlPrpJD5b4TdxaAq71 =GVUg -----END PGP SIGNATURE----- Merge tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and sysfs updates from Greg KH: "Here's the big driver core / sysfs update for 3.15-rc1. Lots of kernfs updates to make it useful for other subsystems, and a few other tiny driver core patches. All have been in linux-next for a while" * tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (42 commits) Revert "sysfs, driver-core: remove unused {sysfs\|device}_schedule_callback_owner()" kernfs: cache atomic_write_len in kernfs_open_file numa: fix NULL pointer access and memory leak in unregister_one_node() Revert "driver core: synchronize device shutdown" kernfs: fix off by one error. kernfs: remove duplicate dir.c at the top dir x86: align x86 arch with generic CPU modalias handling cpu: add generic support for CPU feature based module autoloading sysfs: create bin_attributes under the requested group driver core: unexport static function create_syslog_header firmware: use power efficient workqueue for unloading and aborting fw load firmware: give a protection when map page failed firmware: google memconsole driver fixes firmware: fix google/gsmi duplicate efivars_sysfs_init() drivers/base: delete non-required instances of include <linux/init.h> kernfs: fix kernfs_node_from_dentry() ACPI / platform: drop redundant ACPI_HANDLE check kernfs: fix hash calculation in kernfs_rename_ns() kernfs: add CONFIG_KERNFS sysfs, kobject: add sysfs wrapper for kernfs_enable_ns() ...	2014-04-01 16:28:19 -07:00
Linus Torvalds	4b1779c2cf	PCI changes for the v3.15 merge window: Enumeration - Increment max correctly in pci_scan_bridge() (Andreas Noever) - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever) - Assign CardBus bus number only during the second pass (Andreas Noever) - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever) - Make sure bus number resources stay within their parents bounds (Andreas Noever) - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever) - Check for child busses which use more bus numbers than allocated (Andreas Noever) - Don't scan random busses in pci_scan_bridge() (Andreas Noever) - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas) - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas) - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas) NUMA - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas) - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas) - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas) - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas) - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas) - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas) Resource management - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas) - Add resource_contains() (Bjorn Helgaas) - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas) - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas) - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas) - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas) - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas) - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas) - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas) - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas) - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas) - s390: Use generic pci_enable_resources() (Bjorn Helgaas) - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas) - Set type in __request_region() (Bjorn Helgaas) - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas) - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas) - Log IDE resource quirk in dmesg (Bjorn Helgaas) - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas) PCI device hotplug - Make check_link_active() non-static (Rajat Jain) - Use link change notifications for hot-plug and removal (Rajat Jain) - Enable link state change notifications (Rajat Jain) - Don't disable the link permanently during removal (Rajat Jain) - Don't check adapter or latch status while disabling (Rajat Jain) - Disable link notification across slot reset (Rajat Jain) - Ensure very fast hotplug events are also processed (Rajat Jain) - Add hotplug_lock to serialize hotplug events (Rajat Jain) - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain) - Don't turn slot off when hot-added device already exists (Yijing Wang) MSI - Keep pci_enable_msi() documentation (Alexander Gordeev) - ahci: Fix broken single MSI fallback (Alexander Gordeev) - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev) - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman) - Fix leak of msi_attrs (Greg Kroah-Hartman) - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida) Virtualization - Device-specific ACS support (Alex Williamson) Freescale i.MX6 - Wait for retraining (Marek Vasut) Marvell MVEBU - Use Device ID and revision from underlying endpoint (Andrew Lunn) - Fix incorrect size for PCI aperture resources (Jason Gunthorpe) - Call request_resource() on the apertures (Jason Gunthorpe) - Fix potential issue in range parsing (Jean-Jacques Hiblot) Renesas R-Car - Check platform_get_irq() return code (Ben Dooks) - Add error interrupt handling (Ben Dooks) - Fix bridge logic configuration accesses (Ben Dooks) - Register each instance independently (Magnus Damm) - Break out window size handling (Magnus Damm) - Make the Kconfig dependencies more generic (Magnus Damm) Synopsys DesignWare - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar) Miscellaneous - Remove unused SR-IOV VF Migration support (Bjorn Helgaas) - Enable INTx if BIOS left them disabled (Bjorn Helgaas) - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter) - Clean up par-arch object file list (Liviu Dudau) - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom) - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang) - Fix pci_bus_b() build failure (Paul Gortmaker) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJTOdAZAAoJEFmIoMA60/r8VYUQALRrReyMBk3pjRt/fKIX4Kwi ydSo/YJeeKTN8K93fLw8bb8bdPItJScJFTfEa4Q2SpZezR/ecGXLowisy0BBaPHK qtOyB8EqjkLS17GfyecIe9Nd2SIAI2De/0bchK3kDtIX1YlZB/k/tD3eCPMHDnnl m8c5kAHKPQYd8g01I+S8nrtGHk/A33grfYpJXPZbcqyhE0lWU3SI8KDAGbcKzNHE 23Do0yNyd4nHIdixWlhETcNvzHn35Q/O38JJwW9Mf1aI9gusYuml6GFefCgu/iov lxqp3CEW7iPZgQEgNbrQ0HzWn/durL2Trd6S/Yh6f2xbm1LGYKWh3LZUFLd3AQDd INEpUgKsyb//nF3dtiyGnZlp0QykoqFyLo2AEDrb+ILTd4up5DeRY/m1UpjAXR5p QicBmrDksHrSivPmMZwLx1DFQYKjQbdx5lOqy9hQM/Jmsr+N3/l7QBrbQWXks3JZ NNAyn4RZHQB7UDQS/MmVPArs+JK5qaEDQD57QuOTlqgP19VY9C9E/l/aEqefjdFo XOAm7CwGpB/iBAkIbE6ROEDiJArigRVHEfxLYeE/jtGOdRDCD1deWk+g3S8DWD7m ZxWSgIVB00PMAmomczdg59YVFBhocgwPUa8/cw6yqzx2QKP4mWXIFZ/Sjau5I3tn WWoxXlUirZfTJc29XnVy =3mNS -----END PGP SIGNATURE----- Merge tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Enumeration - Increment max correctly in pci_scan_bridge() (Andreas Noever) - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever) - Assign CardBus bus number only during the second pass (Andreas Noever) - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever) - Make sure bus number resources stay within their parents bounds (Andreas Noever) - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever) - Check for child busses which use more bus numbers than allocated (Andreas Noever) - Don't scan random busses in pci_scan_bridge() (Andreas Noever) - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas) - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas) - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas) NUMA - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas) - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas) - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas) - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas) - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas) - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas) Resource management - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas) - Add resource_contains() (Bjorn Helgaas) - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas) - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas) - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas) - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas) - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas) - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas) - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas) - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas) - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas) - s390: Use generic pci_enable_resources() (Bjorn Helgaas) - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas) - Set type in __request_region() (Bjorn Helgaas) - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas) - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas) - Log IDE resource quirk in dmesg (Bjorn Helgaas) - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas) PCI device hotplug - Make check_link_active() non-static (Rajat Jain) - Use link change notifications for hot-plug and removal (Rajat Jain) - Enable link state change notifications (Rajat Jain) - Don't disable the link permanently during removal (Rajat Jain) - Don't check adapter or latch status while disabling (Rajat Jain) - Disable link notification across slot reset (Rajat Jain) - Ensure very fast hotplug events are also processed (Rajat Jain) - Add hotplug_lock to serialize hotplug events (Rajat Jain) - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain) - Don't turn slot off when hot-added device already exists (Yijing Wang) MSI - Keep pci_enable_msi() documentation (Alexander Gordeev) - ahci: Fix broken single MSI fallback (Alexander Gordeev) - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev) - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman) - Fix leak of msi_attrs (Greg Kroah-Hartman) - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida) Virtualization - Device-specific ACS support (Alex Williamson) Freescale i.MX6 - Wait for retraining (Marek Vasut) Marvell MVEBU - Use Device ID and revision from underlying endpoint (Andrew Lunn) - Fix incorrect size for PCI aperture resources (Jason Gunthorpe) - Call request_resource() on the apertures (Jason Gunthorpe) - Fix potential issue in range parsing (Jean-Jacques Hiblot) Renesas R-Car - Check platform_get_irq() return code (Ben Dooks) - Add error interrupt handling (Ben Dooks) - Fix bridge logic configuration accesses (Ben Dooks) - Register each instance independently (Magnus Damm) - Break out window size handling (Magnus Damm) - Make the Kconfig dependencies more generic (Magnus Damm) Synopsys DesignWare - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar) Miscellaneous - Remove unused SR-IOV VF Migration support (Bjorn Helgaas) - Enable INTx if BIOS left them disabled (Bjorn Helgaas) - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter) - Clean up par-arch object file list (Liviu Dudau) - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom) - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang) - Fix pci_bus_b() build failure (Paul Gortmaker)" * tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits) Revert "[PATCH] Insert GART region into resource map" PCI: Log IDE resource quirk in dmesg PCI: Change pci_bus_alloc_resource() type_mask to unsigned long PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() resources: Set type in __request_region() PCI: Don't check resource_size() in pci_bus_alloc_resource() s390/PCI: Use generic pci_enable_resources() tile PCI RC: Use default pcibios_enable_device() sparc/PCI: Use default pcibios_enable_device() (Leon only) sh/PCI: Use default pcibios_enable_device() microblaze/PCI: Use default pcibios_enable_device() alpha/PCI: Use default pcibios_enable_device() PCI: Add "weak" generic pcibios_enable_device() implementation PCI: Don't enable decoding if BAR hasn't been assigned an address PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit PCI: Don't try to claim IORESOURCE_UNSET resources PCI: Check IORESOURCE_UNSET before updating BAR PCI: Don't clear IORESOURCE_UNSET when updating BAR PCI: Mark resources as IORESOURCE_UNSET if we can't assign them ... Conflicts: arch/x86/include/asm/topology.h drivers/ata/ahci.c	2014-04-01 15:14:04 -07:00
Linus Torvalds	683b6c6f82	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq code updates from Thomas Gleixner: "The irq department proudly presents: - Another tree wide sweep of irq infrastructure abuse. Clear winner of the trainwreck engineering contest was: #include "../../../kernel/irq/settings.h" - Tree wide update of irq_set_affinity() callbacks which miss a cpu online check when picking a single cpu out of the affinity mask. - Tree wide consolidation of interrupt statistics. - Updates to the threaded interrupt infrastructure to allow explicit wakeup of the interrupt thread and a variant of synchronize_irq() which synchronizes only the hard interrupt handler. Both are needed to replace the homebrewn thread handling in the mmc/sdhci code. - New irq chip callbacks to allow proper support for GPIO based irqs. The GPIO based interrupts need to request/release GPIO resources from request/free_irq. - A few new ARM interrupt chips. No revolutionary new hardware, just differently wreckaged variations of the scheme. - Small improvments, cleanups and updates all over the place" I was hoping that that trainwreck engineering contest was a April Fools' joke. But no. * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits) irqchip: sun7i/sun6i: Disable NMI before registering the handler ARM: sun7i/sun6i: dts: Fix IRQ number for sun6i NMI controller ARM: sun7i/sun6i: irqchip: Update the documentation ARM: sun7i/sun6i: dts: Add NMI irqchip support ARM: sun7i/sun6i: irqchip: Add irqchip driver for NMI controller genirq: Export symbol no_action() arm: omap: Fix typo in ams-delta-fiq.c m68k: atari: Fix the last kernel_stat.h fallout irqchip: sun4i: Simplify sun4i_irq_ack irqchip: sun4i: Use handle_fasteoi_irq for all interrupts genirq: procfs: Make smp_affinity values go+r softirq: Add linux/irq.h to make it compile again m68k: amiga: Add linux/irq.h to make it compile again irqchip: sun4i: Don't ack IRQs > 0, fix acking of IRQ 0 irqchip: sun4i: Fix a comment about mask register initialization irqchip: sun4i: Fix irq 0 not working genirq: Add a new IRQCHIP_EOI_THREADED flag genirq: Document IRQCHIP_ONESHOT_SAFE flag ARM: sunxi: dt: Convert to the new irq controller compatibles irqchip: sunxi: Change compatibles ...	2014-04-01 11:22:57 -07:00
Linus Torvalds	99f7b025bf	Merge branch 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 threadinfo changes from Ingo Molnar: "The main change here is the consolidation/unification of 32 and 64 bit thread_info handling methods, from Steve Rostedt" * 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, threadinfo: Redo "x86: Use inline assembler to get sp" x86: Clean up dumpstack_64.c code x86: Keep thread_info on thread stack in x86_32 x86: Prepare removal of previous_esp from i386 thread_info structure x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386 x86: Nuke the supervisor_stack field in i386 thread_info	2014-04-01 10:17:18 -07:00
Linus Torvalds	a21e40877a	Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main purpose is to fix a full dynticks bug related to virtualization, where steal time accounting appears to be zero in /proc/stat even after a few seconds of competing guests running busy loops in a same host CPU. It's not a regression though as it was there since the beginning. The other commits are preparatory work to fix the bug and various cleanups" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch: Remove stub cputime.h headers sched: Remove needless round trip nsecs <-> tick conversion of steal time cputime: Fix jiffies based cputime assumption on steal accounting cputime: Bring cputime -> nsecs conversion cputime: Default implementation of nsecs -> cputime conversion cputime: Fix nsecs_to_cputime() return type cast	2014-04-01 10:16:10 -07:00
Linus Torvalds	b9b16a7922	Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature update from Ingo Molnar: "Two refinements to clflushopt support" * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: If we disable CLFLUSH, we should disable CLFLUSHOPT x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH	2014-04-01 10:11:21 -07:00
Dimitri Sivanich	5f40f7d938	x86/UV: Set n_lshift based on GAM_GR_CONFIG MMR for UV3 The value of n_lshift for UV is currently set based on the socket m_val. For UV3, set the n_lshift value based on the GAM_GR_CONFIG MMR. This will allow bios to control the n_lshift value independent of the socket m_val. Then n_lshift can be assigned a fixed value across a multi-partition system, allowing for a fixed common global physical address format that is independent of socket m_val. Cleanup unneeded macros. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/20140331143700.GB29916@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-01 12:10:44 +02:00
Linus Torvalds	190f918660	Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 compat wrapper rework from Heiko Carstens: "S390 compat system call wrapper simplification work. The intention of this work is to get rid of all hand written assembly compat system call wrappers on s390, which perform proper sign or zero extension, or pointer conversion of compat system call parameters. Instead all of this should be done with C code eg by using Al's COMPAT_SYSCALL_DEFINEx() macro. Therefore all common code and s390 specific compat system calls have been converted to the COMPAT_SYSCALL_DEFINEx() macro. In order to generate correct code all compat system calls may only have eg compat_ulong_t parameters, but no unsigned long parameters. Those patches which change parameter types from unsigned long to compat_ulong_t parameters are separate in this series, but shouldn't cause any harm. The only compat system calls which intentionally have 64 bit parameters (preadv64 and pwritev64) in support of the x86/32 ABI haven't been changed, but are now only available if an architecture defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64. System calls which do not have a compat variant but still need proper zero extension on s390, like eg "long sys_brk(unsigned long brk)" will get a proper wrapper function with the new s390 specific COMPAT_SYSCALL_WRAPx() macro: COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk); which generates the following code (simplified): asmlinkage long sys_brk(unsigned long brk); asmlinkage long compat_sys_brk(long brk) { return sys_brk((u32)brk); } Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines includes both linux/syscall.h and linux/compat.h, it will generate build errors, if the declaration of sys_brk() doesn't match, or if there exists a non-matching compat_sys_brk() declaration. In addition this will intentionally result in a link error if somewhere else a compat_sys_brk() function exists, which probably should have been used instead. Two more BUILD_BUG_ONs make sure the size and type of each compat syscall parameter can be handled correctly with the s390 specific macros. I converted the compat system calls step by step to verify the generated code is correct and matches the previous code. In fact it did not always match, however that was always a bug in the hand written asm code. In result we get less code, less bugs, and much more sanity checking" * 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits) s390/compat: add copyright statement compat: include linux/unistd.h within linux/compat.h s390/compat: get rid of compat wrapper assembly code s390/compat: build error for large compat syscall args mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: convert to COMPAT_SYSCALL_DEFINE security/compat: convert to COMPAT_SYSCALL_DEFINE mm/compat: convert to COMPAT_SYSCALL_DEFINE net/compat: convert to COMPAT_SYSCALL_DEFINE kernel/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: optional preadv64/pwrite64 compat system calls ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t s390/compat: partial parameter conversion within syscall wrappers s390/compat: automatic zero, sign and pointer conversion of syscalls s390/compat: add sync_file_range and fallocate compat syscalls ...	2014-03-31 14:32:17 -07:00
Linus Torvalds	7cc3afdf43	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI changes from Ingo Molnar: "The main changes: - Add debug code to the dump EFI pagetable - Borislav Petkov - Make 1:1 runtime mapping robust when booting on machines with lots of memory - Borislav Petkov - Move the EFI facilities bits out of 'x86_efi_facility' and into efi.flags which is the standard architecture independent place to keep EFI state, by Matt Fleming. - Add 'EFI mixed mode' support: this allows 64-bit kernels to be booted from 32-bit firmware. This needs a bootloader that supports the 'EFI handover protocol'. By Matt Fleming" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86, efi: Abstract x86 efi_early calls x86/efi: Restore 'attr' argument to query_variable_info() x86/efi: Rip out phys_efi_get_time() x86/efi: Preserve segment registers in mixed mode x86/boot: Fix non-EFI build x86, tools: Fix up compiler warnings x86/efi: Re-disable interrupts after calling firmware services x86/boot: Don't overwrite cr4 when enabling PAE x86/efi: Wire up CONFIG_EFI_MIXED x86/efi: Add mixed runtime services support x86/efi: Firmware agnostic handover entry points x86/efi: Split the boot stub into 32/64 code paths x86/efi: Add early thunk code to go from 64-bit to 32-bit x86/efi: Build our own EFI services pointer table efi: Add separate 32-bit/64-bit definitions x86/efi: Delete dead code when checking for non-native x86/mm/pageattr: Always dump the right page table in an oops x86, tools: Consolidate #ifdef code x86/boot: Cleanup header.S by removing some #ifdefs efi: Use NULL instead of 0 for pointer ...	2014-03-31 12:26:05 -07:00
Linus Torvalds	918d80a136	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu handling changes from Ingo Molnar: "Bigger changes: - Intel CPU hardware-enablement: new vector instructions support (AVX-512), by Fenghua Yu. - Support the clflushopt instruction and use it in appropriate places. clflushopt is similar to clflush but with more relaxed ordering, by Ross Zwisler. - MSR accessor cleanups, by Borislav Petkov. - 'forcepae' boot flag for those who have way too much time to spend on way too old Pentium-M systems and want to live way too dangerously, by Chris Bainbridge" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic x86, Intel: Convert to the new bit access MSR accessors x86, AMD: Convert to the new bit access MSR accessors x86: Add another set of MSR accessor functions x86: Use clflushopt in drm_clflush_virt_range x86: Use clflushopt in drm_clflush_page x86: Use clflushopt in clflush_cache_range x86: Add support for the clflushopt instruction x86, AVX-512: Enable AVX-512 States Context Switch x86, AVX-512: AVX-512 Feature Detection	2014-03-31 12:00:45 -07:00
Linus Torvalds	6ed7705167	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic changes from Ingo Molnar: "An xAPIC CPU hotplug race fix, plus cleanups and minor fixes" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Plug racy xAPIC access of CPU hotplug code x86/apic: Always define nox2apic and define it as initdata x86/apic: Remove unused function prototypes x86/apic: Switch wait_for_init_deassert() to a bool flag x86/apic: Only use default_wait_for_init_deassert()	2014-03-31 11:58:45 -07:00
Linus Torvalds	971eae7c99	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Bigger changes: - sched/idle restructuring: they are WIP preparation for deeper integration between the scheduler and idle state selection, by Nicolas Pitre. - add NUMA scheduling pseudo-interleaving, by Rik van Riel. - optimize cgroup context switches, by Peter Zijlstra. - RT scheduling enhancements, by Thomas Gleixner. The rest is smaller changes, non-urgnt fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits) sched: Clean up the task_hot() function sched: Remove double calculation in fix_small_imbalance() sched: Fix broken setscheduler() sparc64, sched: Remove unused sparc64_multi_core sched: Remove unused mc_capable() and smt_capable() sched/numa: Move task_numa_free() to __put_task_struct() sched/fair: Fix endless loop in idle_balance() sched/core: Fix endless loop in pick_next_task() sched/fair: Push down check for high priority class task into idle_balance() sched/rt: Fix picking RT and DL tasks from empty queue trace: Replace hardcoding of 19 with MAX_NICE sched: Guarantee task priority in pick_next_task() sched/idle: Remove stale old file sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED cpuidle/arm64: Remove redundant cpuidle_idle_call() cpuidle/powernv: Remove redundant cpuidle_idle_call() sched, nohz: Exclude isolated cores from load balancing sched: Fix select_task_rq_fair() description comments workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE ...	2014-03-31 11:21:19 -07:00
Linus Torvalds	8c292f1174	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Main changes: Kernel side changes: - Add SNB/IVB/HSW client uncore memory controller support (Stephane Eranian) - Fix various x86/P4 PMU driver bugs (Don Zickus) Tooling, user visible changes: - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso) - Speed up thread map generation (Don Zickus) - Introduce 'perf kvm --list-cmds' command line option for use by scripts (Ramkumar Ramachandra) - Print the evsel name in the annotate stdio output, prep to fix support outputting annotation for multiple events, not just for the first one (Arnaldo Carvalho de Melo) - Allow setting preferred callchain method in .perfconfig (Jiri Olsa) - Show in what binaries/modules 'perf probe's are set (Masami Hiramatsu) - Support distro-style debuginfo for uprobe in 'perf probe' (Masami Hiramatsu) Tooling, internal changes and fixes: - Use tid in mmap/mmap2 events to find maps (Don Zickus) - Record the reason for filtering an address_location (Namhyung Kim) - Apply all filters to an addr_location (Namhyung Kim) - Merge al->filtered with hist_entry->filtered in report/hists (Namhyung Kim) - Fix memory leak when synthesizing thread records (Namhyung Kim) - Use ui__has_annotation() in 'report' (Namhyung Kim) - hists browser refactorings to reuse code accross UIs (Namhyung Kim) - Add support for the new DWARF unwinder library in elfutils (Jiri Olsa) - Fix build race in the generation of bison files (Jiri Olsa) - Further streamline the feature detection display, trimming it a bit to show just the libraries detected, using VF=1 gets a more verbose output, showing the less interesting feature checks as well (Jiri Olsa). - Check compatible symtab type before loading dso (Namhyung Kim) - Check return value of filename__read_debuglink() (Stephane Eranian) - Move some hashing and fs related code from tools/perf/util/ to tools/lib/ so that it can be used by more tools/ living utilities (Borislav Petkov) - Prepare DWARF unwinding code for using an elfutils alternative unwinding library (Jiri Olsa) - Fix DWARF unwind max_stack processing (Jiri Olsa) - Add dwarf unwind 'perf test' entry (Jiri Olsa) - 'perf probe' improvements including memory leak fixes, sharing the intlist class with other tools, uprobes/kprobes code sharing and use of ref_reloc_sym (Masami Hiramatsu) - Shorten sample symbol resolving by adding cpumode to struct addr_location (Arnaldo Carvalho de Melo) - Fix synthesizing mmaps for threads (Don Zickus) - Fix invalid output on event group stdio report (Namhyung Kim) - Fixup header alignment in 'perf sched latency' output (Ramkumar Ramachandra) - Fix off-by-one error in 'perf timechart record' argv handling (Ramkumar Ramachandra) Tooling, cleanups: - Remove unused thread__find_map function (Jiri Olsa) - Remove unused simple_strtoul() function (Ramkumar Ramachandra) Tooling, documentation updates: - Update function names in debug messages (Ramkumar Ramachandra) - Update some code references in design.txt (Ramkumar Ramachandra) - Clarify load-latency information in the 'perf mem' docs (Andi Kleen) - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits) perf tools: Remove unused simple_strtoul() function perf tools: Update some code references in design.txt perf evsel: Update function names in debug messages perf tools: Remove thread__find_map function perf annotate: Print the evsel name in the stdio output perf report: Use ui__has_annotation() perf tools: Fix memory leak when synthesizing thread records perf tools: Use tid in mmap/mmap2 events to find maps perf report: Merge al->filtered with hist_entry->filtered perf symbols: Apply all filters to an addr_location perf symbols: Record the reason for filtering an address_location perf sched: Fixup header alignment in 'latency' output perf timechart: Fix off-by-one error in 'record' argv handling perf machine: Factor machine__find_thread to take tid argument perf tools: Speed up thread map generation perf kvm: introduce --list-cmds for use by scripts perf ui hists: Pass evsel to hpp->header/width functions explicitly perf symbols: Introduce thread__find_cpumode_addr_location perf session: Change header.misc dump from decimal to hex perf ui/tui: Reuse generic __hpp__fmt() code ...	2014-03-31 11:13:25 -07:00
Linus Torvalds	462bf234a8	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The biggest change is the MCS spinlock generalization changes from Tim Chen, Peter Zijlstra, Jason Low et al. There's also lockdep fixes/enhancements from Oleg Nesterov, in particular a false negative fix related to lockdep_set_novalidate_class() usage" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) locking/mutex: Fix debug checks locking/mutexes: Add extra reschedule point locking/mutexes: Introduce cancelable MCS lock for adaptive spinning locking/mutexes: Unlock the mutex without the wait_lock locking/mutexes: Modify the way optimistic spinners are queued locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner() locking: Move mcs_spinlock.h into kernel/locking/ m68k: Skip futex_atomic_cmpxchg_inatomic() test futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test Revert "sched/wait: Suppress Sparse 'variable shadowing' warning" lockdep: Change lockdep_set_novalidate_class() to use _and_name lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate lockdep: Don't create the wrong dependency on hlock->check == 0 lockdep: Make held_lock->check and "int check" argument bool locking/mcs: Allow architecture specific asm files to be used for contended case locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order sched/wait: Suppress Sparse 'variable shadowing' warning hung_task/Documentation: Fix hung_task_warnings description locking/mcs: Allow architectures to hook in to contended paths locking/mcs: Micro-optimize the MCS code, add extra comments ...	2014-03-31 10:59:39 -07:00
Artem Fetishev	825600c0f2	x86: fix boot on uniprocessor systems On x86 uniprocessor systems topology_physical_package_id() returns -1 which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized which leads to GPF in rapl_pmu_init(). See arch/x86/kernel/cpu/perf_event_intel_rapl.c. It turns out that physical_package_id and core_id can actually be retreived for uniprocessor systems too. Enabling them also fixes rapl_pmu code. Signed-off-by: Artem Fetishev <artem_fetishev@epam.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-28 13:56:58 -07:00
David Vrabel	5926f87fda	Revert "xen: properly account for _PAGE_NUMA during xen pte translations" This reverts commit `a9c8e4beee`. PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT is set and pseudo-physical addresses is _PAGE_PRESENT is clear. This is because during a domain save/restore (migration) the page table entries are "canonicalised" and uncanonicalised". i.e., MFNs are converted to PFNs during domain save so that on a restore the page table entries may be rewritten with the new MFNs on the destination. This canonicalisation is only done for PTEs that are present. This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or _PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be migrated as-is which would result in unexpected behaviour in the destination domain. Either a) the MFN would be translated to the wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE because the MFN is no longer owned by the domain; or c) the present bit would not get set. Symptoms include "Bad page" reports when munmapping after migrating a domain. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> [3.12+]	2014-03-25 11:11:42 +00:00
Andy Lutomirski	3c1b63b9e4	x86, vdso: Finish removing VDSO32_PRELINK It's a declaration of a nonexistent symbol. We can get rid of the 64-bit versions, too, but that's more intrusive. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/2ce2ce18447d8a0b78d44a278a066b6c0af06b32.1395366931.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-03-20 20:20:18 -07:00
Andy Lutomirski	9e6f450f94	x86, vdso: Move more vdso definitions into vdso.h This fixes the Xen build and gets rid of a silly header file. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1df77311795aff75f5742c787d277518314a38d3.1395366931.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-03-20 20:20:08 -07:00
Andy Lutomirski	b67e612cef	x86: Load the 32-bit vdso in place, just like the 64-bit vdsos This replaces a decent amount of incomprehensible and buggy code with much more straightforward code. It also brings the 32-bit vdso more in line with the 64-bit vdsos, so maybe someday they can share even more code. This wastes a small amount of kernel .data and .text space, but it avoids a couple of allocations on startup, so it should be more or less a wash memory-wise. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/b8093933fad09ce181edb08a61dcd5d2592e9814.1395352498.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-20 15:19:14 -07:00
Eric Paris	579ec9e1ab	audit: use uapi/linux/audit.h for AUDIT_ARCH declarations The syscall.h headers were including linux/audit.h but really only needed the uapi/linux/audit.h to get the requisite defines. Switch to the uapi headers. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org Cc: linux-s390@vger.kernel.org Cc: x86@kernel.org	2014-03-20 10:11:59 -04:00
Eric Paris	5e937a9ae9	syscall_get_arch: remove useless function arguments Every caller of syscall_get_arch() uses current for the task and no implementors of the function need args. So just get rid of both of those things. Admittedly, since these are inline functions we aren't wasting stack space, but it just makes the prototypes better. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org Cc: linux390@de.ibm.com Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-arch@vger.kernel.org	2014-03-20 10:11:59 -04:00
H. Peter Anvin	7b878d4b48	random: Add arch_has_random[_seed]() Add predicate functions for having arch_get_random[_seed]*(). The only current use is to avoid the loop in arch_random_refill() when arch_get_random_seed_long() is unavailable. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-03-19 22:24:08 -04:00
H. Peter Anvin	d20f78d252	x86, random: Enable the RDSEED instruction Upcoming Intel silicon adds a new RDSEED instruction, which is similar to RDRAND but provides a stronger guarantee: unlike RDRAND, RDSEED will always reseed the PRNG from the true random number source between each read. Thus, the output of RDSEED is guaranteed to be 100% entropic, unlike RDRAND which is only architecturally guaranteed to be 1/512 entropic (although in practice is much more.) The RDSEED instruction takes the same time to execute as RDRAND, but RDSEED unlike RDRAND can legitimately return failure (CF=0) due to entropy exhaustion if too many threads on too many cores are hammering the RDSEED instruction at the same time. Therefore, we have to be more conservative and only use it in places where we can tolerate failures. This patch introduces the primitives arch_get_random_seed_{int,long}() but does not use it yet. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-03-19 22:22:06 -04:00
Stefani Seibold	7c03156f34	x86, vdso: Add 32 bit VDSO time support for 64 bit kernel This patch add the VDSO time support for the IA32 Emulation Layer. Due the nature of the kernel headers and the LP64 compiler where the size of a long and a pointer differs against a 32 bit compiler, there is some type hacking necessary for optimal performance. The vsyscall_gtod_data struture must be a rearranged to serve 32- and 64-bit code access at the same time: - The seqcount_t was replaced by an unsigned, this makes the vsyscall_gtod_data intedepend of kernel configuration and internal functions. - All kernel internal structures are replaced by fix size elements which works for 32- and 64-bit access - The inner struct clock was removed to pack the whole struct. The "unsigned seq" would be handled by functions derivated from seqcount_t. Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-11-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-18 12:52:41 -07:00
Stefani Seibold	7a59ed415f	x86, vdso: Add 32 bit VDSO time support for 32 bit kernel This patch add the time support for 32 bit a VDSO to a 32 bit kernel. For 32 bit programs running on a 32 bit kernel, the same mechanism is used as for 64 bit programs running on a 64 bit kernel. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-10-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-18 12:52:37 -07:00
Andy Lutomirski	b4b541a610	x86, vdso: Patch alternatives in the 32-bit VDSO We need the alternatives mechanism for rdtsc_barrier() to work. Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-9-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-18 12:52:33 -07:00
Stefani Seibold	ef721987ae	x86, vdso: Introduce VVAR marco for vdso32 This patch revamps the vvar.h for introduce the VVAR macro for vdso32. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-8-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-18 12:52:29 -07:00
Stefani Seibold	d2312e3379	x86, vdso: Make vsyscall_gtod_data handling x86 generic This patch move the vsyscall_gtod_data handling out of vsyscall_64.c into an additonal file vsyscall_gtod.c to make the functionality available for x86 32 bit kernel. It also adds a new vsyscall_32.c which setup the VVAR page. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-2-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-18 12:51:52 -07:00
Zoltan Kiss	1429d46df4	xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_override The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the bulk of the original function (everything after the mapping hypercall) is moved to arch-dependent set/clear_foreign_p2m_mapping - the "if (xen_feature(XENFEAT_auto_translated_physmap))" branch goes to ARM - therefore the ARM function could be much smaller, the m2p_override stubs could be also removed - on x86 the set_phys_to_machine calls were moved up to this new funcion from m2p_override functions - and m2p_override functions are only called when there is a kmap_ops param It also removes a stray space from arch/x86/include/asm/xen/page.h. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Suggested-by: Anthony Liguori <aliguori@amazon.com> Suggested-by: David Vrabel <david.vrabel@citrix.com> Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2014-03-18 14:40:19 +00:00
Andy Lutomirski	7dda038756	x86_32, mm: Remove user bit from identity map PDE The only reason that the user bit was set was to support userspace access to the compat vDSO in the fixmap. The compat vDSO is gone, so the user bit can be removed. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e240a977f3c7cbd525a091fd6521499ec4b8e94f.1394751608.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-13 16:20:17 -07:00
Andy Lutomirski	b0b49f2673	x86, vdso: Remove compat vdso support The compat vDSO is a complicated hack that's needed to maintain compatibility with a small range of glibc versions. This removes it and replaces it with a much simpler hack: a config option to disable the 32-bit vDSO by default. This also changes the default value of CONFIG_COMPAT_VDSO to n -- users configuring kernels from scratch almost certainly want that choice. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/4bb4690899106eb11430b1186d5cc66ca9d1660c.1394751608.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-13 16:20:09 -07:00
H. Peter Anvin	0b131be8d4	x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic Replace somewhat arbitrary constants for bits in MSR_IA32_MISC_ENABLE with verbose but systematic ones. Add _BIT defines for all the rest of them, too. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-13 15:55:46 -07:00
Borislav Petkov	c0a639ad0b	x86, Intel: Convert to the new bit access MSR accessors ... and save some lines of code. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1394384725-10796-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-13 15:35:09 -07:00
Borislav Petkov	22085a66c2	x86: Add another set of MSR accessor functions We very often need to set or clear a bit in an MSR as a result of doing some sort of a hardware configuration. Add generic versions of that repeated functionality in order to save us a bunch of duplicated code in the early CPU vendor detection/config code. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1394384725-10796-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-13 15:34:45 -07:00
Frederic Weisbecker	073d8224d2	arch: Remove stub cputime.h headers Many architectures have a stub cputime.h that only include the default cputime.h Lets remove the useless headers, we only need to mention that we want the default headers on the Kbuild files. Cc: Archs <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-03-13 16:09:30 +01:00
Thomas Gleixner	ffb12cf002	Merge branch 'irq/for-gpio' into irq/core Merge the request/release callbacks which are in a separate branch for consumption by the gpio folks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-12 16:01:07 +01:00
Dave Jones	09df7c4c80	x86: Remove CONFIG_X86_OOSTORE This was an optimization that made memcpy type benchmarks a little faster on ancient (Circa 1998) IDT Winchip CPUs. In real-life workloads, it wasn't even noticable, and I doubt anyone is running benchmarks on 16 year old silicon any more. Given this code has likely seen very little use over the last decade, let's just remove it. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-11 10:16:18 -07:00
Bjorn Helgaas	36fc5500bb	sched: Remove unused mc_capable() and smt_capable() Remove mc_capable() and smt_capable(). Neither is used. Both were added by `5c45bf279d` ("sched: mc/smt power savings sched policy"). Uses of both were removed by `8e7fbcbc22` ("sched: Remove stale power aware scheduling remnants and dysfunctional knobs"). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: http://lkml.kernel.org/r/20140304210737.16893.54289.stgit@bhelgaas-glaptop.roam.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-03-11 12:05:45 +01:00
Ingo Molnar	0066f3b93e	Merge branch 'perf/urgent' into perf/core Merge the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-03-11 11:53:50 +01:00
Paolo Bonzini	c77fb5fe6f	KVM: x86: Allow the guest to run with dirty debug registers When not running in guest-debug mode, the guest controls the debug registers and having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest. After this patch, VMX- and SVM-specific code can set a flag in switch_db_regs, telling vcpu_enter_guest that on the next exit the debug registers might be dirty and need to be reloaded (syncing will be taken care of by a new callback in kvm_x86_ops). This flag can be set on the first access to a debug registers, so that multiple accesses to the debug registers only cause one vmexit. Note that since the guest will be able to read debug registers and enable breakpoints in DR7, we need to ensure that they are synchronized on entry to the guest---including DR6 that was not synced before. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-11 10:46:02 +01:00
Paolo Bonzini	360b948d88	KVM: x86: change vcpu->arch.switch_db_regs to a bit mask The next patch will add another bit that we can test with the same "if". Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-11 10:46:02 +01:00
Jan Kiszka	c9a7953f09	KVM: x86: Remove return code from enable_irq/nmi_window It's no longer possible to enter enable_irq_window in guest mode when L1 intercepts external interrupts and we are entering L2. This is now caught in vcpu_enter_guest. So we can remove the check from the VMX version of enable_irq_window, thus the need to return an error code from both enable_irq_window and enable_nmi_window. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-11 08:41:47 +01:00
Jan Kiszka	b6b8a1451f	KVM: nVMX: Rework interception of IRQs and NMIs Move the check for leaving L2 on pending and intercepted IRQs or NMIs from the *_allowed handler into a dedicated callback. Invoke this callback at the relevant points before KVM checks if IRQs/NMIs can be injected. The callback has the task to switch from L2 to L1 if needed and inject the proper vmexit events. The rework fixes L2 wakeups from HLT and provides the foundation for preemption timer emulation. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-11 08:41:45 +01:00
Steven Rostedt	198d208df4	x86: Keep thread_info on thread stack in x86_32 x86_64 uses a per_cpu variable kernel_stack to always point to the thread stack of current. This is where the thread_info is stored and is accessed from this location even when the irq or exception stack is in use. This removes the complexity of having to maintain the thread info on the stack when interrupts are running and having to copy the preempt_count and other fields to the interrupt stack. x86_32 uses the old method of copying the thread_info from the thread stack to the exception stack just before executing the exception. Having the two different requires #ifdefs and also the x86_32 way is a bit of a pain to maintain. By converting x86_32 to the same method of x86_64, we can remove #ifdefs, clean up the x86_32 code a little, and remove the overhead of the copy. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-06 16:56:55 -08:00
Steven Rostedt	0788aa6a23	x86: Prepare removal of previous_esp from i386 thread_info structure The i386 thread_info contains a previous_esp field that is used to daisy chain the different stacks for dump_stack() (ie. irq, softirq, thread stacks). The goal is to eventual make i386 handling of thread_info the same as x86_64, which means that the thread_info will not be in the stack but as a per_cpu variable. We will no longer depend on thread_info being able to daisy chain different stacks as it will only exist in one location (the thread stack). By moving previous_esp to the end of thread_info and referencing it as an offset instead of using a thread_info field, this becomes a stepping stone to moving the thread_info. The offset to get to the previous stack is rather ugly in this patch, but this is only temporary and the prev_esp will be changed in the next commit. This commit is more for sanity checks of the change. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Robert Richter <rric@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20110806012353.891757693@goodmis.org Link: http://lkml.kernel.org/r/20140206144321.608754481@goodmis.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-06 16:56:54 -08:00
Steven Rostedt (Red Hat)	b807902a88	x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386 According to a git log -p, GET_THREAD_INFO_WITH_ESP() has only been defined and never been used. Get rid of it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140206144321.409045251@goodmis.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-06 16:56:54 -08:00
Steven Rostedt	2432e1364b	x86: Nuke the supervisor_stack field in i386 thread_info Nothing references the supervisor_stack in the thread_info field, and it does not exist in x86_64. To make the two more the same, it is being removed. Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20110806012353.546183789@goodmis.org Link: http://lkml.kernel.org/r/20140206144321.203619611@goodmis.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-06 16:56:54 -08:00
Heiko Carstens	378a10f3ae	fs/compat: optional preadv64/pwrite64 compat system calls The preadv64/pwrite64 have been implemented for the x32 ABI, in order to allow passing 64 bit arguments from user space without splitting them into two 32 bit parameters, like it would be necessary for usual compat tasks. Howevert these two system calls are only being used for the x32 ABI, so add __ARCH_WANT_COMPAT defines for these two compat syscalls and make these two only visible for x86. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-06 15:35:09 +01:00
Thomas Gleixner	7ff42473eb	x86: hardirq: Make irq_hv_callback_count available for CONFIG_HYPERV=m as well Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-06 12:08:37 +01:00
Matt Fleming	994448f1af	Merge remote-tracking branch 'tip/x86/efi-mixed' into efi-for-mingo Conflicts: arch/x86/kernel/setup.c arch/x86/platform/efi/efi.c arch/x86/platform/efi/efi_64.c	2014-03-05 18:15:37 +00:00
Matt Fleming	4fd69331ad	Merge remote-tracking branch 'tip/x86/urgent' into efi-for-mingo Conflicts: arch/x86/include/asm/efi.h	2014-03-05 17:31:41 +00:00
Thomas Gleixner	76d388cd72	x86: hyperv: Fixup the (brain) damage caused by the irq cleanup Compiling last minute changes without setting the proper config options is not really clever. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-05 13:42:14 +01:00
H. Peter Anvin	3c0b566334	* Disable the new EFI 1:1 virtual mapping for SGI UV because using it causes a crash during boot - Borislav Petkov -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTFmV4AAoJEC84WcCNIz1VeSgP/1gykrBiH3Vr4H4c32la/arZ ktXRAT+RHdiebEXopt6+A1Pv6iyRYz3fBB1cKKb7q8fhDmVVmefGVkyO4qg4NbgL Ic1dP6uPgiFidcYML9/c6+UIovbwDD7f5wwJEjXxoZKg7b7P0TIykd8z8YPQ/6A6 fIY5Z2L9A8eVt4k1m6Dg2PUJ2B+XKOLa5BbL7gXB6u9avgAVAoLM9oQStss+V9Pv JQu+BZxeEwuxgi0LOxk1sFWXaAoRtgVDNH6nPK93CRvO2H83voWFf+OcW9hQWsF5 tLam6aMkvjuM5Dv1IA69PBAWrE/jxcMvjEdJyrGMWRKWtH/iTN4ez4EoFiO38o4R IgzXh9L5xbP3o+g3rntIu4h3/5yde9TRM18mER0lLTdNFZ8QzmLt4L0TptntRoBv bTXffILACq0uQU6T10P+EwseT472HphMeswaWVDkRxkN2hTCipFqQX0ekb0qbw3O yQeRyz1/t7DeA77iAGG96SfeziMdr44d6u6zDQPrTAJV+H1ZZ3XNIlJDi01CAg3b PDT6nHb/9V0tPZQebntZnczVRP+5EBdn5RbJaAMldEwVgjdjlFNY5slsF01KeCSF 6Cx6UyAI9XKuZMoayC3QDHKKWi+BTOwbKcVAdtZ1c9AMLt9pyxsDfdoOAV4zBiQa PsVs9t/l7TZoL1MQHlEJ =YIab -----END PGP SIGNATURE----- Merge tag 'efi-urgent' into x86/urgent * Disable the new EFI 1:1 virtual mapping for SGI UV because using it causes a crash during boot - Borislav Petkov Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-04 15:50:06 -08:00
Borislav Petkov	a5d90c923b	x86/efi: Quirk out SGI UV Alex reported hitting the following BUG after the EFI 1:1 virtual mapping work was merged, kernel BUG at arch/x86/mm/init_64.c:351! invalid opcode: 0000 [#1] SMP Call Trace: [<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15 [<ffffffff818a5e20>] uv_system_init+0x22b/0x124b [<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d [<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7 [<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd [<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7 [<ffffffff8153d955>] ? printk+0x72/0x74 [<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6 [<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb [<ffffffff81535530>] ? rest_init+0x74/0x74 [<ffffffff81535539>] kernel_init+0x9/0xff [<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0 [<ffffffff81535530>] ? rest_init+0x74/0x74 Getting this thing to work with the new mapping scheme would need more work, so automatically switch to the old memmap layout for SGI UV. Acked-by: Russ Anderson <rja@sgi.com> Cc: Alex Thorlton <athorlton@sgi.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 23:43:33 +00:00
Matt Fleming	7d453eee36	x86/efi: Wire up CONFIG_EFI_MIXED Add the Kconfig option and bump the kernel header version so that boot loaders can check whether the handover code is available if they want. The xloadflags field in the bzImage header is also updated to reflect that the kernel supports both entry points by setting both of XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y. XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is guaranteed to be addressable with 32-bits. Note that no boot loaders should be using the bits set in xloadflags to decide which entry point to jump to. The entire scheme is based on the concept that 32-bit bootloaders always jump to ->handover_offset and 64-bit loaders always jump to ->handover_offset + 512. We set both bits merely to inform the boot loader that it's safe to use the native handover offset even if the machine type in the PE/COFF header claims otherwise. Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 21:43:57 +00:00
Matt Fleming	4f9dbcfc40	x86/efi: Add mixed runtime services support Setup the runtime services based on whether we're booting in EFI native mode or not. For non-native mode we need to thunk from 64-bit into 32-bit mode before invoking the EFI runtime services. Using the runtime services after SetVirtualAddressMap() is slightly more complicated because we need to ensure that all the addresses we pass to the firmware are below the 4GB boundary so that they can be addressed with 32-bit pointers, see efi_setup_page_tables(). Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 21:43:14 +00:00
Matt Fleming	b8ff87a615	x86/efi: Firmware agnostic handover entry points The EFI handover code only works if the "bitness" of the firmware and the kernel match, i.e. 64-bit firmware and 64-bit kernel - it is not possible to mix the two. This goes against the tradition that a 32-bit kernel can be loaded on a 64-bit BIOS platform without having to do anything special in the boot loader. Linux distributions, for one thing, regularly run only 32-bit kernels on their live media. Despite having only one 'handover_offset' field in the kernel header, EFI boot loaders use two separate entry points to enter the kernel based on the architecture the boot loader was compiled for, (1) 32-bit loader: handover_offset (2) 64-bit loader: handover_offset + 512 Since we already have two entry points, we can leverage them to infer the bitness of the firmware we're running on, without requiring any boot loader modifications, by making (1) and (2) valid entry points for both CONFIG_X86_32 and CONFIG_X86_64 kernels. To be clear, a 32-bit boot loader will always use (1) and a 64-bit boot loader will always use (2). It's just that, if a single kernel image supports (1) and (2) that image can be used with both 32-bit and 64-bit boot loaders, and hence both 32-bit and 64-bit EFI. (1) and (2) must be 512 bytes apart at all times, but that is already part of the boot ABI and we could never change that delta without breaking existing boot loaders anyhow. Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 21:25:06 +00:00
Matt Fleming	426e34cc4f	x86/mm/pageattr: Always dump the right page table in an oops Now that we have EFI-specific page tables we need to lookup the pgd when dumping those page tables, rather than assuming that swapper_pgdir is the current pgdir. Remove the double underscore prefix, which is usually reserved for static functions. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 21:23:36 +00:00
Michael Opdenacker	d20d2efbf2	x86: Remove deprecated IRQF_DISABLED This patch removes the IRQF_DISABLED flag from x86 architecture code. It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: venki@google.com Link: http://lkml.kernel.org/r/1393965305-17248-1-git-send-email-michael.opdenacker@free-electrons.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-04 21:47:51 +01:00
Thomas Gleixner	1aec169673	x86: Hyperv: Cleanup the irq mess The vmbus/hyperv interrupt handling is another complete trainwreck and probably the worst of all currently in tree. If CONFIG_HYPERV=y then the interrupt delivery to the vmbus happens via the direct HYPERVISOR_CALLBACK_VECTOR. So far so good, but: The driver requests first a normal device interrupt. The only reason to do so is to increment the interrupt stats of that device interrupt. For no reason it also installs a private flow handler. We have proper accounting mechanisms for direct vectors, but of course it's too much effort to add that 5 lines of code. Aside of that the alloc_intr_gate() is not protected against reallocation which makes module reload impossible. Solution to the problem is simple to rip out the whole mess and implement it correctly. First of all move all that code to arch/x86/kernel/cpu/mshyperv.c and merily install the HYPERVISOR_CALLBACK_VECTOR with proper reallocation protection and use the proper direct vector accounting mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linuxdrivers <devel@linuxdriverproject.org> Cc: x86 <x86@kernel.org> Link: http://lkml.kernel.org/r/20140223212739.028307673@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-04 17:37:54 +01:00
Thomas Gleixner	929320e4b4	x86: Add proper vector accounting for HYPERVISOR_CALLBACK_VECTOR HyperV abuses a device interrupt to account for the HYPERVISOR_CALLBACK_VECTOR. Provide proper accounting as we have for the other vectors as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: x86 <x86@kernel.org> Link: http://lkml.kernel.org/r/20140223212738.681855582@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-04 17:37:54 +01:00
Borislav Petkov	b7b898ae0c	x86/efi: Make efi virtual runtime map passing more robust Currently, running SetVirtualAddressMap() and passing the physical address of the virtual map array was working only by a lucky coincidence because the memory was present in the EFI page table too. Until Toshi went and booted this on a big HP box - the krealloc() manner of resizing the memmap we're doing did allocate from such physical addresses which were not mapped anymore and boom: http://lkml.kernel.org/r/1386806463.1791.295.camel@misato.fc.hp.com One way to take care of that issue is to reimplement the krealloc thing but with pages. We start with contiguous pages of order 1, i.e. 2 pages, and when we deplete that memory (shouldn't happen all that often but you know firmware) we realloc the next power-of-two pages. Having the pages, it is much more handy and easy to map them into the EFI page table with the already existing mapping code which we're using for building the virtual mappings. Thanks to Toshi Kani and Matt for the great debugging help. Reported-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 16:17:18 +00:00
Borislav Petkov	42a5477251	x86, pageattr: Export page unmapping interface We will use it in efi so expose it. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 16:17:17 +00:00
Borislav Petkov	11cc851254	x86/efi: Dump the EFI page table This is very useful for debugging issues with the recently added pagetable switching code for EFI virtual mode. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 16:17:17 +00:00
Borislav Petkov	ef6bea6ddf	x86, ptdump: Add the functionality to dump an arbitrary pagetable With reusing the ->trampoline_pgd page table for mapping EFI regions in order to use them after having switched to EFI virtual mode, it is very useful to be able to dump aforementioned page table in dmesg. This adds that functionality through the walk_pgd_level() interface which can be called from somewhere else. The original functionality of dumping to debugfs remains untouched. Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 16:16:17 +00:00
Matt Fleming	3e90959921	efi: Move facility flags to struct efi As we grow support for more EFI architectures they're going to want the ability to query which EFI features are available on the running system. Instead of storing this information in an architecture-specific place, stick it in the global 'struct efi', which is already the central location for EFI state. While we're at it, let's change the return value of efi_enabled() to be bool and replace all references to 'facility' with 'feature', which is the usual word used to describe the attributes of the running system. Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-03-04 16:16:16 +00:00
Paolo Bonzini	1c2af4968e	Merge tag 'kvm-for-3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into kvm-next	2014-03-04 15:58:00 +01:00
Andrew Jones	332967a3ea	x86: kvm: introduce periodic global clock updates commit `0061d53daf` introduced a mechanism to execute a global clock update for a vm. We can apply this periodically in order to propagate host NTP corrections. Also, if all vcpus of a vm are pinned, then without an additional trigger, no guest NTP corrections can propagate either, as the current trigger is only vcpu cpu migration. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-04 11:50:54 +01:00
Andrew Jones	7e44e4495a	x86: kvm: rate-limit global clock updates When we update a vcpu's local clock it may pick up an NTP correction. We can't wait an indeterminate amount of time for other vcpus to pick up that correction, so commit `0061d53daf` introduced a global clock update. However, we can't request a global clock update on every vcpu load either (which is what happens if the tsc is marked as unstable). The solution is to rate-limit the global clock updates. Marcelo calculated that we should delay the global clock updates no more than 0.1s as follows: Assume an NTP correction c is applied to one vcpu, but not the other, then in n seconds the delta of the vcpu system_timestamps will be c * n. If we assume a correction of 500ppm (worst-case), then the two vcpus will diverge 50us in 0.1s, which is a considerable amount. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-04 11:50:47 +01:00
Heiko Carstens	0473c9b5f0	compat: let architectures define __ARCH_WANT_COMPAT_SYS_GETDENTS64 For architecture dependent compat syscalls in common code an architecture must define something like __ARCH_WANT_<WHATEVER> if it wants to use the code. This however is not true for compat_sys_getdents64 for which architectures must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code. This leads to the situation where all architectures, except mips, get the compat code but only x86_64, arm64 and the generic syscall architectures actually use it. So invert the logic, so that architectures actively must do something to get the compat code. This way a couple of architectures get rid of otherwise dead code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:33 +01:00
Greg Kroah-Hartman	13df797743	Merge 3.14-rc5 into driver-core-next We want the fixes in here.	2014-03-02 20:09:08 -08:00
H. Peter Anvin	840d2830e6	x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH We call this "clflush" in /proc/cpuinfo, and have cpu_has_clflush()... let's be consistent and just call it that. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alan Cox <alan@linux.intel.com> Link: http://lkml.kernel.org/n/tip-mlytfzjkvuf739okyn40p8a5@git.kernel.org	2014-02-27 08:31:30 -08:00
Ross Zwisler	171699f763	x86: Add support for the clflushopt instruction Add support for the new clflushopt instruction. This instruction was announced in the document "Intel Architecture Instruction Set Extensions Programming Reference" with Ref # 319433-018. http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf [ hpa: changed the feature flag to simply X86_FEATURE_CLFLUSHOPT - if that is what we want to report in /proc/cpuinfo anyway... ] Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Link: http://lkml.kernel.org/r/1393441612-19729-2-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-02-27 08:23:28 -08:00
H. Peter Anvin	b5660ba76b	x86, platforms: Remove NUMAQ The NUMAQ support seems to be unmaintained, remove it. Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: David Rientjes <rientjes@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/n/530CFD6C.7040705@zytor.com	2014-02-27 08:07:39 -08:00
H. Peter Anvin	c5f9ee3d66	x86, platforms: Remove SGI Visual Workstation The SGI Visual Workstation seems to be dead; remove support so we don't have to continue maintaining it. Cc: Andrey Panin <pazke@donpac.ru> Cc: Michael Reed <mdr@sgi.com> Link: http://lkml.kernel.org/r/530CFD6C.7040705@zytor.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-02-27 08:07:39 -08:00
Ingo Molnar	ff5a7088f0	Merge branch 'perf/urgent' into perf/core Merge the latest fixes before queueing up new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-27 12:41:17 +01:00
Liu, Jinsong	da8999d318	KVM: x86: Intel MPX vmx and msr handle From caddc009a6d2019034af8f2346b2fd37a81608d0 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:11:11 +0800 Subject: [PATCH v5 1/3] KVM: x86: Intel MPX vmx and msr handle This patch handle vmx and msr of Intel MPX feature. Signed-off-by: Xudong Hao <xudong.hao@intel.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-24 12:14:00 +01:00
Liu, Jinsong	56c103ec04	KVM: x86: Fix xsave cpuid exposing bug From 00c920c96127d20d4c3bb790082700ae375c39a0 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Fri, 21 Feb 2014 23:47:18 +0800 Subject: [PATCH] KVM: x86: Fix xsave cpuid exposing bug EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable. Bit 63 of XCR0 is reserved for future expansion. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-22 15:53:34 +01:00
Fenghua Yu	c2bc11f10a	x86, AVX-512: Enable AVX-512 States Context Switch This patch enables Opmask, ZMM_Hi256, and Hi16_ZMM AVX-512 states for xstate context switch. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1392931491-33237-2-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # hw enabling	2014-02-20 13:56:55 -08:00
Fenghua Yu	8e5780fdee	x86, AVX-512: AVX-512 Feature Detection AVX-512 is an extention of AVX2. Its spec can be found at: http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf This patch detects AVX-512 features by CPUID. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1392931491-33237-1-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # hw enabling	2014-02-20 13:56:55 -08:00
Thomas Gleixner	5f0e030930	x86, tsc: Fallback to normal calibration if fast MSR calibration fails If we cannot calibrate TSC via MSR based calibration try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that to the caller. This value gets then propagated further to clockevents code resulting division by zero oops like the one below: divide error: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.13.0+ #47 task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000 RIP: 0010:[<ffffffff810aec14>] [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0 RSP: 0000:ffff880075507e58 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008 R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0 Stack: ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88 ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168 ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000 Call Trace: [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30 [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0 [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8 [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0 [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205 [<ffffffff8177c910>] ? rest_init+0x90/0x90 [<ffffffff8177c91e>] kernel_init+0xe/0x120 [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0 [<ffffffff8177c910>] ? rest_init+0x90/0x90 Prevent this from happening by: 1) Modifying try_msr_calibrate_tsc() to return calibration value or zero if it fails. 2) Check this return value in native_calibrate_tsc() and in case of zero fallback to use normal non-MSR based calibration. [mw: Added subject and changelog] Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bin Gao <bin.gao@linux.intel.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-19 17:12:24 +01:00
Ard Biesheuvel	2b9c1f0327	x86: align x86 arch with generic CPU modalias handling The x86 CPU feature modalias handling existed before it was reimplemented generically. This patch aligns the x86 handling so that it (a) reuses some more code that is now generic; (b) uses the generic format for the modalias module metadata entry, i.e., it now uses 'cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:,XXXX,YYYY' instead of the 'x86cpu:vendor:VVVV👪FFFF:model:MMMM:feature:,XXXX,YYYY' that was used before. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-02-18 12:45:38 -08:00
Linus Torvalds	7fc9280462	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI fixes from Peter Anvin: "A few more EFI-related fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Check status field to validate BGRT header x86/efi: Fix 32-bit fallout	2014-02-15 15:02:28 -08:00
Borislav Petkov	c55d016f7a	x86/efi: Fix 32-bit fallout We do not enable the new efi memmap on 32-bit and thus we need to run runtime_code_page_mkexec() unconditionally there. Fix that. Reported-and-tested-by: Lejun Zhu <lejun.zhu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-02-14 09:30:19 +00:00
Mel Gorman	a9c8e4beee	xen: properly account for _PAGE_NUMA during xen pte translations Steven Noonan forwarded a users report where they had a problem starting vsftpd on a Xen paravirtualized guest, with this in dmesg: BUG: Bad page map in process vsftpd pte:8000000493b88165 pmd:e9cc01067 page:ffffea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0 page flags: 0x2ffc0000000014(referenced\|dirty) addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping: (null) index:7f97eea74 CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1 Call Trace: dump_stack+0x45/0x56 print_bad_pte+0x22e/0x250 unmap_single_vma+0x583/0x890 unmap_vmas+0x65/0x90 exit_mmap+0xc5/0x170 mmput+0x65/0x100 do_exit+0x393/0x9e0 do_group_exit+0xcc/0x140 SyS_exit_group+0x14/0x20 system_call_fastpath+0x1a/0x1f Disabling lock debugging due to kernel taint BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1 BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1 The issue could not be reproduced under an HVM instance with the same kernel, so it appears to be exclusive to paravirtual Xen guests. He bisected the problem to commit `1667918b64` ("mm: numa: clear numa hinting information on mprotect") that was also included in 3.12-stable. The problem was related to how xen translates ptes because it was not accounting for the _PAGE_NUMA bit. This patch splits pte_present to add a pteval_present helper for use by xen so both bare metal and xen use the same code when checking if a PTE is present. [mgorman@suse.de: wrote changelog, proposed minor modifications] [akpm@linux-foundation.org: fix typo in comment] Reported-by: Steven Noonan <steven@uplinklabs.net> Tested-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:41 -08:00
Tim Chen	ddf1d169c0	locking/mcs: Allow architecture specific asm files to be used for contended case This patch allows each architecture to add its specific assembly optimized arch_mcs_spin_lock_contended and arch_mcs_spinlock_uncontended for MCS lock and unlock functions. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: AswinChandramouleeswaran <aswin@hp.com> Cc: George Spelvin <linux@horizon.com> Cc: Rik vanRiel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: MichelLespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Figo.zhang" <figo1802@gmail.com> Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 21:18:52 +01:00
David Rientjes	dc9788f40a	x86/apic: Always define nox2apic and define it as initdata The "nox2apic" variable can be defined as __initdata since it is only used for bootstrap. It can now unconditionally be defined since it will later be freed. At the same time, it is also better off as a bool. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354380.7839@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 15:15:11 +01:00
David Rientjes	6d4989835e	x86/apic: Remove unused function prototypes Some function prototypes declared in asm/apic.h are never defined, so remove them. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354210.7839@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 15:15:09 +01:00
David Rientjes	465822cfc8	x86/apic: Switch wait_for_init_deassert() to a bool flag Now that there is only a single wait_for_init_deassert() function, just convert the member of struct apic to a bool to determine whether we need to wait for init_deassert to become non-zero. There are no more callers of default_wait_for_init_deassert(), so fold it into the caller. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354010.7839@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 15:15:08 +01:00
David Rientjes	d3c63ae1e2	x86/apic: Only use default_wait_for_init_deassert() es7000_wait_for_init_deassert() is functionally equivalent to default_wait_for_init_deassert(), so remove the duplicate code and use only a single function. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042353030.7839@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 15:15:07 +01:00
Peter Zijlstra	e90c785352	x86/nmi: Push duration printk() to irq context Calling printk() from NMI context is bad (TM), so move it to IRQ context. In doing so we slightly change (probably wreck) the debugfs nmi_longest_ns thingy, in that it doesn't update to reflect the longest, nor does writing to it reset the count. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/n/tip-rdw0au56a5ymis1u8p48c12d@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 13:17:22 +01:00
Linus Torvalds	c1ff84317f	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Quite a varied little collection of fixes. Most of them are relatively small or isolated; the biggest one is Mel Gorman's fixes for TLB range flushing. A couple of AMD-related fixes (including not crashing when given an invalid microcode image) and fix a crash when compiled with gcov" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, AMD: Unify valid container checks x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y x86/efi: Allow mapping BGRT on x86-32 x86: Fix the initialization of physnode_map x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable() x86/intel/mid: Fix X86_INTEL_MID dependencies arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge x86: mm: change tlb_flushall_shift for IvyBridge x86/mm: Eliminate redundant page table walk during TLB range flushing x86/mm: Clean up inconsistencies when flushing TLB ranges mm, x86: Account for TLB flushes only when debugging x86/AMD/NB: Fix amd_set_subcaches() parameter type x86/quirks: Add workaround for AMD F16h Erratum792 x86, doc, kconfig: Fix dud URL for Microcode data	2014-02-08 11:54:43 -08:00
H. Peter Anvin	a3b072cd18	* Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJS9P2pAAoJEC84WcCNIz1VABsP/j0fwiWdaoEn/9p9EUQcBcMn CILuiKI8GoBR+0nH7vm/jSgUKfUxzM6T0+qjoZPVkO/u/qup+JCT5JxoywUyEbfb +wPNIhBgKSwJdWXPd4ZvObA9jknrP6r5sJTsixpESVpVWmcC8egPgkyBFILjokKS BCJqqruHZcXfWaDQTocYErl3T217J7RfnCG1o7yP96g4jteX8EdvIkPjEf2mM83I GXacHLy8IhGhdyPKSXcRqZ0ivhZ2WX1iFQYIY19FhmzBs364WulyXve+oV+l/4h4 Kx7ks4Ob5AlUIizchGNwVz7058F9o/v7m0CezTgi4Q/RrZi34samxbjk/95/Xc60 JSvWkekm3/jOODubad5zj+ZnJmG83/ZlCUuTaqsE47ftbaedSUHBN9QSpm8iHEex n3d4J3AdP/3amcP33kZ5MRALDYIFKb4ZxtDkADqDcXhS56COivGAdZe5hnyCpb/9 RPUDXTOlxfJQK/y2Atcdb400JjJ/Yr9Kew81LRIt0UMZU3dKSh05UZ+a7Ym0yCkt 3k0NNkgsFCZbYTO/Z3aPDcprwU5Lq9UrwjB17U2ev/qK+qRYDzCzSR0XGPrLMRv7 C5Bnov6uCn/0ZG/NlAx8UXK9wdWDsLhp1QkBz+daX3sGwRAS+OiKBv6+l8dqsdOc 1L8PMkTX2rgtELiv4PJ/ =hLCC -----END PGP SIGNATURE----- Merge tag 'efi-urgent' into x86/urgent * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit). Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-02-07 11:27:30 -08:00
Linus Torvalds	1cd731df09	Bug-fixes: - Revert "xen/grant-table: Avoid m2p_override during mapping" as it broke Xen ARM build. - Fix CR4 not being set on AP processors in Xen PVH mode. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJS8AyQAAoJEFjIrFwIi8fJbD4IAJssMuaLI5CRsSWBgDFHHDFt srVJpDOYQiDr/TxkwFCVcL4sFy9Htb3KMArU4eIBl6uMqQbGa+3rHyXcHYI219YY XH3D8RG+9JChwsxtaeUEzwx1C8ehcygD34vtdcoQXa7eBuEi4TL3HeLifR+HrXKO UdFrTA34FmvpVFbSuRXkZh5sd6ca9et9xHuQHM8SIY6pVokY6xaEYOp17tfPZpwM 7A6LFjUjXeugHC2L3+/H8UOHA9nSZQvnMiZOWq2Cusc2Dt2V7emzgk2wcc2CHttf EA6GbtiJzHqMPmt5EjubI9hHdSMB31HpY4hnQE38+ucl+BwiSdRE9z2Rm4TYClg= =IX4M -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from Konrad Rzeszutek Wilk: "Bug-fixes: - Revert "xen/grant-table: Avoid m2p_override during mapping" as it broke Xen ARM build. - Fix CR4 not being set on AP processors in Xen PVH mode" * tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvh: set CR4 flags for APs Revert "xen/grant-table: Avoid m2p_override during mapping"	2014-02-05 16:01:11 -08:00
Marcelo Tosatti	4f34d683e5	KVM: x86: remove unused last_kernel_ns variable Remove unused last_kernel_ns variable. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-04 04:20:42 +01:00
Bjorn Helgaas	25453e9e52	x86/PCI: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() There are no callers of get_mp_bus_to_node(), so we no longer need mp_bus_to_node[], get_mp_bus_to_node(), or set_mp_bus_to_node(). This removes them. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-02-03 10:38:46 -07:00
Bjorn Helgaas	afcf21c2be	x86/PCI: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus The AMD early_fill_mp_bus_info() already allocates a struct pci_root_info for each PCI host bridge it finds, and that structure contains the NUMA node number. We don't need to keep the same information in the mp_bus_to_node[] table. This adds x86_pci_root_bus_node(), which returns the NUMA node number, or NUMA_NO_NODE if the node is unknown. Note that unlike get_mp_bus_to_node(), x86_pci_root_bus_node() only works for root buses. For example, if amd_bus.c finds a host bridge on node 1 to [bus 00-0f], get_mp_bus_to_node() returns 1 for any bus between 00 and 0f, but x86_pci_root_bus_node() returns 1 for bus 00 and NUMA_NO_NODE for buses 01-0f. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-02-03 10:38:35 -07:00
Bjorn Helgaas	49886cf4c4	x86/PCI: Drop return value of pcibios_scan_root() Nobody really uses the return value of pcibios_scan_root() (one place uses it to control a printk, but the printk is not very useful). This converts pcibios_scan_root() to a void function. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-02-03 10:38:29 -07:00
Bjorn Helgaas	289a24a699	x86/PCI: Merge pci_scan_bus_on_node() into pcibios_scan_root() pci_scan_bus_on_node() is only called by pcibios_scan_root(). This merges pci_scan_bus_on_node() into pcibios_scan_root() and removes pci_scan_bus_on_node(). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-02-03 10:38:23 -07:00
Bjorn Helgaas	8d7d818676	x86/PCI: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() pci_scan_bus_with_sysdata() and pcibios_scan_root() are quite similar: pci_scan_bus_with_sysdata pci_scan_bus_on_node(..., &pci_root_ops, -1) pcibios_scan_root pci_scan_bus_on_node(..., &pci_root_ops, get_mp_bus_to_node(busnum)) get_mp_bus_to_node() returns -1 if it couldn't find the node number, so this removes pci_scan_bus_with_sysdata() and uses pcibios_scan_root() instead. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-02-03 10:38:11 -07:00
Konrad Rzeszutek Wilk	e85fc98055	Revert "xen/grant-table: Avoid m2p_override during mapping" This reverts commit `08ece5bb23`. As it breaks ARM builds and needs more attention on the ARM side. Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-02-03 06:44:49 -05:00
Linus Torvalds	ab5318788c	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core debug changes from Ingo Molnar: "This contains mostly kernel debugging related updates: - make hung_task detection more configurable to distros - add final bits for x86 UV NMI debugging, with related KGDB changes - update the mailing-list of MAINTAINERS entries I'm involved with" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hung_task: Display every hung task warning sysctl: Add neg_one as a standard constraint x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured x86/uv/nmi: Fix Sparse warnings kgdb/kdb: Fix no KDB config problem MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries	2014-01-31 08:59:46 -08:00
Linus Torvalds	14164b46fc	Bug-fixes: - Xen ARM couldn't use the new FIFO events - Xen ARM couldn't use the SWIOTLB if compiled as 32-bit with 64-bit PCIe devices. - Grant table were doing needless M2P operations. - Ratchet down the self-balloon code so it won't OOM. - Fix misplaced kfree in Xen PVH error code paths. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJS68IQAAoJEFjIrFwIi8fJAWgH/j4HStEey3rgGcqwIWSHkkap +t55wsrT8Ylq6CzZjaUtCo3pB7HotW526x/0rA2pxVqHn/8oCN/1EtdrNtYm/umX qOoda+db5NIjAEGVLWSLqGyokJQDrX/brXIWfYR300e9fnJi7yT/rFC4QHoZVUYl 5LME8XH/jE012vvYelNu6DbbodlRmVCT8hctJS+eB5ER2WmtD9Pkw4GybEXPVYJz hE0Ts1DN91nKP2FGJb+mfB9UFT5X8i00akAK+Qc1R3sRnRh6eRoNV8dgyCnudKpO UPEdiAZvgij+mzlgIYSz6nKH0U/VbvRsG3lc3i5Si3o+vR3CYPCkvzOGX2d0rjw= =7cxW -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.14-rc0-late-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen bugfixes from Konrad Rzeszutek Wilk: "Bug-fixes for the new features that were added during this cycle. There are also two fixes for long-standing issues for which we have a solution: grant-table operations extra work that was not needed causing performance issues and the self balloon code was too aggressive causing OOMs. Details: - Xen ARM couldn't use the new FIFO events - Xen ARM couldn't use the SWIOTLB if compiled as 32-bit with 64-bit PCIe devices. - Grant table were doing needless M2P operations. - Ratchet down the self-balloon code so it won't OOM. - Fix misplaced kfree in Xen PVH error code paths" * tag 'stable/for-linus-3.14-rc0-late-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvh: Fix misplaced kfree from xlated_setup_gnttab_pages drivers: xen: deaggressive selfballoon driver xen/grant-table: Avoid m2p_override during mapping xen/gnttab: Use phys_addr_t to describe the grant frame base address xen: swiotlb: handle sizeof(dma_addr_t) != sizeof(phys_addr_t) arm/xen: Initialize event channels earlier	2014-01-31 08:38:18 -08:00
Linus Torvalds	e2a0f813e0	Second batch of KVM updates. Some minor x86 fixes, two s390 guest features that need some handling in the host, and all the PPC changes. The PPC changes include support for little-endian guests and enablement for new POWER8 features. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJS6UF5AAoJEBvWZb6bTYby55kP/AgTJnyu7avN653/2aSHkjkx KgYSMYhZPIFoY5LyZuNetXaoXFRvCykux1VYSZ6V6s35h2PZ+hdJNbHGjFYKPGTq FQ92xQVNuWCAPxmFCjDNuDV/0BauG5y08/Orh/jpjz+GAfH43LruUQGbtXUuyJ8u vf+yTHniU5gguqsAmodqjHUgbf+GoPJ1j7hmRoWwt8IWm7Ns3v/IK4l0p6G0h26a RjE6aK+Tm208Yr5hD/dRAqeTbBNt3c4xub+QPsKoiEMaZBSuAOiux7D3Kx+If1gp WsmqEQxoymihVtkZhUFO9ONLJepvmG2QwJVVyMSUW9iqxX9rraXsvVyVMwcQAhog JuOAYxBftH07xu6Fs4eym5KvCFghM+EaJvxxt+kgnvdD4htK1+eK5trntc2zygSi /qGiIrkqjXpkskW8kujLayF0eAU3CrZvFWveEPBfFgYiOGX/2wzJCtSm/bt9Jo0M v60qgNFK3LNqAyeEfnm9VtlwGr6ZgsAB6DHNPX4fM5s2IBjL+qloXk/e/+aVKkW0 I3yeRdy/ExhLAab6w81JtMeR7G3YS0UNuAEVvcoxzNb5wIBY8qnpfUzTKyMxQR94 64EVpxWEYO1s55eCCyMujWrSvc+YAwhJcWHGKgC4K7mxxLD3FVyQXX6YZvgRozMX HjQju+DToj9CskyrFlRL =yd0Z -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more KVM updates from Paolo Bonzini: "Second batch of KVM updates. Some minor x86 fixes, two s390 guest features that need some handling in the host, and all the PPC changes. The PPC changes include support for little-endian guests and enablement for new POWER8 features" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits) x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101 x86, kvm: cache the base of the KVM cpuid leaves kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef KVM: PPC: Book3S PR: Cope with doorbell interrupts KVM: PPC: Book3S HV: Add software abort codes for transactional memory KVM: PPC: Book3S HV: Add new state for transactional memory powerpc/Kconfig: Make TM select VSX and VMX KVM: PPC: Book3S HV: Basic little-endian guest support KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8 KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8 KVM: PPC: Book3S HV: Add handler for HV facility unavailable KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8 KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers KVM: PPC: Book3S HV: Don't set DABR on POWER8 kvm/ppc: IRQ disabling cleanup ...	2014-01-31 08:37:32 -08:00
Zoltan Kiss	08ece5bb23	xen/grant-table: Avoid m2p_override during mapping The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the original functions were renamed to __gnttab_[un]map_refs, with a new parameter m2p_override - based on m2p_override either they follow the original behaviour, or just set the private flag and call set_phys_to_machine - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with m2p_override false - a new function gnttab_[un]map_refs_userspace provides the old behaviour It also removes a stray space from page.h and change ret to 0 if XENFEAT_auto_translated_physmap, as that is the only possible return value there. v2: - move the storing of the old mfn in page->index to gnttab_map_refs - move the function header update to a separate patch v3: - a new approach to retain old behaviour where it needed - squash the patches into one v4: - move out the common bits from m2p* functions, and pass pfn/mfn as parameter - clear page->private before doing anything with the page, so m2p_find_override won't race with this v5: - change return value handling in __gnttab_[un]map_refs - remove a stray space in page.h - add detail why ret = 0 now at some places v6: - don't pass pfn to m2p* functions, just get it locally Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Suggested-by: David Vrabel <david.vrabel@citrix.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-01-31 09:48:32 -05:00
Linus Torvalds	aa2e7100e3	Merge branch 'akpm' (patches from Andrew Morton) Merge misc fixes from Andrew Morton: "A few hotfixes and various leftovers which were awaiting other merges. Mainly movement of zram into mm/" * emailed patches fron Andrew Morton <akpm@linux-foundation.org>: (25 commits) memcg: fix mutex not unlocked on memcg_create_kmem_cache fail path Documentation/filesystems/vfs.txt: update file_operations documentation mm, oom: base root bonus on current usage mm: don't lose the SOFT_DIRTY flag on mprotect mm/slub.c: fix page->_count corruption (again) mm/mempolicy.c: fix mempolicy printing in numa_maps zram: remove zram->lock in read path and change it with mutex zram: remove workqueue for freeing removed pending slot zram: introduce zram->tb_lock zram: use atomic operation for stat zram: remove unnecessary free zram: delay pending free request in read path zram: fix race between reset and flushing pending work zsmalloc: add maintainers zram: add zram maintainers zsmalloc: add copyright zram: add copyright zram: remove old private project comment zram: promote zram from staging zsmalloc: move it under mm ...	2014-01-30 18:44:44 -08:00
Linus Torvalds	12f2bbd609	Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asmlinkage (LTO) changes from Peter Anvin: "This patchset adds more infrastructure for link time optimization (LTO). This patchset was pulled into my tree late because of a miscommunication (part of the patchset was picked up by other maintainers). However, the patchset is strictly build-related and seems to be okay in testing" * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, asmlinkage, xen: Fix type of NMI x86, asmlinkage, xen, kvm: Make {xen,kvm}_lock_spinning global and visible x86: Use inline assembler instead of global register variable to get sp x86, asmlinkage, paravirt: Make paravirt thunks global x86, asmlinkage, paravirt: Don't rely on local assembler labels x86, asmlinkage, lguest: Fix C functions used by inline assembler	2014-01-30 18:15:32 -08:00
Andrey Vagin	24f91eba18	mm: don't lose the SOFT_DIRTY flag on mprotect The SOFT_DIRTY bit shows that the content of memory was changed after a defined point in the past. mprotect() doesn't change the content of memory, so it must not change the SOFT_DIRTY bit. This bug causes a malfunction: on the first iteration all pages are dumped. On other iterations only pages with the SOFT_DIRTY bit are dumped. So if the SOFT_DIRTY bit is cleared from a page by mistake, the page is not dumped and its content will be restored incorrectly. This patch does nothing with _PAGE_SWP_SOFT_DIRTY, becase pte_modify() is called only for present pages. Fixes commit `0f8975ec4d` ("mm: soft-dirty bits for user memory changes tracking"). Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Borislav Petkov <bp@suse.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-30 16:56:56 -08:00
Andi Kleen	dff38e3e93	x86: Use inline assembler instead of global register variable to get sp LTO in gcc 4.6/47. has trouble with global register variables. They were used to read the stack pointer. Use a simple inline assembler statement with a mov instead. This also helps LLVM/clang, which does not support global register variables. [ hpa: Ideally this should become a builtin in both gcc and clang. ] v2: More general asm constraint. Fix description (Jan Beulich) Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-6-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-29 22:17:17 -08:00
Andi Kleen	a2e7f0e3a4	x86, asmlinkage, paravirt: Make paravirt thunks global The paravirt thunks use a hack of using a static reference to a static function to reference that function from the top level statement. This assumes that gcc always generates static function names in a specific format, which is not necessarily true. Simply make these functions global and asmlinkage or __visible. This way the static __used variables are not needed and everything works. Functions with arguments are __visible to keep the register calling convention on 32bit. Changed in paravirt and in all users (Xen and vsmp) v2: Use __visible for functions with arguments Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ido Yariv <ido@wizery.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-5-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-29 22:17:17 -08:00
Andi Kleen	824a287009	x86, asmlinkage, paravirt: Don't rely on local assembler labels The paravirt patching code assumes that it can reference a local assembler label between two different top level assembler statements. This does not work with LTO where the assembler code may end up in different assembler files. Replace it with extern / global /asm linkage labels. This also removes one redundant copy of the macro. Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-4-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-29 22:17:17 -08:00
Linus Torvalds	cca21640d2	Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x32 uabi type fixes from Peter Anvin: "Despite the branch name, most of these changes are to generic code. They change types so that they make an increasing amount of the exported uapi kernel headers usable for libc. The ARM64 people are also interested in these changes for their ILP32 ABI" * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: uapi: Use __kernel_long_t in struct mq_attr uapi: Use __kernel_ulong_t in shmid64_ds/shminfo64/shm_info x86, uapi, x32: Use __kernel_ulong_t in x86 struct semid64_ds uapi: Use __kernel_ulong_t in struct msqid64_ds uapi: Use __kernel_long_t in struct msgbuf uapi, asm-generic: Use __kernel_ulong_t in uapi struct ipc64_perm uapi: Use __kernel_long_t/__kernel_ulong_t in <linux/resource.h> uapi: Use __kernel_long_t in struct timex	2014-01-29 18:22:16 -08:00
Paolo Bonzini	77f01bdfa5	x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101 When Hyper-V hypervisor leaves are present, KVM must relocate its own leaves at 0x40000100, because Windows does not look for Hyper-V leaves at indices other than 0x40000000. In this case, the KVM features are at 0x40000101, but the old code would always look at 0x40000001. Fix by using kvm_cpuid_base(). This also requires making the function non-inline, since kvm_cpuid_base() is static. Fixes: `1085ba7f55` Cc: stable@vger.kernel.org Cc: mtosatti@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-29 18:11:55 +01:00
Paolo Bonzini	1c300a4077	x86, kvm: cache the base of the KVM cpuid leaves It is unnecessary to go through hypervisor_cpuid_base every time a leaf is found (which will be every time a feature is requested after the next patch). Fixes: `1085ba7f55` Cc: stable@vger.kernel.org Cc: mtosatti@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-29 18:11:54 +01:00
Yinghai Lu	4ce7a8697c	x86: revert wrong memblock current limit setting Dave reported big numa system booting is broken. It turns out that commit `5b6e529521` ("x86: memblock: set current limit to max low memory address") sets the limit to low wrongly. max_low_pfn_mapped is different from max_pfn_mapped. max_low_pfn_mapped is always under 4G. That will memblock_alloc_nid all go under 4G. Revert `5b6e529521` to fix a no-boot regression which was triggered by `457ff1de2d` ("lib/swiotlb.c: use memblock apis for early memory allocations"). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reported-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-27 21:02:38 -08:00
Linus Torvalds	4ba9920e5e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) BPF debugger and asm tool by Daniel Borkmann. 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann. 3) Correct reciprocal_divide and update users, from Hannes Frederic Sowa and Daniel Borkmann. 4) Currently we only have a "set" operation for the hw timestamp socket ioctl, add a "get" operation to match. From Ben Hutchings. 5) Add better trace events for debugging driver datapath problems, also from Ben Hutchings. 6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we have a small send and a previous packet is already in the qdisc or device queue, defer until TX completion or we get more data. 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko. 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel Borkmann. 9) Share IP header compression code between Bluetooth and IEEE802154 layers, from Jukka Rissanen. 10) Fix ipv6 router reachability probing, from Jiri Benc. 11) Allow packets to be captured on macvtap devices, from Vlad Yasevich. 12) Support tunneling in GRO layer, from Jerry Chu. 13) Allow bonding to be configured fully using netlink, from Scott Feldman. 14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can already get the TCI. From Atzm Watanabe. 15) New "Heavy Hitter" qdisc, from Terry Lam. 16) Significantly improve the IPSEC support in pktgen, from Fan Du. 17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom Herbert. 18) Add Proportional Integral Enhanced packet scheduler, from Vijay Subramanian. 19) Allow openvswitch to mmap'd netlink, from Thomas Graf. 20) Key TCP metrics blobs also by source address, not just destination address. From Christoph Paasch. 21) Support 10G in generic phylib. From Andy Fleming. 22) Try to short-circuit GRO flow compares using device provided RX hash, if provided. From Tom Herbert. The wireless and netfilter folks have been busy little bees too. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits) net/cxgb4: Fix referencing freed adapter ipv6: reallocate addrconf router for ipv6 address when lo device up fib_frontend: fix possible NULL pointer dereference rtnetlink: remove IFLA_BOND_SLAVE definition rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info qlcnic: update version to 5.3.55 qlcnic: Enhance logic to calculate msix vectors. qlcnic: Refactor interrupt coalescing code for all adapters. qlcnic: Update poll controller code path qlcnic: Interrupt code cleanup qlcnic: Enhance Tx timeout debugging. qlcnic: Use bool for rx_mac_learn. bonding: fix u64 division rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC sfc: Use the correct maximum TX DMA ring size for SFC9100 Add Shradha Shah as the sfc driver maintainer. net/vxlan: Share RX skb de-marking and checksum checks with ovs tulip: cleanup by using ARRAY_SIZE() ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called net/cxgb4: Don't retrieve stats during recovery ...	2014-01-25 11:17:34 -08:00
Ingo Molnar	2b45e0f9f3	Merge branch 'linus' into x86/urgent Merge in the x86 changes to apply a fix. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-25 09:16:14 +01:00
Mel Gorman	ec65993443	mm, x86: Account for TLB flushes only when debugging Bisection between 3.11 and 3.12 fingered commit `9824cf97` ("mm: vmstats: tlb flush counters") to cause overhead problems. The counters are undeniably useful but how often do we really need to debug TLB flush related issues? It does not justify taking the penalty everywhere so make it a debugging option. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Alex Shi <alex.shi@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-25 09:10:41 +01:00
Mike Travis	74c93f9d39	x86/uv/nmi: Fix Sparse warnings Make uv_register_nmi_notifier() and uv_handle_nmi_ping() static to address sparse warnings. Fix problem where uv_nmi_kexec_failed is unused when CONFIG_KEXEC is not defined. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Hedi Berriche <hedi@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/20140114162551.480872353@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-25 08:55:10 +01:00
Dan Carpenter	2993ae3305	x86/AMD/NB: Fix amd_set_subcaches() parameter type This is under CAP_SYS_ADMIN, but Smatch complains that mask comes from the user and the test for "mask > 0xf" can underflow. The fix is simple: amd_set_subcaches() should hand down not an 'int' but an 'unsigned long' like it was originally indended to do. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/20140121072209.GA22095@elgon.mountain Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-25 08:50:09 +01:00
Ard Biesheuvel	cf0744021c	firmware/dmi_scan: generalize for use by other archs This patch makes a couple of changes to the SMBIOS/DMI scanning code so it can be used on other archs (such as ARM and arm64): (a) wrap the calls to ioremap()/iounmap(), this allows the use of a flavor of ioremap() more suitable for random unaligned access; (b) allow the non-EFI fallback probe into hardcoded physical address 0xF0000 to be disabled. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: Ingo Molnar <mingo@elte.hu> Cc "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-23 16:36:57 -08:00
Mark Salter	114cefc87e	x86: use generic fixmap.h Signed-off-by: Mark Salter <msalter@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-23 16:36:54 -08:00
Linus Torvalds	84621c9b18	Features: - FIFO event channels. Key advantages: support for over 100,000 events (2^17), 16 different event priorities, improved fairness in event latency through the use of FIFOs. - Xen PVH support. "It’s a fully PV kernel mode, running with paravirtualized disk and network, paravirtualized interrupts and timers, no emulated devices of any kind (and thus no qemu), no BIOS or legacy boot — but instead of requiring PV MMU, it uses the HVM hardware extensions to virtualize the pagetables, as well as system calls and other privileged operations." (from "The Paravirtualization Spectrum, Part 2: From poles to a spectrum") Bug-fixes: - Fixes in balloon driver (refactor and make it work under ARM) - Allow xenfb to be used in HVM guests. - Allow xen_platform_pci=0 to work properly. - Refactors in event channels. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJS4BmLAAoJEFjIrFwIi8fJ4SAH/iNGESowgMhfW64vRA8pBWq+ NRJpUjYjjwmbxpwoNl6NPwn15cIXFyc3sMtvvrDD3taRDyko2RFuT+NTjpO05xPh d/cRpRXpXERHoiFgPf/WTp7ONBDhvPtHG0+BzJKwgqEIOUYXdbhD+gEjaVlFJScS CAY68OLmk7XYMSZBNzPfKNbSCyhVgZF7wpaimK9lxZBKsFRCDIq6jIyrAsC8epIL 6V/V4l2S6lk/uUeGB6ULphYeINjI2kkpbSfCd1vyenLfWpVscc2o8uWEYFcZMAxy V4HpsoseuqrfdDqgPfud3VgogdISvbkCvDfW85rzfDP4MWxei2mVHFtJ/gSBV+g= =ToNG -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from Konrad Rzeszutek Wilk: "Two major features that Xen community is excited about: The first is event channel scalability by David Vrabel - we switch over from an two-level per-cpu bitmap of events (IRQs) - to an FIFO queue with priorities. This lets us be able to handle more events, have lower latency, and better scalability. Good stuff. The other is PVH by Mukesh Rathor. In short, PV is a mode where the kernel lets the hypervisor program page-tables, segments, etc. With EPT/NPT capabilities in current processors, the overhead of doing this in an HVM (Hardware Virtual Machine) container is much lower than the hypervisor doing it for us. In short we let a PV guest run without doing page-table, segment, syscall, etc updates through the hypervisor - instead it is all done within the guest container. It is a "hybrid" PV - hence the 'PVH' name - a PV guest within an HVM container. The major benefits are less code to deal with - for example we only use one function from the the pv_mmu_ops (which has 39 function calls); faster performance for syscall (no context switches into the hypervisor); less traps on various operations; etc. It is still being baked - the ABI is not yet set in stone. But it is pretty awesome and we are excited about it. Lastly, there are some changes to ARM code - you should get a simple conflict which has been resolved in #linux-next. In short, this pull has awesome features. Features: - FIFO event channels. Key advantages: support for over 100,000 events (2^17), 16 different event priorities, improved fairness in event latency through the use of FIFOs. - Xen PVH support. "It’s a fully PV kernel mode, running with paravirtualized disk and network, paravirtualized interrupts and timers, no emulated devices of any kind (and thus no qemu), no BIOS or legacy boot — but instead of requiring PV MMU, it uses the HVM hardware extensions to virtualize the pagetables, as well as system calls and other privileged operations." (from "The Paravirtualization Spectrum, Part 2: From poles to a spectrum") Bug-fixes: - Fixes in balloon driver (refactor and make it work under ARM) - Allow xenfb to be used in HVM guests. - Allow xen_platform_pci=0 to work properly. - Refactors in event channels" * tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (52 commits) xen/pvh: Set X86_CR0_WP and others in CR0 (v2) MAINTAINERS: add git repository for Xen xen/pvh: Use 'depend' instead of 'select'. xen: delete new instances of __cpuinit usage xen/fb: allow xenfb initialization for hvm guests xen/evtchn_fifo: fix error return code in evtchn_fifo_setup() xen-platform: fix error return code in platform_pci_init() xen/pvh: remove duplicated include from enlighten.c xen/pvh: Fix compile issues with xen_pvh_domain() xen: Use dev_is_pci() to check whether it is pci device xen/grant-table: Force to use v1 of grants. xen/pvh: Support ParaVirtualized Hardware extensions (v3). xen/pvh: Piggyback on PVHVM XenBus. xen/pvh: Piggyback on PVHVM for grant driver (v4) xen/grant: Implement an grant frame array struct (v3). xen/grant-table: Refactor gnttab_init xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init. xen/pvh: Piggyback on PVHVM for event channels (v2) xen/pvh: Update E820 to work with PVH (v2) xen/pvh: Secondary VCPU bringup (non-bootup CPUs) ...	2014-01-22 22:00:18 -08:00
Linus Torvalds	7ebd3faa9b	First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJS3TVKAAoJEBvWZb6bTYbyIFgP/2cmt4ifCuFMaZv4+G1S8jZU uC9ZB/+7vzht/p6zAy+4BxurKbHmSBFkC1OKcxYuy7yB4CQkHabzj4V2vRtqFdwH 5lExP9qh3kqaVLuhnvxLTmkktR3EW4PFy6OI53l5kRNktOXSuZ0aN6K3V7tCg/X0 iL7ASo4bJKlxeWcDpmuVrNgAajmZVfXrjKY7robgBQno+yIsgKhRZRBQHjozA6B8 FpCo/k48RZd/EzIbV/PDDRI4hmmry/lgrO9SKjzq56wSqff2bd/k/KYze4dbAPfd Ps60enPTuHmeEjjb4MMMU4EKHVdTQFUMx/xZCmT4xzoh8s4of6RHphXbfE0SUznQ dTveyEQAR7E3JNS0k1+3WEX5fWlFesp0hO2NeE0wzUq4TAr9ztgVO9NQ6Si15e7Z 2HysO0T5Ojtt0lY08/PvS6i48eCAuuBomrejJS8hLW4SUZ5adn+yW4Qo7Fp9JeBR l9a3LsVT8BZMtUWrUuFcVhlM4MbzElUPjDbgWhR8UYU/kpfVZOQu8qWgGKR4UWXy X7/t9l/tjR99CmfMJBAOzJid+ScSpAfg77BdaKiQrVfVIJmsjEjlO8vUMyj5b1HF hPX5wNyJjHAOfridLeHSs4Rdm4a8sk8Az5d4h76pLVz8M4jyTi2v0rO3N4/dU/pu x7N8KR5hAj+mLBoM9/Al =8sYU -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) kvm: make KVM_MMU_AUDIT help text more readable KVM: s390: Fix memory access error detection KVM: nVMX: Update guest activity state field on L2 exits KVM: nVMX: Fix nested_run_pending on activity state HLT KVM: nVMX: Clean up handling of VMX-related MSRs KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit KVM: nVMX: Leave VMX mode on clearing of feature control MSR KVM: VMX: Fix DR6 update on #DB exception KVM: SVM: Fix reading of DR6 KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS add support for Hyper-V reference time counter KVM: remove useless write to vcpu->hv_clock.tsc_timestamp KVM: x86: fix tsc catchup issue with tsc scaling KVM: x86: limit PIT timer frequency KVM: x86: handle invalid root_hpa everywhere kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub kvm: vfio: silence GCC warning KVM: ARM: Remove duplicate include arm/arm64: KVM: relax the requirements of VMA alignment for THP ...	2014-01-22 21:40:43 -08:00
Linus Torvalds	e1ba84597c	PCI changes for the v3.14 merge window: Resource management - Change pci_bus_region addresses to dma_addr_t (Bjorn Helgaas) - Support 64-bit AGP BARs (Bjorn Helgaas, Yinghai Lu) - Add pci_bus_address() to get bus address of a BAR (Bjorn Helgaas) - Use pci_resource_start() for CPU address of AGP BARs (Bjorn Helgaas) - Enforce bus address limits in resource allocation (Yinghai Lu) - Allocate 64-bit BARs above 4G when possible (Yinghai Lu) - Convert pcibios_resource_to_bus() to take pci_bus, not pci_dev (Yinghai Lu) PCI device hotplug - Major rescan/remove locking update (Rafael J. Wysocki) - Make ioapic builtin only (not modular) (Yinghai Lu) - Fix release/free issues (Yinghai Lu) - Clean up pciehp (Bjorn Helgaas) - Announce pciehp slot info during enumeration (Bjorn Helgaas) MSI - Add pci_msi_vec_count(), pci_msix_vec_count() (Alexander Gordeev) - Add pci_enable_msi_range(), pci_enable_msix_range() (Alexander Gordeev) - Deprecate "tri-state" interfaces: fail/success/fail+info (Alexander Gordeev) - Export MSI mode using attributes, not kobjects (Greg Kroah-Hartman) - Drop "irq" param from _restore_msi_irqs() (DuanZhenzhong) SR-IOV - Clear NumVFs when disabling SR-IOV in sriov_init() (ethan.zhao) Virtualization - Add support for save/restore of extended capabilities (Alex Williamson) - Add Virtual Channel to save/restore support (Alex Williamson) - Never treat a VF as a multifunction device (Alex Williamson) - Add pci_try_reset_function(), et al (Alex Williamson) AER - Ignore non-PCIe error sources (Betty Dall) - Support ACPI HEST error sources for domains other than 0 (Betty Dall) - Consolidate HEST error source parsers (Bjorn Helgaas) - Add a TLP header print helper (Borislav Petkov) Freescale i.MX6 - Remove unnecessary code (Fabio Estevam) - Make reset-gpio optional (Marek Vasut) - Report "link up" only after link training completes (Marek Vasut) - Start link in Gen1 before negotiating for Gen2 mode (Marek Vasut) - Fix PCIe startup code (Richard Zhu) Marvell MVEBU - Remove duplicate of_clk_get_by_name() call (Andrew Lunn) - Drop writes to bridge Secondary Status register (Jason Gunthorpe) - Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits (Jason Gunthorpe) - Support a bridge with no IO port window (Jason Gunthorpe) - Use max_t() instead of max(resource_size_t,) (Jingoo Han) - Remove redundant of_match_ptr (Sachin Kamat) - Call pci_ioremap_io() at startup instead of dynamically (Thomas Petazzoni) NVIDIA Tegra - Disable Gen2 for Tegra20 and Tegra30 (Eric Brower) Renesas R-Car - Add runtime PM support (Valentine Barshak) - Fix rcar_pci_probe() return value check (Wei Yongjun) Synopsys DesignWare - Fix crash in dw_msi_teardown_irq() (Bjørn Erik Nilsen) - Remove redundant call to pci_write_config_word() (Bjørn Erik Nilsen) - Fix missing MSI IRQs (Harro Haan) - Add dw_pcie prefix before cfg_read/write (Pratyush Anand) - Fix I/O transfers by using CPU (not realio) address (Pratyush Anand) - Whitespace cleanup (Jingoo Han) EISA - Call put_device() if device_register() fails (Levente Kurusa) - Revert EISA initialization breakage ((Bjorn Helgaas) Miscellaneous - Remove unused code, including PCIe 3.0 interfaces (Stephen Hemminger) - Prevent bus conflicts while checking for bridge apertures (Bjorn Helgaas) - Stop clearing bridge Secondary Status when setting up I/O aperture (Bjorn Helgaas) - Use dev_is_pci() to identify PCI devices (Yijing Wang) - Deprecate DEFINE_PCI_DEVICE_TABLE (Joe Perches) - Update documentation 00-INDEX (Erik Ekman) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJS3ujEAAoJEFmIoMA60/r8A4EQAK9AZSUSVNWvlKdC1PrBfT3w 7fVILx5A4KWsOU8eoFwCPQLrgvUtMltg16yN2tbCjqpKEdrVc36biMO9bwhnXSyZ KopHKMWnn0sza/z2H8mcGy+0azGdWcIjcErX/a8WeS6zyWBjm+yzckrHNVpPu4Ca SpCBhfgBMjKyIZyLtP6juFSH34S2DfQex4oUSyPC+gjqPy5wW/xw/kBxZfOXl+yU P9pQT+geMIc31pETMdG9wd/TT+47YAui4ieSggoVxfVrphCXv6S8mOMCMuQc2bAy MHy9uFm1jbvKZZIYrzJ+9HFiiU/6MNiOO3Ygua52xuSp1Zrcjwi2CLD9/QBXbDVs pTKU5JIO7q43llkQUpIXTwBvEApSZRhuqzXegsMAYIg4AWmbfm/2fXkfWlQThYMp J48blAJZ4t0vhMr9usgwbtdBe8F5euExOxpwH0QMCMABbuu8/B3TLm39+LTcIbsw Efgm3N9iUTyiV5fe9Rr62nflhyqXjTevPl4wbZZe4OOdm0MXZY+/BzuNJhg3wyY8 QKz2J3FB6OR7BCLHCp80l50s5+Ih4F5kmOXwFKjT1D1MFRaNaPDmp9BY6TitU6hg zj55gP4c8x6n3alakbf972Yhgs/4oi1va8cZL+pCYWb8nPO5ldaMiT7QBBLUreQV BtDtC7u/AFWJ5e73+jVO =La1R -----END PGP SIGNATURE----- Merge tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v3.14 merge window: Resource management - Change pci_bus_region addresses to dma_addr_t (Bjorn Helgaas) - Support 64-bit AGP BARs (Bjorn Helgaas, Yinghai Lu) - Add pci_bus_address() to get bus address of a BAR (Bjorn Helgaas) - Use pci_resource_start() for CPU address of AGP BARs (Bjorn Helgaas) - Enforce bus address limits in resource allocation (Yinghai Lu) - Allocate 64-bit BARs above 4G when possible (Yinghai Lu) - Convert pcibios_resource_to_bus() to take pci_bus, not pci_dev (Yinghai Lu) PCI device hotplug - Major rescan/remove locking update (Rafael J. Wysocki) - Make ioapic builtin only (not modular) (Yinghai Lu) - Fix release/free issues (Yinghai Lu) - Clean up pciehp (Bjorn Helgaas) - Announce pciehp slot info during enumeration (Bjorn Helgaas) MSI - Add pci_msi_vec_count(), pci_msix_vec_count() (Alexander Gordeev) - Add pci_enable_msi_range(), pci_enable_msix_range() (Alexander Gordeev) - Deprecate "tri-state" interfaces: fail/success/fail+info (Alexander Gordeev) - Export MSI mode using attributes, not kobjects (Greg Kroah-Hartman) - Drop "irq" param from _restore_msi_irqs() (DuanZhenzhong) SR-IOV - Clear NumVFs when disabling SR-IOV in sriov_init() (ethan.zhao) Virtualization - Add support for save/restore of extended capabilities (Alex Williamson) - Add Virtual Channel to save/restore support (Alex Williamson) - Never treat a VF as a multifunction device (Alex Williamson) - Add pci_try_reset_function(), et al (Alex Williamson) AER - Ignore non-PCIe error sources (Betty Dall) - Support ACPI HEST error sources for domains other than 0 (Betty Dall) - Consolidate HEST error source parsers (Bjorn Helgaas) - Add a TLP header print helper (Borislav Petkov) Freescale i.MX6 - Remove unnecessary code (Fabio Estevam) - Make reset-gpio optional (Marek Vasut) - Report "link up" only after link training completes (Marek Vasut) - Start link in Gen1 before negotiating for Gen2 mode (Marek Vasut) - Fix PCIe startup code (Richard Zhu) Marvell MVEBU - Remove duplicate of_clk_get_by_name() call (Andrew Lunn) - Drop writes to bridge Secondary Status register (Jason Gunthorpe) - Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits (Jason Gunthorpe) - Support a bridge with no IO port window (Jason Gunthorpe) - Use max_t() instead of max(resource_size_t,) (Jingoo Han) - Remove redundant of_match_ptr (Sachin Kamat) - Call pci_ioremap_io() at startup instead of dynamically (Thomas Petazzoni) NVIDIA Tegra - Disable Gen2 for Tegra20 and Tegra30 (Eric Brower) Renesas R-Car - Add runtime PM support (Valentine Barshak) - Fix rcar_pci_probe() return value check (Wei Yongjun) Synopsys DesignWare - Fix crash in dw_msi_teardown_irq() (Bjørn Erik Nilsen) - Remove redundant call to pci_write_config_word() (Bjørn Erik Nilsen) - Fix missing MSI IRQs (Harro Haan) - Add dw_pcie prefix before cfg_read/write (Pratyush Anand) - Fix I/O transfers by using CPU (not realio) address (Pratyush Anand) - Whitespace cleanup (Jingoo Han) EISA - Call put_device() if device_register() fails (Levente Kurusa) - Revert EISA initialization breakage ((Bjorn Helgaas) Miscellaneous - Remove unused code, including PCIe 3.0 interfaces (Stephen Hemminger) - Prevent bus conflicts while checking for bridge apertures (Bjorn Helgaas) - Stop clearing bridge Secondary Status when setting up I/O aperture (Bjorn Helgaas) - Use dev_is_pci() to identify PCI devices (Yijing Wang) - Deprecate DEFINE_PCI_DEVICE_TABLE (Joe Perches) - Update documentation 00-INDEX (Erik Ekman)" * tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (119 commits) Revert "EISA: Initialize device before its resources" Revert "EISA: Log device resources in dmesg" vfio-pci: Use pci "try" reset interface PCI: Check parent kobject in pci_destroy_dev() xen/pcifront: Use global PCI rescan-remove locking powerpc/eeh: Use global PCI rescan-remove locking PCI: Fix pci_check_and_unmask_intx() comment typos PCI: Add pci_try_reset_function(), pci_try_reset_slot(), pci_try_reset_bus() MPT / PCI: Use pci_stop_and_remove_bus_device_locked() platform / x86: Use global PCI rescan-remove locking PCI: hotplug: Use global PCI rescan-remove locking pcmcia: Use global PCI rescan-remove locking ACPI / hotplug / PCI: Use global PCI rescan-remove locking ACPI / PCI: Use global PCI rescan-remove locking in PCI root hotplug PCI: Add global pci_lock_rescan_remove() PCI: Cleanup pci.h whitespace PCI: Reorder so actual code comes before stubs PCI/AER: Support ACPI HEST AER error sources for PCI domains other than 0 ACPICA: Add helper macros to extract bus/segment numbers from HEST table. PCI: Make local functions static ...	2014-01-22 16:39:28 -08:00
Santosh Shilimkar	5b6e529521	x86: memblock: set current limit to max low memory address The memblock current limit value is used to limit early boot memory allocations below max low memory address by default, as the kernel can access only to the low memory. Hence, set memblock current limit value to the max mapped low memory address instead of max mapped memory address. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Tejun Heo <tj@kernel.org> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:46 -08:00
Linus Torvalds	15c8102620	Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x32 uapi changes from Peter Anvin: "This is the first few of a set of patches by H.J. Lu to make the kernel uapi headers usable for x32, as required by some non-glibc libcs. These particular patches make the stat and statfs structures usable" * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, x32: Use __kernel_long_t for __statfs_word x86, x32: Use __kernel_long_t/__kernel_ulong_t in x86-64 stat.h	2014-01-20 15:12:49 -08:00
Linus Torvalds	c9cdd9a6ae	Merge branch 'x86/mpx' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature and mpx updates from Peter Anvin: "This includes the basic infrastructure for MPX (Memory Protection Extensions) support, but does not include MPX support itself. It is, however, a prerequisite for KVM support for MPX, which I believe will be pushed later this merge window by the KVM team. This includes moving the functionality in futex_atomic_cmpxchg_inatomic() into a new function in uaccess.h so it can be reused - this will be used by the final MPX patches. The actual MPX functionality (map management and so on) will be pushed in a future merge window, when ready" * 'x86/mpx' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel/mpx: Remove unused LWP structure x86, mpx: Add MPX related opcodes to the x86 opcode map x86: replace futex_atomic_cmpxchg_inatomic() with user_atomic_cmpxchg_inatomic x86: add user_atomic_cmpxchg_inatomic at uaccess.h x86, xsave: Support eager-only xsave features, add MPX support x86, cpufeature: Define the Intel MPX feature flag	2014-01-20 14:46:32 -08:00
Linus Torvalds	f4bcd8ccdd	Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kernel address space randomization support from Peter Anvin: "This enables kernel address space randomization for x86" * 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET x86, kaslr: Remove unused including <linux/version.h> x86, kaslr: Use char array to gain sizeof sanity x86, kaslr: Add a circular multiply for better bit diffusion x86, kaslr: Mix entropy sources together as needed x86/relocs: Add percpu fixup for GNU ld 2.23 x86, boot: Rename get_flags() and check_flags() to *_cpuflags() x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64 x86, kaslr: Report kernel offset on panic x86, kaslr: Select random position from e820 maps x86, kaslr: Provide randomness functions x86, kaslr: Return location from decompress_kernel x86, boot: Move CPU flags out of cpucheck x86, relocs: Add more per-cpu gold special cases	2014-01-20 14:45:50 -08:00
H.J. Lu	386916598e	x86, uapi, x32: Use __kernel_ulong_t in x86 struct semid64_ds Both x32 and x86-64 use the same struct semid64_ds for system calls. But x32 long is 32-bit. This patch replaces unsigned long with __kernel_ulong_t in x86 struct semid64_ds. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/r/1388182464-28428-7-git-send-email-hjl.tools@gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-01-20 14:45:13 -08:00
Linus Torvalds	7fe67a1180	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull leftover x86 fixes from Ingo Molnar: "Two leftover fixes that did not make it into v3.13" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add check for number of available vectors before CPU down x86, cpu, amd: Add workaround for family 16h, erratum 793	2014-01-20 12:11:41 -08:00
Linus Torvalds	74e8ee8262	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull Intel SoC changes from Ingo Molnar: "Improved Intel SoC platform support" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, tsc, apic: Unbreak static (MSR) calibration when CONFIG_X86_LOCAL_APIC=n x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs arch: x86: New MailBox support driver for Intel SOC's	2014-01-20 12:09:31 -08:00
Linus Torvalds	8bd6964cbd	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "A cleanup, a fix and ASLR support for hugetlb mappings" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/numa: Fix 32-bit kernel NUMA boot x86/mm: Implement ASLR for hugetlb mappings x86/mm: Unify pte_to_pgoff() and pgoff_to_pte() helpers	2014-01-20 12:08:51 -08:00
Linus Torvalds	2bb2c5e235	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loader updates from Ingo Molnar: "There are two main changes in this tree: - AMD microcode early loading fixes - some microcode loader source files reorganization" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Move to a proper location x86, microcode, AMD: Fix early ucode loading x86, microcode: Share native MSR accessing variants x86, ramdisk: Export relocated ramdisk VA	2014-01-20 12:07:54 -08:00
Linus Torvalds	4500cf60db	Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull Intel MID updates from Ingo Molnar: "This tree improves Intel MID (Mobile Internet Device) platform support: - Merrifield platform support (David Cohen) - Clovertrail platform support (Kuppuswamy Sathyanarayanan) - Various cleanups and fixes (David Cohen)" * 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, intel_mid: Replace memcpy with struct assignment x86, intel-mid: Return proper error code from get_gpio_by_name() x86, intel-mid: Check get_gpio_by_name() error code on platform code x86, intel-mid: sfi_handle_*_dev() should check for pdata error code x86, intel-mid: Remove deprecated X86_MDFLD and X86_WANT_INTEL_MID configs x86, intel-mid: Add Merrifield platform support x86, intel-mid: Add Clovertrail platform support x86, intel-mid: Move Medfield code out of intel-mid.c core file	2014-01-20 12:06:50 -08:00
Linus Torvalds	972d5e7e5b	Merge branch 'x86-efi-kexec-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI changes from Ingo Molnar: "This consists of two main parts: - New static EFI runtime services virtual mapping layout which is groundwork for kexec support on EFI (Borislav Petkov) - EFI kexec support itself (Dave Young)" * 'x86-efi-kexec-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/efi: parse_efi_setup() build fix x86: ksysfs.c build fix x86/efi: Delete superfluous global variables x86: Reserve setup_data ranges late after parsing memmap cmdline x86: Export x86 boot_params to sysfs x86: Add xloadflags bit for EFI runtime support on kexec x86/efi: Pass necessary EFI data for kexec via setup_data efi: Export EFI runtime memory mapping to sysfs efi: Export more EFI table variables to sysfs x86/efi: Cleanup efi_enter_virtual_mode() function x86/efi: Fix off-by-one bug in EFI Boot Services reservation x86/efi: Add a wrapper function efi_map_region_fixed() x86/efi: Remove unused variables in __map_region() x86/efi: Check krealloc return value x86/efi: Runtime services virtual mapping x86/mm/cpa: Map in an arbitrary pgd x86/mm/pageattr: Add last levels of error path x86/mm/pageattr: Add a PUD error unwinding path x86/mm/pageattr: Add a PTE pagetable populating function x86/mm/pageattr: Add a PMD pagetable populating function ...	2014-01-20 12:05:30 -08:00
Linus Torvalds	5d4863e4cc	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TLB detection update from Ingo Molnar: "A single change that extends our TLB cache size detection+reporting code" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu: Detect more TLB configuration	2014-01-20 12:04:45 -08:00
Linus Torvalds	2a0fede97f	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu, amd: Fix a shadowed variable situation um, x86: Fix vDSO build x86: Delete non-required instances of include <linux/init.h> x86, realmode: Pointer walk cleanups, pull out invariant use of __pa() x86/traps: Clean up error exception handler definitions	2014-01-20 12:03:57 -08:00
Linus Torvalds	06bc0f4a2e	Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/build changes from Ingo Molnar: "Misc smaller improvements" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, boot: Move intcall() to the .inittext section x86, boot: Use .code16 instead of .code16gcc x86, sparse: Do not force removal of __user when calling copy_to/from_user_nocheck()	2014-01-20 12:03:21 -08:00
Linus Torvalds	4cd4156994	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar: "Misc optimizations" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Slightly tweak the access_ok() C variant for better code x86: Replace assembly access_ok() with a C variant x86-64, copy_user: Use leal to produce 32-bit results x86-64, copy_user: Remove zero byte check before copy user buffer.	2014-01-20 11:51:00 -08:00
Ingo Molnar	741e3902cd	x86/intel/mpx: Remove unused LWP structure We don't support LWP yet, don't give the impression that we do: represent the LWP state as opaque 128 bytes, the way Linux sees it currently. Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-ecarmjtfKpanpAapfck6dj6g@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-20 19:57:39 +01:00
Linus Torvalds	a0fa1dd3cd	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: - Add the initial implementation of SCHED_DEADLINE support: a real-time scheduling policy where tasks that meet their deadlines and periodically execute their instances in less than their runtime quota see real-time scheduling and won't miss any of their deadlines. Tasks that go over their quota get delayed (Available to privileged users for now) - Clean up and fix preempt_enable_no_resched() abuse all around the tree - Do sched_clock() performance optimizations on x86 and elsewhere - Fix and improve auto-NUMA balancing - Fix and clean up the idle loop - Apply various cleanups and fixes * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) sched: Fix __sched_setscheduler() nice test sched: Move SCHED_RESET_ON_FORK into attr::sched_flags sched: Fix up attr::sched_priority warning sched: Fix up scheduler syscall LTP fails sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls sched/core: Fix htmldocs warnings sched/deadline: No need to check p if dl_se is valid sched/deadline: Remove unused variables sched/deadline: Fix sparse static warnings m68k: Fix build warning in mac_via.h sched, thermal: Clean up preempt_enable_no_resched() abuse sched, net: Fixup busy_loop_us_clock() sched, net: Clean up preempt_enable_no_resched() abuse sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding sched/preempt, locking: Rework local_bh_{dis,en}able() sched/clock, x86: Avoid a runtime condition in native_sched_clock() sched/clock: Fix up clear_sched_clock_stable() sched/clock, x86: Use a static_key for sched_clock_stable sched/clock: Remove local_irq_disable() from the clocks sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs ...	2014-01-20 10:42:08 -08:00
Linus Torvalds	2cc3f16cad	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull IRQ changes from Ingo Molnar: "The only change in this cycle is a CPU hotplug related spurious warning fix" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Fix kbuild warning in smp_irq_move_cleanup_interrupt() x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs	2014-01-20 10:27:52 -08:00
Linus Torvalds	6ffbe7d1fa	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: - futex performance increases: larger hashes, smarter wakeups - mutex debugging improvements - lots of SMP ordering documentation updates - introduce the smp_load_acquire(), smp_store_release() primitives. (There are WIP patches that make use of them - not yet merged) - lockdep micro-optimizations - lockdep improvement: better cover IRQ contexts - liblockdep at last. We'll continue to monitor how useful this is * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) futexes: Fix futex_hashsize initialization arch: Re-sort some Kbuild files to hopefully help avoid some conflicts futexes: Avoid taking the hb->lock if there's nothing to wake up futexes: Document multiprocessor ordering guarantees futexes: Increase hash table size for better performance futexes: Clean up various details arch: Introduce smp_load_acquire(), smp_store_release() arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE mutexes: Give more informative mutex warning in the !lock->owner case powerpc: Full barrier for smp_mb__after_unlock_lock() rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods Documentation/memory-barriers.txt: Downgrade UNLOCK+BLOCK locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier Documentation/memory-barriers.txt: Document ACCESS_ONCE() Documentation/memory-barriers.txt: Prohibit speculative writes Documentation/memory-barriers.txt: Add long atomic examples to memory-barriers.txt Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt Revert "smp/cpumask: Make CONFIG_CPUMASK_OFFSTACK=y usable without debug dependency" ...	2014-01-20 10:23:08 -08:00
David S. Miller	4180442058	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c net/ipv4/tcp_metrics.c Overlapping changes between the "don't create two tcp metrics objects with the same key" race fix in net and the addition of the destination address in the lookup key in net-next. Minor overlapping changes in bnx2x driver. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-18 00:55:41 -08:00
Jan Kiszka	cae501397a	KVM: nVMX: Clean up handling of VMX-related MSRs This simplifies the code and also stops issuing warning about writing to unhandled MSRs when VMX is disabled or the Feature Control MSR is locked - we do handle them all according to the spec. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:16 +01:00
Jan Kiszka	73aaf249ee	KVM: SVM: Fix reading of DR6 In contrast to VMX, SVM dose not automatically transfer DR6 into the VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor hook to obtain the current value. And as SVM now picks the DR6 state from its VMCB, we also need a set callback in order to write updates of DR6 back. Fixes a regression of `020df0794f`. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:10 +01:00
Vadim Rozenfeld	e984097b55	add support for Hyper-V reference time counter Signed-off: Peter Lieven <pl@kamp.de> Signed-off: Gleb Natapov Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com> After some consideration I decided to submit only Hyper-V reference counters support this time. I will submit iTSC support as a separate patch as soon as it is ready. v1 -> v2 1. mark TSC page dirty as suggested by Eric Northup <digitaleric@google.com> and Gleb 2. disable local irq when calling get_kernel_ns, as it was done by Peter Lieven <pl@amp.de> 3. move check for TSC page enable from second patch to this one. v3 -> v4 Get rid of ref counter offset. v4 -> v5 replace __copy_to_user with kvm_write_guest when updateing iTSC page. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:08 +01:00
Bin Gao	7da7c15613	x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs On SoCs that have the calibration MSRs available, either there is no PIT, HPET or PMTIMER to calibrate against, or the PIT/HPET/PMTIMER is driven from the same clock as the TSC, so calibration is redundant and just slows down the boot. TSC rate is caculated by this formula: <maximum core-clock to bus-clock ratio> * <maximum resolved frequency> The ratio and the resolved frequency ID can be obtained from MSR. See Intel 64 and IA-32 System Programming Guid section 16.12 and 30.11.5 for details. Signed-off-by: Bin Gao <bin.gao@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-rgm7xmg7k6qnjlw3ynkcjsmh@git.kernel.org	2014-01-15 22:28:48 -08:00
Prarit Bhargava	da6139e49c	x86: Add check for number of available vectors before CPU down Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791 When a cpu is downed on a system, the irqs on the cpu are assigned to other cpus. It is possible, however, that when a cpu is downed there aren't enough free vectors on the remaining cpus to account for the vectors from the cpu that is being downed. This results in an interesting "overflow" condition where irqs are "assigned" to a CPU but are not handled. For example, when downing cpus on a 1-64 logical processor system: <snip> [ 232.021745] smpboot: CPU 61 is now offline [ 238.480275] smpboot: CPU 62 is now offline [ 245.991080] ------------[ cut here ]------------ [ 245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250() [ 246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out [ 246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas [ 246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14 [ 246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 [ 246.057371] 0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48 [ 246.065728] ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040 [ 246.074073] 0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000 [ 246.082430] Call Trace: [ 246.085174] <IRQ> [<ffffffff8164fbf6>] dump_stack+0x46/0x58 [ 246.091633] [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0 [ 246.098352] [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50 [ 246.104786] [<ffffffff815710d6>] dev_watchdog+0x246/0x250 [ 246.110923] [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80 [ 246.119097] [<ffffffff8106092a>] call_timer_fn+0x3a/0x110 [ 246.125224] [<ffffffff8106280f>] ? update_process_times+0x6f/0x80 [ 246.132137] [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80 [ 246.140308] [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0 [ 246.146933] [<ffffffff81059a80>] __do_softirq+0xe0/0x220 [ 246.152976] [<ffffffff8165fedc>] call_softirq+0x1c/0x30 [ 246.158920] [<ffffffff810045f5>] do_softirq+0x55/0x90 [ 246.164670] [<ffffffff81059d35>] irq_exit+0xa5/0xb0 [ 246.170227] [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60 [ 246.177324] [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70 [ 246.184041] <EOI> [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0 [ 246.191559] [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0 [ 246.198374] [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200 [ 246.204900] [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30 [ 246.210846] [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250 [ 246.217371] [<ffffffff81646b47>] rest_init+0x77/0x80 [ 246.223028] [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb [ 246.229165] [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e [ 246.235787] [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c [ 246.242990] [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc [ 246.249610] ---[ end trace fb74fdef54d79039 ]--- [ 246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout [ 246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter Last login: Mon Nov 11 08:35:14 from 10.18.17.119 [root@(none) ~]# [ 246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5 [ 249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX [ 246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5 [ 249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX (last lines keep repeating. ixgbe driver is dead until module reload.) If the downed cpu has more vectors than are free on the remaining cpus on the system, it is possible that some vectors are "orphaned" even though they are assigned to a cpu. In this case, since the ixgbe driver had a watchdog, the watchdog fired and notified that something was wrong. This patch adds a function, check_vectors(), to compare the number of vectors on the CPU going down and compares it to the number of vectors available on the system. If there aren't enough vectors for the CPU to go down, an error is returned and propogated back to userspace. v2: Do not need to look at percpu irqs v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest Priority Mode v4: Additional changes suggested by Gong Chen. v5/v6/v7/v8: Updated comment text Signed-off-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.com Reviewed-by: Gong Chen <gong.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Yang Zhang <yang.z.zhang@Intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Janet Morgan <janet.morgan@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Ruiv Wang <ruiv.wang@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>	2014-01-15 22:24:02 -08:00
David Cohen	bc20aa48bb	x86, intel-mid: Add Merrifield platform support This code was partially based on Mark Brown's previous work. Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Link: http://lkml.kernel.org/r/1387224459-25746-4-git-send-email-david.a.cohen@linux.intel.com Signed-off-by: Fei Yang <fei.yang@intel.com> Cc: Mark F. Brown <mark.f.brown@intel.com> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-15 14:38:58 -08:00
Kuppuswamy Sathyanarayanan	85611e3feb	x86, intel-mid: Add Clovertrail platform support This patch adds Clovertrail support on intel-mid and makes it more flexible to support other SoCs. Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Link: http://lkml.kernel.org/r/1387224459-25746-3-git-send-email-david.a.cohen@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-15 14:38:58 -08:00
Borislav Petkov	3b56496865	x86, cpu, amd: Add workaround for family 16h, erratum 793 This adds the workaround for erratum 793 as a precaution in case not every BIOS implements it. This addresses CVE-2013-6885. Erratum text: [Revision Guide for AMD Family 16h Models 00h-0Fh Processors, document 51810 Rev. 3.04 November 2013] 793 Specific Combination of Writes to Write Combined Memory Types and Locked Instructions May Cause Core Hang Description Under a highly specific and detailed set of internal timing conditions, a locked instruction may trigger a timing sequence whereby the write to a write combined memory type is not flushed, causing the locked instruction to stall indefinitely. Potential Effect on System Processor core hang. Suggested Workaround BIOS should set MSR C001_1020[15] = 1b. Fix Planned No fix planned [ hpa: updated description, fixed typo in MSR name ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-14 16:39:07 -08:00
Borislav Petkov	5335ba5cf4	x86, microcode, AMD: Fix early ucode loading The original idea to use the microcode cache for the APs doesn't pan out because we do memory allocation there very early and with IRQs disabled and we don't want to involve GFP_ATOMIC allocations. Not if it can be helped. Thus, extend the caching of the BSP patch approach to the APs and iterate over the ucode in the initrd instead of using the cache. We still save the relevant patches to it but later, right before we jettison the initrd. While at it, fix early ucode loading on 32-bit too. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>	2014-01-13 19:59:38 +01:00
Borislav Petkov	e1b43e3f13	x86, microcode: Share native MSR accessing variants We want to use those in AMD's early loading path too. Also, add a native_wrmsrl variant. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>	2014-01-13 19:57:27 +01:00
Borislav Petkov	5aa3d718f2	x86, ramdisk: Export relocated ramdisk VA The ramdisk can possibly get relocated if the whole image is not mapped. And since we're going over it in the microcode loader and fishing out the relevant microcode patches, we want access it at its new location. Thus, export it. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>	2014-01-13 19:56:25 +01:00
Peter Zijlstra	8cb75e0c4e	sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding With various drivers wanting to inject idle time; we get people calling idle routines outside of the idle loop proper. Therefore we need to be extra careful about not missing TIF_NEED_RESCHED -> PREEMPT_NEED_RESCHED propagations. While looking at this, I also realized there's a small window in the existing idle loop where we can miss TIF_NEED_RESCHED; when it hits right after the tif_need_resched() test at the end of the loop but right before the need_resched() test at the start of the loop. So move preempt_fold_need_resched() out of the loop where we're guaranteed to have TIF_NEED_RESCHED set. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-x9jgh45oeayzajz2mjt0y7d6@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 17:38:55 +01:00
Ingo Molnar	c9c8986847	Merge branch 'x86/idle' into sched/core Merge these x86 specific bits - we are going to add generic bits as well. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 17:37:05 +01:00
Peter Zijlstra	20d1c86a57	sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs Use a ring-buffer like multi-version object structure which allows always having a coherent object; we use this to avoid having to disable IRQs while reading sched_clock() and avoids a problem when getting an NMI while changing the cyc2ns data. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 331312 257223 (cold) local_clock: 301773 310296 309889 (warm) sched_clock: 38375 38247 25280 (warm) local_clock: 100371 102713 85268 (warm) rdtsc: 27340 27289 24247 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 372706 301224 (cold) local_clock: 396890 399275 399870 (warm) sched_clock: 38194 38124 25630 (warm) local_clock: 143452 148698 129629 (warm) rdtsc: 27345 27365 24307 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 15:13:06 +01:00
Peter Zijlstra	57c67da274	sched/clock, x86: Move some cyc2ns() code around There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c, so move it there. There are no cycles_2_ns() users. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:39 +01:00
Peter Zijlstra	5dd12c2152	sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock() Use mul_u64_u32_shr() so that x86_64 can use a single 64x64->128 mul. Before: 0000000000000560 <native_sched_clock>: 560: 44 8b 1d 00 00 00 00 mov 0x0(%rip),%r11d # 567 <native_sched_clock+0x7> 567: 55 push %rbp 568: 48 89 e5 mov %rsp,%rbp 56b: 45 85 db test %r11d,%r11d 56e: 75 4f jne 5bf <native_sched_clock+0x5f> 570: 0f 31 rdtsc 572: 89 c0 mov %eax,%eax 574: 48 c1 e2 20 shl $0x20,%rdx 578: 48 c7 c1 00 00 00 00 mov $0x0,%rcx 57f: 48 09 c2 or %rax,%rdx 582: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 589: 65 8b 04 25 00 00 00 mov %gs:0x0,%eax 590: 00 591: 48 98 cltq 593: 48 8b 34 c5 00 00 00 mov 0x0(,%rax,8),%rsi 59a: 00 59b: 48 89 d0 mov %rdx,%rax 59e: 81 e2 ff 03 00 00 and $0x3ff,%edx 5a4: 48 c1 e8 0a shr $0xa,%rax 5a8: 48 0f af 14 0e imul (%rsi,%rcx,1),%rdx 5ad: 48 0f af 04 0e imul (%rsi,%rcx,1),%rax 5b2: 5d pop %rbp 5b3: 48 03 04 3e add (%rsi,%rdi,1),%rax 5b7: 48 c1 ea 0a shr $0xa,%rdx 5bb: 48 01 d0 add %rdx,%rax 5be: c3 retq After: 0000000000000550 <native_sched_clock>: 550: 8b 3d 00 00 00 00 mov 0x0(%rip),%edi # 556 <native_sched_clock+0x6> 556: 55 push %rbp 557: 48 89 e5 mov %rsp,%rbp 55a: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp 55e: 85 ff test %edi,%edi 560: 75 2c jne 58e <native_sched_clock+0x3e> 562: 0f 31 rdtsc 564: 89 c0 mov %eax,%eax 566: 48 c1 e2 20 shl $0x20,%rdx 56a: 48 09 c2 or %rax,%rdx 56d: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 574: 00 00 576: 89 c0 mov %eax,%eax 578: 48 f7 e2 mul %rdx 57b: 65 48 8b 0c 25 00 00 mov %gs:0x0,%rcx 582: 00 00 584: c9 leaveq 585: 48 0f ac d0 0a shrd $0xa,%rdx,%rax 58a: 48 01 c8 add %rcx,%rax 58d: c3 retq MAINLINE POST sched_clock_stable: 1 1 (cold) sched_clock: 329841 331312 (cold) local_clock: 301773 310296 (warm) sched_clock: 38375 38247 (warm) local_clock: 100371 102713 (warm) rdtsc: 27340 27289 sched_clock_stable: 0 0 (cold) sched_clock: 382634 372706 (cold) local_clock: 396890 399275 (warm) sched_clock: 38194 38124 (warm) local_clock: 143452 148698 (warm) rdtsc: 27345 27365 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-piu203ses5y1g36bnyw2n16x@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:38 +01:00
Ingo Molnar	1c62448e39	Linux 3.13-rc8 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu 75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz +Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw= =lEeV -----END PGP SIGNATURE----- Merge tag 'v3.13-rc8' into core/locking Refresh the tree with the latest fixes, before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:44:41 +01:00
Prarit Bhargava	9345005f4e	x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs During heavy CPU-hotplug operations the following spurious kernel warnings can trigger: do_IRQ: No ... irq handler for vector (irq -1) [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ] When downing a cpu it is possible that there are unhandled irqs left in the APIC IRR register. The following code path shows how the problem can occur: 1. CPU 5 is to go down. 2. cpu_disable() on CPU 5 executes with interrupt flag cleared by local_irq_save() via stop_machine(). 3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because interrupt flag is cleared (CPU unabled to handle the irq) 4. IRQs are migrated off of CPU 5, and the vectors' irqs are set to -1. 5. stop_machine() finishes cpu_disable() 6. cpu_die() for CPU 5 executes in normal context. 7. CPU 5 attempts to handle IRQ 12 because the IRR is set for IRQ 12. The code attempts to find the vector's IRQ and cannot because it has been set to -1. 8. do_IRQ() warning displays warning about CPU 5 IRQ 12. I added a debug printk to output which CPU & vector was retriggered and discovered that that we are getting bogus events. I see a 100% correlation between this debug printk in fixup_irqs() and the do_IRQ() warning. This patchset resolves this by adding definitions for VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying the code to use them. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831 Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Rui Wang <rui.y.wang@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Yang Zhang <yang.z.zhang@Intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: janet.morgan@Intel.com Cc: tony.luck@Intel.com Cc: ruiv.wang@gmail.com Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com [ Cleaned up the code a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 13:13:02 +01:00
Peter Zijlstra	47933ad41a	arch: Introduce smp_load_acquire(), smp_store_release() A number of situations currently require the heavyweight smp_mb(), even though there is no need to order prior stores against later loads. Many architectures have much cheaper ways to handle these situations, but the Linux kernel currently has no portable way to make use of them. This commit therefore supplies smp_load_acquire() and smp_store_release() to remedy this situation. The new smp_load_acquire() primitive orders the specified load against any subsequent reads or writes, while the new smp_store_release() primitive orders the specifed store against any prior reads or writes. These primitives allow array-based circular FIFOs to be implemented without an smp_mb(), and also allow a theoretical hole in rcu_assign_pointer() to be closed at no additional expense on most architectures. In addition, the RCU experience transitioning from explicit smp_read_barrier_depends() and smp_wmb() to rcu_dereference() and rcu_assign_pointer(), respectively resulted in substantial improvements in readability. It therefore seems likely that replacing other explicit barriers with smp_load_acquire() and smp_store_release() will provide similar benefits. It appears that roughly half of the explicit barriers in core kernel code might be so replaced. [Changelog by PaulMck] Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:37:17 +01:00
Linus Torvalds	26bef1318a	x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround Before we do an EMMS in the AMD FXSAVE information leak workaround we need to clear any pending exceptions, otherwise we trap with a floating-point exception inside this code. Reported-by: halfdog <me@halfdog.net> Tested-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-01-11 19:15:52 -08:00
Bjorn Helgaas	96702be560	Merge branch 'pci/resource' into next * pci/resource: PCI: Allocate 64-bit BARs above 4G when possible PCI: Enforce bus address limits in resource allocation PCI: Split out bridge window override of minimum allocation address agp/ati: Use PCI_COMMAND instead of hard-coded 4 agp/intel: Use CPU physical address, not bus address, for ioremap() agp/intel: Use pci_bus_address() to get GTTADR bus address agp/intel: Use pci_bus_address() to get MMADR bus address agp/intel: Support 64-bit GMADR agp/intel: Rename gtt_bus_addr to gtt_phys_addr drm/i915: Rename gtt_bus_addr to gtt_phys_addr agp: Use pci_resource_start() to get CPU physical address for BAR agp: Support 64-bit APBASE PCI: Add pci_bus_address() to get bus address of a BAR PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev PCI: Change pci_bus_region addresses to dma_addr_t	2014-01-10 14:23:15 -07:00
David E. Box	4618441536	arch: x86: New MailBox support driver for Intel SOC's Current Intel SOC cores use a MailBox Interface (MBI) to provide access to configuration registers on devices (called units) connected to the system fabric. This is a support driver that implements access to this interface on those platforms that can enumerate the device using PCI. Initial support is for BayTrail, for which port definitons are provided. This is a requirement for implementing platform specific features (e.g. RAPL driver requires this to perform platform specific power management using the registers in PUNIT). Dependant modules should select IOSF_MBI in their respective Kconfig configuraiton. Serialized access is handled by all exported routines with spinlocks. The API includes 3 functions for access to unit registers: int iosf_mbi_read(u8 port, u8 opcode, u32 offset, u32 *mdr) int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr) int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask) port: indicating the unit being accessed opcode: the read or write port specific opcode offset: the register offset within the port mdr: the register data to be read, written, or modified mask: bit locations in mdr to change Returns nonzero on error Note: GPU code handles access to the GFX unit. Therefore access to that unit with this driver is disallowed to avoid conflicts. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1389216471-734-1-git-send-email-david.e.box@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Matthew Garrett <mjg59@srcf.ucam.org>	2014-01-08 14:36:29 -08:00
Yinghai Lu	f75b99d5a7	PCI: Enforce bus address limits in resource allocation When allocating space for 32-bit BARs, we previously limited RESOURCE addresses so they would fit in 32 bits. However, the BUS address need not be the same as the resource address, and it's the bus address that must fit in the 32-bit BAR. This patch adds: - pci_clip_resource_to_region(), which clips a resource so it contains only the range that maps to the specified bus address region, e.g., to clip a resource to 32-bit bus addresses, and - pci_bus_alloc_from_region(), which allocates space for a resource from the specified bus address region, and changes pci_bus_alloc_resource() to allocate space for 64-bit BARs from the entire bus address region, and space for 32-bit BARs from only the bus address region below 4GB. If we had this window: pci_root HWP0002:0a: host bridge window [mem 0xf0180000000-0xf01fedfffff] (bus address [0x80000000-0xfedfffff]) we previously could not put a 32-bit BAR there, because the CPU addresses don't fit in 32 bits. This patch fixes this, so we can use this space for 32-bit BARs. It's also possible (though unlikely) to have resources with 32-bit CPU addresses but bus addresses above 4GB. In this case the previous code would allocate space that a 32-bit BAR could not map. Remove PCIBIOS_MAX_MEM_32, which is no longer used. [bhelgaas: reworked starting from http://lkml.kernel.org/r/1386658484-15774-3-git-send-email-yinghai@kernel.org] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-01-07 16:24:33 -07:00
Paul Gortmaker	663b55b9b3	x86: Delete non-required instances of include <linux/init.h> None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. [ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-01-06 21:25:18 -08:00
David S. Miller	56a4342dfe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 17:37:45 -05:00
Konrad Rzeszutek Wilk	efaf30a335	xen/grant: Implement an grant frame array struct (v3). The 'xen_hvm_resume_frames' used to be an 'unsigned long' and contain the virtual address of the grants. That was OK for most architectures (PVHVM, ARM) were the grants are contiguous in memory. That however is not the case for PVH - in which case we will have to do a lookup for each virtual address for the PFN. Instead of doing that, lets make it a structure which will contain the array of PFNs, the virtual address and the count of said PFNs. Also provide a generic functions: gnttab_setup_auto_xlat_frames and gnttab_free_auto_xlat_frames to populate said structure with appropriate values for PVHVM and ARM. To round it off, change the name from 'xen_hvm_resume_frames' to a more descriptive one - 'xen_auto_xlat_grant_frames'. For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver" we will populate the 'xen_auto_xlat_grant_frames' by ourselves. v2 moves the xen_remap in the gnttab_setup_auto_xlat_frames and also introduces xen_unmap for gnttab_free_auto_xlat_frames. Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v3: Based on top of 'asm/xen/page.h: remove redundant semicolon'] Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2014-01-06 10:44:20 -05:00
Mukesh Rathor	fc590efe66	xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn. Most of the functions in page.h are prefaced with if (xen_feature(XENFEAT_auto_translated_physmap)) return mfn; Except the mfn_to_local_pfn. At a first sight, the function should work without this patch - as the 'mfn_to_mfn' has a similar check. But there are no such check in the 'get_phys_to_machine' function - so we would crash in there. This fixes it by following the convention of having the check for auto-xlat in these static functions. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2014-01-06 10:43:56 -05:00
Ingo Molnar	ef0b8b9a52	Linux 3.13-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJSyJVbAAoJEHm+PkMAQRiGa28H/0m7GpZSpT8mvBthITxzqWCq JRkSPS4KTurAWlA5CJMJePyCM30DgN90s06bYUen9sTecZUwnL+qSV5OqAmg2r+0 PrfwtXtGZR6/Y12XlZ/3oFxVfUxjmgJyDAS76TIH1IvIum52nvJmLrR+6AyVphIX DkgBOuapdA7lia+U+ZM1cRkeHxUOKTUEw9v611VgoN3LYZyzyRb6d0rB7JtZN1RV dnXRi27enaPhwxelsCnORioRjsByMwD40CERxfLHmr5CGhmvCehBjO6bJ+KAdp14 52bfwWcNdbFMzUobcR7qlfS3Hy3AYJci+P6JzeeZ+kWEdv/eh5/1lvNuXtBJRlc= =iwzJ -----END PGP SIGNATURE----- Merge tag 'v3.13-rc7' into x86/efi-kexec to resolve conflicts Conflicts: arch/x86/platform/efi/efi.c drivers/firmware/efi/Kconfig Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-05 12:34:29 +01:00
Steven Rostedt	df90ca9690	x86, sparse: Do not force removal of __user when calling copy_to/from_user_nocheck() Commit `ff47ab4ff3` "x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic" added a "_nocheck" call in between the copy_to/from_user() and copy_user_generic(). As both the normal and nocheck versions of theses calls use the proper __user annotation, a typecast to remove it should not be added. This causes sparse to spin out the following warnings: arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>src arch/x86/include/asm/uaccess_64.h:207:47: got void const <noident> arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>src arch/x86/include/asm/uaccess_64.h:207:47: got void const <noident> arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>src arch/x86/include/asm/uaccess_64.h:207:47: got void const <noident> arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>src arch/x86/include/asm/uaccess_64.h:207:47: got void const <noident> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140103164500.5f6478f5@gandalf.local.home Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-04 13:54:50 -08:00
Kirill A. Shutemov	dd360393f4	x86, cpu: Detect more TLB configuration The Intel Software Developer’s Manual covers few more TLB configurations exposed as CPUID 2 descriptors: 61H Instruction TLB: 4 KByte pages, fully associative, 48 entries 63H Data TLB: 1 GByte pages, 4-way set associative, 4 entries 76H Instruction TLB: 2M/4M pages, fully associative, 8 entries B5H Instruction TLB: 4KByte pages, 8-way set associative, 64 entries B6H Instruction TLB: 4KByte pages, 8-way set associative, 128 entries C1H Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries C2H DTLB DTLB: 2 MByte/$MByte pages, 4-way associative, 16 entries Let's detect them as well. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/1387801018-14499-1-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-03 14:35:42 -08:00
Dave Young	5c12af0c41	x86/efi: parse_efi_setup() build fix In case without CONFIG_EFI, there will be below build error: arch/x86/built-in.o: In function `setup_arch': (.init.text+0x9dc): undefined reference to `parse_efi_setup' Thus fix it by adding blank inline function in asm/efi.h Also remove an unused declaration for variable efi_data_len. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-01-03 14:38:18 +00:00
Dave Young	456a29ddad	x86: Add xloadflags bit for EFI runtime support on kexec Old kexec-tools can not load new kernels. The reason is kexec-tools does not fill efi_info in x86 setup header previously, thus EFI failed to initialize. In new kexec-tools it will by default to fill efi_info and pass other EFI required infomation to 2nd kernel so kexec kernel EFI initialization can succeed finally. To prevent from breaking userspace, add a new xloadflags bit so kexec-tools can check the flag and switch to old logic. Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-12-29 13:09:06 +00:00
Dave Young	1fec053369	x86/efi: Pass necessary EFI data for kexec via setup_data Add a new setup_data type SETUP_EFI for kexec use. Passing the saved fw_vendor, runtime, config tables and EFI runtime mappings. When entering virtual mode, directly mapping the EFI runtime regions which we passed in previously. And skip the step to call SetVirtualAddressMap(). Specially for HP z420 workstation we need save the smbios physical address. The kernel boot sequence proceeds in the following order. Step 2 requires efi.smbios to be the physical address. However, I found that on HP z420 EFI system table has a virtual address of SMBIOS in step 1. Hence, we need set it back to the physical address with the smbios in efi_setup_data. (When it is still the physical address, it simply sets the same value.) 1. efi_init() - Set efi.smbios from EFI system table 2. dmi_scan_machine() - Temporary map efi.smbios to access SMBIOS table 3. efi_enter_virtual_mode() - Map EFI ranges Tested on ovmf+qemu, lenovo thinkpad, a dell laptop and an HP z420 workstation. Signed-off-by: Dave Young <dyoung@redhat.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-12-29 13:09:05 +00:00
H. Peter Anvin	a740576a4a	x86: Slightly tweak the access_ok() C variant for better code gcc can under very specific circumstances realize that the code sequence: foo += bar; if (foo < bar) ... ... is equivalent to a carry out from the addition. Tweak the implementation of access_ok() (specifically __chk_range_not_ok()) to make it more likely that gcc will make that connection. It isn't fool-proof (sometimes gcc seems to think it can make better code with lea, and ends up with a second comparison), still, but it seems to be able to connect the two more frequently this way. Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CA%2B55aFzPBdbfKovMT8Edr4SmE2_=%2BOKJFac9XW2awegogTkVTA@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-12-27 17:02:53 -08:00
Linus Torvalds	c5fe5d8068	x86: Replace assembly access_ok() with a C variant It turns out that the assembly variant doesn't actually produce that good code, presumably partly because it creates a long dependency chain with no scheduling, and partly because we cannot get a flags result out of gcc (which could be fixed with asm goto, but it turns out not to be worth it.) The C code allows gcc to schedule and generate multiple (easily predictable) branches, and as a side benefit we can really optimize the case where the size is constant. Link: http://lkml.kernel.org/r/CA%2B55aFzPBdbfKovMT8Edr4SmE2_=%2BOKJFac9XW2awegogTkVTA@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-12-27 16:58:17 -08:00
Dave Young	3b2664964b	x86/efi: Add a wrapper function efi_map_region_fixed() Kexec kernel will use saved runtime virtual mapping, so add a new function efi_map_region_fixed() for directly mapping a md to md->virt. The md is passed in from 1st kernel, the virtual addr is saved in md->virt_addr. Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-12-21 15:09:51 +00:00
H.J. Lu	b70fedc158	x86, x32: Use __kernel_long_t/__kernel_ulong_t in x86-64 stat.h Both x32 and x86-64 use the same stat system call interface. But x32 long is 32-bit. This patch changes x86 uapi <asm/stat.h> to use __kernel_long_t/__kernel_ulong_t in x86-64 stat. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/r/CAMe9rOquPtWEro0GQ=Z95pZJ=c7GGkSHynjN4FbiB4p445x-Ng@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-12-20 16:04:35 -08:00
H. Peter Anvin	7e98b71920	x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers Use static_cpu_has() to conditionalize the CLFLUSH workaround, and add memory barriers around it since the documentation is explicit that CLFLUSH is only ordered with respect to MFENCE. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFzGxcML7j8CEvQPYzh0W81uVoAAVmGctMOUZ7CZ1yYd2A@mail.gmail.com	2013-12-19 11:58:16 -08:00
Peter Zijlstra	1682425539	x86, acpi, idle: Restructure the mwait idle routines People seem to delight in writing wrong and broken mwait idle routines; collapse the lot. This leaves mwait_play_dead() the sole remaining user of __mwait() and new __mwait() users are probably doing it wrong. Also remove __sti_mwait() as its unused. Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jacob Jun Pan <jacob.jun.pan@linux.intel.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Len Brown <lenb@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Acked-by: Rafael Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131212141654.616820819@infradead.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-12-19 11:54:44 -08:00
Ingo Molnar	3331e924e7	Linux 3.13-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQEcBAABAgAGBQJSrhGrAAoJEHm+PkMAQRiGsNoH/jIK3CsQ2lbW7yRLXmfgtbzz i2Kep6D4SDvmaLpLYOVC8xNYTiE8jtTbSXHomwP5wMZ63MQDhBfnEWsEWqeZ9+D9 3Q46p0QWuoBgYu2VGkoxTfygkT6hhSpwWIi3SeImbY4fg57OHiUil/+YGhORM4Qc K4549OCTY3sIrgmWL77gzqjRUo+pQ4C73NKqZ3+5nlOmYBZC1yugk8mFwEpQkwhK 4NRNU760Fo+XIht/bINqRiPMddzC15p0mxvJy3cDW8bZa1tFSS9SB7AQUULBbcHL +2dFlFOEb5SV1sNiNPrJ0W+h2qUh2e7kPB0F8epaBppgbwVdyQoC2u4uuLV2ZN0= =lI2r -----END PGP SIGNATURE----- Merge tag 'v3.13-rc4' into x86/mm Pick up the latest fixes before applying new patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-19 13:56:10 +01:00
Rik van Riel	2084140594	mm: fix TLB flush race between migration, and change_protection_range There are a few subtle races, between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side. The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time, however compaction or the NUMA migration code may come in, and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process. This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC). The basic race looks like this: CPU A CPU B CPU C load TLB entry make entry PTE/PMD_NUMA fault on entry read/write old page start migrating page change PTE/PMD to new page read/write old page [] flush TLB reload TLB from new entry read/write new page lose data [] the old page may belong to a new user at this point! The obvious fix is to flush remote TLB entries, by making sure that pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction. [mgorman@suse.de: fix build] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-12-18 19:04:51 -08:00
David S. Miller	143c905494	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-18 16:42:06 -05:00
Francesco Fusco	71ae8aac3e	lib: introduce arch optimized hash library We introduce a new hashing library that is meant to be used in the contexts where speed is more important than uniformity of the hashed values. The hash library leverages architecture specific implementation to achieve high performance and fall backs to jhash() for the generic case. On Intel-based x86 architectures, the library can exploit the crc32l instruction, part of the Intel SSE4.2 instruction set, if the instruction is supported by the processor. This implementation is twice as fast as the jhash() implementation on an i7 processor. Additional architectures, such as Arm64 provide instructions for accelerating the computation of CRC, so they could be added as well in follow-up work. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Thomas Graf <tgraf@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-17 14:27:17 -05:00
Qiaowei Ren	0ee3b6f87d	x86: replace futex_atomic_cmpxchg_inatomic() with user_atomic_cmpxchg_inatomic futex_atomic_cmpxchg_inatomic() is simply the 32-bit implementation of user_atomic_cmpxchg_inatomic(), which in turn is simply a generalization of the original code in futex_atomic_cmpxchg_inatomic(). Use the newly generalized user_atomic_cmpxchg_inatomic() as the futex implementation, too. [ hpa: retain the inline in futex.h rather than changing it to a macro ] Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/1387002303-6620-2-git-send-email-qiaowei.ren@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>	2013-12-16 09:08:13 -08:00
Qiaowei Ren	f09174c501	x86: add user_atomic_cmpxchg_inatomic at uaccess.h This patch adds user_atomic_cmpxchg_inatomic() to use CMPXCHG instruction against a user space address. This generalizes the already existing futex_atomic_cmpxchg_inatomic() so it can be used in other contexts. This will be used in the upcoming support for Intel MPX (Memory Protection Extensions.) [ hpa: replaced #ifdef inside a macro with IS_ENABLED() ] Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/1387002303-6620-1-git-send-email-qiaowei.ren@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>	2013-12-16 09:07:57 -08:00
DuanZhenzhong	ac8344c4c0	PCI: Drop "irq" param from _restore_msi_irqs() Change x86_msi.restore_msi_irqs(struct pci_dev dev, int irq) to x86_msi.restore_msi_irqs(struct pci_dev *dev). restore_msi_irqs() restores multiple MSI-X IRQs, so param 'int irq' is unneeded. This makes code more consistent between vm and bare metal. Dom0 MSI-X restore code can also be optimized as XEN only has a hypercall to restore all MSI-X vectors at one time. Tested-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-12-13 08:44:30 -07:00
Jan Kiszka	6dfacadd58	KVM: nVMX: Add support for activity state HLT We can easily emulate the HLT activity state for L1: If it decides that L2 shall be halted on entry, just invoke the normal emulation of halt after switching to L2. We do not depend on specific host features to provide this, so we can expose the capability unconditionally. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 10:49:56 +01:00
Peter Zijlstra	ba1f14fbe7	sched: Remove PREEMPT_NEED_RESCHED from generic code While hunting a preemption issue with Alexander, Ben noticed that the currently generic PREEMPT_NEED_RESCHED stuff is horribly broken for load-store architectures. We currently rely on the IPI to fold TIF_NEED_RESCHED into PREEMPT_NEED_RESCHED, but when this IPI lands while we already have a load for the preempt-count but before the store, the store will erase the PREEMPT_NEED_RESCHED change. The current preempt-count only works on load-store archs because interrupts are assumed to be completely balanced wrt their preempt_count fiddling; the previous preempt_count load will match the preempt_count state after the interrupt and therefore nothing gets lost. This patch removes the PREEMPT_NEED_RESCHED usage from generic code and pushes it into x86 arch code; the generic code goes back to relying on TIF_NEED_RESCHED. Boot tested on x86_64 and compile tested on ppc64. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-and-Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20131128132641.GP10022@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-11 15:52:32 +01:00
Qiaowei Ren	e7d820a5e5	x86, xsave: Support eager-only xsave features, add MPX support Some features, like Intel MPX, work only if the kernel uses eagerfpu model. So we should force eagerfpu on unless the user has explicitly disabled it. Add definitions for Intel MPX and add it to the supported list. [ hpa: renamed XSTATE_FLEXIBLE to XSTATE_LAZY and added comments ] Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/9E0BE1322F2F2246BD820DA9FC397ADE014A6115@SHSMSX102.ccr.corp.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-12-06 17:17:42 -08:00
Qiaowei Ren	191f57c137	x86, cpufeature: Define the Intel MPX feature flag Define the Intel MPX (Memory Protection Extensions) CPU feature flag in the cpufeature list. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/1386375658-2191-2-git-send-email-qiaowei.ren@intel.com Signed-off-by: Xudong Hao <xudong.hao@intel.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-12-06 10:21:44 -08:00
Linus Torvalds	53c6de5026	Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 and EFI fixes from Peter Anvin: "Half of these are EFI-related: The by far biggest change is the change to hold off the deletion of a sysfs entry while a backend scan is in progress. This is to avoid calling kmemdup() while under a spinlock. The other major change is for each entry in the EFI pstore backend to get a unique identifier, as required by the pstore filesystem proper. The other changes are: A fix to the recent consolidation and optimization of using "asm goto" with read-modify-write operation, which broke the bitops; specifically in such a way that we could end up generating invalid code. A build hack to make sure we compile with -mno-sse. icc, and most likely future versions of gcc, can generate SSE instructions unless we tell it not to. A comment-only patch to a change the was due in part to an unpublished erratum; now when the erratum is published we want to add a comment explaining why" * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic, doc: Justification for disabling IO APIC before Local APIC x86, bitops: Correct the assembly constraints to testing bitops x86-64, build: Always pass in -mno-sse efi-pstore: Make efi-pstore return a unique id x86/efi: Fix earlyprintk off-by-one bug efivars, efi-pstore: Hold off deletion of sysfs entry until the scan is completed	2013-12-04 21:45:21 -08:00
H. Peter Anvin	e0f6dec35f	x86, bitops: Correct the assembly constraints to testing bitops In checkin: `0c44c2d0f4` x86: Use asm goto to implement better modify_and_test() functions the various functions which do modify and test were unified and optimized using "asm goto". However, this change missed the detail that the bitops require an "Ir" constraint rather than an "er" constraint ("I" = integer constant from 0-31, "e" = signed 32-bit integer constant). This would cause code to miscompile if these functions were used on constant bit positions 32-255 and the build to fail if used on constant bit positions above 255. Add the constraints as a parameter to the GEN_BINARY_RMWcc() macro to avoid this problem. Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/529E8719.4070202@zytor.com	2013-12-04 14:31:28 -08:00
Linus Torvalds	e321ae4c20	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc kernel and tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools lib traceevent: Fix conversion of pointer to integer of different size perf/trace: Properly use u64 to hold event_id perf: Remove fragile swevent hlist optimization ftrace, perf: Avoid infinite event generation loop tools lib traceevent: Fix use of multiple options in processing field perf header: Fix possible memory leaks in process_group_desc() perf header: Fix bogus group name perf tools: Tag thread comm as overriden	2013-12-02 10:13:09 -08:00
Ingo Molnar	61d0669775	* New static EFI runtime services virtual mapping layout which is groundwork for kexec support on EFI - Borislav Petkov -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJSfMDjAAoJEC84WcCNIz1V0LIP/2WyTbJR6bL0HXwnGLpxmxag v0VgnKRhypNboA3WEu4a9as6AdExqB7qsiWIipHuDSMj/vkfZgHAKTd2f1iRxmsJ RZxzwV6YzvsWkdXjvpCoLWKSsvQDug++BAIti1PitW6RXRjo01t3ymo/Ho1CQrpI hNJbB3bbihMF+uqFvdSpO0KZtZE6EtnylrfBeuo0GzqqJdTGe1MmqlWmyUlEy5JW ZiHV8E/xTjh3N675tWPcT9hGVfCyOXXu/kPrXsJTXrdYyZL9qgA9b8SVRLs6DctX wVgL9lNv4wobsmZJ5DxkYl9+TaF7rbshUeIJbzrQyMVJjb3TpXk/ZpspDMAEjL7e bb76c1bAx4xZuUatR/f1ykkWKAEryAhXHkvwcbIBjebW33if1MgGJLk5udJQQv6H j+J9ROH38MDr0Geg+pM2RnCyTz8l+q+8Mfu4Yh9TSte+ttB6fr9phs3/G+fdSUn1 0vI627v1HWzDcBh4eZjjslzJviR8PldsGVT3EsIaOnHGtk/9FPz/7n4efph4v7+9 yqTkLvQHxsAx7f0tR/qRpkEIQ9WTMXO0IO79OC13QTSATJSl+WSPTJM7ccqOgn+P h89ssBnzlwmFHkuvTi599KVHdzOZrWTsB2zROn+NnSchJ+YAY6TXwznk3/MNo9rB d+euVcffL4yfWZ7Bzj8Z =w6ff -----END PGP SIGNATURE----- Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi Pull EFI virtual mapping changes from Matt Fleming: * New static EFI runtime services virtual mapping layout which is groundwork for kexec support on EFI. (Borislav Petkov) Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-26 12:23:04 +01:00
Linus Torvalds	26b265cd29	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: - Made x86 ablk_helper generic for ARM - Phase out chainiv in favour of eseqiv (affects IPsec) - Fixed aes-cbc IV corruption on s390 - Added constant-time crypto_memneq which replaces memcmp - Fixed aes-ctr in omap-aes - Added OMAP3 ROM RNG support - Add PRNG support for MSM SoC's - Add and use Job Ring API in caam - Misc fixes [ NOTE! This pull request was sent within the merge window, but Herbert has some questionable email sending setup that makes him public enemy #1 as far as gmail is concerned. So most of his emails seem to be trapped by gmail as spam, resulting in me not seeing them. - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (49 commits) crypto: s390 - Fix aes-cbc IV corruption crypto: omap-aes - Fix CTR mode counter length crypto: omap-sham - Add missing modalias padata: make the sequence counter an atomic_t crypto: caam - Modify the interface layers to use JR API's crypto: caam - Add API's to allocate/free Job Rings crypto: caam - Add Platform driver for Job Ring hwrng: msm - Add PRNG support for MSM SoC's ARM: DT: msm: Add Qualcomm's PRNG driver binding document crypto: skcipher - Use eseqiv even on UP machines crypto: talitos - Simplify key parsing crypto: picoxcell - Simplify and harden key parsing crypto: ixp4xx - Simplify and harden key parsing crypto: authencesn - Simplify key parsing crypto: authenc - Export key parsing helper function crypto: mv_cesa: remove deprecated IRQF_DISABLED hwrng: OMAP3 ROM Random Number Generator support crypto: sha256_ssse3 - also test for BMI2 crypto: mv_cesa - Remove redundant of_match_ptr crypto: sahara - Remove redundant of_match_ptr ...	2013-11-23 16:18:25 -08:00
Linus Torvalds	82023bb7f7	More ACPI and power management updates for 3.13-rc1 - ACPI-based device hotplug fixes for issues introduced recently and a fix for an older error code path bug in the ACPI PCI host bridge driver. - Fix for recently broken OMAP cpufreq build from Viresh Kumar. - Fix for a recent hibernation regression related to s2disk. - Fix for a locking-related regression in the ACPI EC driver from Puneet Kumar. - System suspend error code path fix related to runtime PM and runtime PM documentation update from Ulf Hansson. - cpufreq's conservative governor fix from Xiaoguang Chen. - New processor IDs for intel_idle and turbostat and removal of an obsolete Kconfig option from Len Brown. - New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg. - Removal of several ACPI video DMI blacklist entries that are not necessary any more from Aaron Lu. - Rework of the ACPI companion representation in struct device and code cleanup related to that change from Rafael J Wysocki, Lan Tianyu and Jarkko Nikula. - Fixes for assigning names to ACPI-enumerated I2C and SPI devices from Jarkko Nikula. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABCAAGBQJSjLYNAAoJEILEb/54YlRxkEQP/1pmFWNwSsxLtTHd+PEs0Xbo QccqvjQrnw/c8GcmK4eZrz6/xyuepmmjy9kfRKj2ENZniy0NEsSFqkTdSO3vYlva 8HKWUj7MV3evhFERXAF6Tu0b4Enx4jOP7VMtmYxJo3qrSnKRUcUzc6DGv/ACsUT1 Nkj0Lhdsg053Z+YzIXrl50w0tCDEMhVmWlMHBtYgr+dMNVnkfPBGkqMblMkKCXT2 w/yHvauZlxQHtI+8bVqTuGgNN0CPzdlpFGiuUF+5mDf6dRX8zlSn56Ia+Wyw1k9X dQp4jYQOgPRo03rNKqQPDiPxUdc7T0RAHRvDB51Ncweuh5PfZGguQe71p6/LKY2W i6zblZ0f/vc13hTiMrP+qzKcwZvgPB5DH7SfnHr61JKV7GNFCdYAqoceS5hYMzR9 d2Fd+txgm763IHWewXfDS/G2cU492R5qr4jpmUIACBQKWDZcqmSRDwRj83t56Ltb jgFBMbg4vZxG7IARhind74xsALxdhsgmFjPmx+0qPWjYxcU8otQZpXbgGNI9iOuW pxIQv5WPQW0tTmwO4HSuVCOwDPLPz5R0jkev7SvSj3Ek3TeD7He4LmnK055CATiC puq+6dp1FISPOPJYk+0DI61qN/CB/qNwRp8LU3ctZwudPVhznIE9FFQ3iN1FdBg2 X8VDcT9t7VvVuxSBjgkj =QMp+ -----END PGP SIGNATURE----- Merge tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI and power management updates from Rafael Wysocki: - ACPI-based device hotplug fixes for issues introduced recently and a fix for an older error code path bug in the ACPI PCI host bridge driver - Fix for recently broken OMAP cpufreq build from Viresh Kumar - Fix for a recent hibernation regression related to s2disk - Fix for a locking-related regression in the ACPI EC driver from Puneet Kumar - System suspend error code path fix related to runtime PM and runtime PM documentation update from Ulf Hansson - cpufreq's conservative governor fix from Xiaoguang Chen - New processor IDs for intel_idle and turbostat and removal of an obsolete Kconfig option from Len Brown - New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg - Removal of several ACPI video DMI blacklist entries that are not necessary any more from Aaron Lu - Rework of the ACPI companion representation in struct device and code cleanup related to that change from Rafael J Wysocki, Lan Tianyu and Jarkko Nikula - Fixes for assigning names to ACPI-enumerated I2C and SPI devices from Jarkko Nikula * tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits) PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed() ACPI / PCI root: Clear driver_data before failing enumeration ACPI / hotplug: Fix PCI host bridge hot removal ACPI / hotplug: Fix acpi_bus_get_device() return value check cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs() ACPI / video: clean up DMI table for initial black screen problem ACPI / EC: Ensure lock is acquired before accessing ec struct members PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps() ACPI / AC: Remove struct acpi_device pointer from struct acpi_ac spi: Use stable dev_name for ACPI enumerated SPI slaves i2c: Use stable dev_name for ACPI enumerated I2C slaves ACPI: Provide acpi_dev_name accessor for struct acpi_device device name ACPI / bind: Use (put\|get)_device() on ACPI device objects too ACPI: Eliminate the DEVICE_ACPI_HANDLE() macro ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node cpufreq: OMAP: Fix compilation error 'r & ret undeclared' PM / Runtime: Fix error path for prepare PM / Runtime: Update documentation around probe\|remove\|suspend cpufreq: conservative: set requested_freq to policy max when it is over policy max ...	2013-11-20 13:25:04 -08:00
Linus Torvalds	4007162647	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Ingo Molnar: "This is a multi-arch cleanup series from Thomas Gleixner, which we kept to near the end of the merge window, to not interfere with architecture updates. This series (motivated by the -rt kernel) unifies more aspects of IRQ handling and generalizes PREEMPT_ACTIVE" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: preempt: Make PREEMPT_ACTIVE generic sparc: Use preempt_schedule_irq ia64: Use preempt_schedule_irq m32r: Use preempt_schedule_irq hardirq: Make hardirq bits generic m68k: Simplify low level interrupt handling code genirq: Prevent spurious detection for unconditionally polled interrupts	2013-11-19 10:40:00 -08:00
Peter Zijlstra	d5b5f391d4	ftrace, perf: Avoid infinite event generation loop Vince's perf-trinity fuzzer found yet another 'interesting' problem. When we sample the irq_work_exit tracepoint with period==1 (or PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an infinite event generation loop: ,-> <IPI> \| irq_work_exit() -> \| trace_irq_work_exit() -> \| ... \| __perf_event_overflow() -> (due to fasync) \| irq_work_queue() -> (irq_work_list must be empty) '--------- arch_irq_work_raise() Similar things can happen due to regular poll() wakeups if we exceed the ring-buffer wakeup watermark, or have an event_limit. To avoid this, dis-allow sampling this particular tracepoint. In order to achieve this, create a special perf_perm function pointer for each event and call this (when set) on trying to create a tracepoint perf event. [ roasted: use expr... to allow for ',' in your expression ] Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Jones <davej@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-19 16:57:40 +01:00
Kirill A. Shutemov	fd8526ad14	x86/mm: Implement ASLR for hugetlb mappings Matthew noticed that hugetlb mappings don't participate in ASLR on x86-64: % for i in `seq 3`; do > tools/testing/selftests/vm/map_hugetlb \| grep address > done Returned address is 0x2aaaaac00000 Returned address is 0x2aaaaac00000 Returned address is 0x2aaaaac00000 /proc/PID/maps entries for the mapping are always the same (except inode number): 2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 8200 /anon_hugepage (deleted) 2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 256 /anon_hugepage (deleted) 2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 7180 /anon_hugepage (deleted) The reason is the generic hugetlb_get_unmapped_area() function which is used on x86-64. It doesn't support randomization and use bottom-up unmapped area lookup, instead of usual top-down on x86-64. x86 has arch-specific hugetlb_get_unmapped_area(), but it's used only on x86-32. Let's use arch-specific hugetlb_get_unmapped_area() on x86-64 too. That adds ASLR and switches hugetlb mappings to use top-down unmapped area lookup: % for i in `seq 3`; do > tools/testing/selftests/vm/map_hugetlb \| grep address > done Returned address is 0x7f4f08a00000 Returned address is 0x7fdda4200000 Returned address is 0x7febe0000000 /proc/PID/maps entries: 7f4f08a00000-7f4f18a00000 rw-p 00000000 00:0c 1168 /anon_hugepage (deleted) 7fdda4200000-7fddb4200000 rw-p 00000000 00:0c 7092 /anon_hugepage (deleted) 7febe0000000-7febf0000000 rw-p 00000000 00:0c 7183 /anon_hugepage (deleted) Unmapped area lookup policy for hugetlb mappings is consistent with normal mappings now -- the only difference is alignment requirements for huge pages. libhugetlbfs test-suite didn't detect any regressions with the patch applied (although it shows few failures on my machine regardless the patch). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/20131119131750.EA45CE0090@blue.fi.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-19 14:24:50 +01:00
Cyrill Gorcunov	5305ca10e7	x86/mm: Unify pte_to_pgoff() and pgoff_to_pte() helpers Use unified pte_bitop() helper to manipulate bits in pte/pgoff bitfields and convert pte_to_pgoff()/pgoff_to_pte() to inlines. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-19 14:24:34 +01:00
Rafael J. Wysocki	6431b43097	Merge branch 'pm-tools' * pm-tools: tools / power turbostat: Support Silvermont	2013-11-19 01:06:38 +01:00
Linus Torvalds	f080480488	Here are the 3.13 KVM changes. There was a lot of work on the PPC side: the HV and emulation flavors can now coexist in a single kernel is probably the most interesting change from a user point of view. On the x86 side there are nested virtualization improvements and a few bugfixes. ARM got transparent huge page support, improved overcommit, and support for big endian guests. Finally, there is a new interface to connect KVM with VFIO. This helps with devices that use NoSnoop PCI transactions, letting the driver in the guest execute WBINVD instructions. This includes some nVidia cards on Windows, that fail to start without these patches and the corresponding userspace changes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJShPAhAAoJEBvWZb6bTYbyl48P/297GgmELHAGBgjvb6q7yyGu L8+eHjKbh4XBAkPwyzbvUjuww5z2hM0N3JQ0BDV9oeXlO+zwwCEns/sg2Q5/NJXq XxnTeShaKnp9lqVBnE6G9rAOUWKoyLJ2wItlvUL8JlaO9xJ0Vmk0ta4n2Nv5GqDp db6UD7vju6rHtIAhNpvvAO51kAOwc01xxRixCVb7KUYOnmO9nvpixzoI/S0Rp1gu w/OWMfCosDzBoT+cOe79Yx1OKcpaVW94X6CH1s+ShCw3wcbCL2f13Ka8/E3FIcuq vkZaLBxio7vjUAHRjPObw0XBW4InXEbhI1DjzIvm8dmc4VsgmtLQkTCG8fj+jINc dlHQUq6Do+1F4zy6WMBUj8tNeP1Z9DsABp98rQwR8+BwHoQpGQBpAxW0TE0ZMngC t1caqyvjZ5pPpFUxSrAV+8Kg4AvobXPYOim0vqV7Qea07KhFcBXLCfF7BWdwq/Jc 0CAOlsLL4mHGIQWZJuVGw0YGP7oATDCyewlBuDObx+szYCoV4fQGZVBEL0KwJx/1 7lrLN7JWzRyw6xTgJ5VVwgYE1tUY4IFQcHu7/5N+dw8/xg9KWA3f4PeMavIKSf+R qteewbtmQsxUnvuQIBHLs8NRWPnBPy+F3Sc2ckeOLIe4pmfTte6shtTXcLDL+LqH NTmT/cfmYp2BRkiCfCiS =rWNf -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM changes from Paolo Bonzini: "Here are the 3.13 KVM changes. There was a lot of work on the PPC side: the HV and emulation flavors can now coexist in a single kernel is probably the most interesting change from a user point of view. On the x86 side there are nested virtualization improvements and a few bugfixes. ARM got transparent huge page support, improved overcommit, and support for big endian guests. Finally, there is a new interface to connect KVM with VFIO. This helps with devices that use NoSnoop PCI transactions, letting the driver in the guest execute WBINVD instructions. This includes some nVidia cards on Windows, that fail to start without these patches and the corresponding userspace changes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) kvm, vmx: Fix lazy FPU on nested guest arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu arm/arm64: KVM: MMIO support for BE guest kvm, cpuid: Fix sparse warning kvm: Delete prototype for non-existent function kvm_check_iopl kvm: Delete prototype for non-existent function complete_pio hung_task: add method to reset detector pvclock: detect watchdog reset at pvclock read kvm: optimize out smp_mb after srcu_read_unlock srcu: API for barrier after srcu read unlock KVM: remove vm mmap method KVM: IOMMU: hva align mapping page size KVM: x86: trace cpuid emulation when called from emulator KVM: emulator: cleanup decode_register_operand() a bit KVM: emulator: check rex prefix inside decode_register() KVM: x86: fix emulation of "movzbl %bpl, %eax" kvm_host: typo fix KVM: x86: emulate SAHF instruction MAINTAINERS: add tree for kvm.git Documentation/kvm: add a 00-INDEX file ...	2013-11-15 13:51:36 +09:00
Linus Torvalds	eda670c626	Features: - SWIOTLB has tracing added when doing bounce buffer. - Xen ARM/ARM64 can use Xen-SWIOTLB. This work allows Linux to safely program real devices for DMA operations when running as a guest on Xen on ARM, without IOMMU support.1 - xen_raw_printk works with PVHVM guests if needed. Bug-fixes: - Make memory ballooning work under HVM with large MMIO region. - Inform hypervisor of MCFG regions found in ACPI DSDT. - Remove deprecated IRQF_DISABLED. - Remove deprecated __cpuinit. [1]: "On arm and arm64 all Xen guests, including dom0, run with second stage translation enabled. As a consequence when dom0 programs a device for a DMA operation is going to use (pseudo) physical addresses instead machine addresses. This work introduces two trees to track physical to machine and machine to physical mappings of foreign pages. Local pages are assumed mapped 1:1 (physical address == machine address). It enables the SWIOTLB-Xen driver on ARM and ARM64, so that Linux can translate physical addresses to machine addresses for dma operations when necessary. " (Stefano). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQEcBAABAgAGBQJSgS86AAoJEFjIrFwIi8fJpY4H/R2gke1A1p9UvTwbkaDhgPs/ u/mkI6aH+ktgvu5QZNprki660uydtc4Ck7y8leeLGYw+ed1Ys559SJhRc/x8jBYZ Hh2chnplld0LAjSpdIDTTePArE1xBo4Gz+fT0zc5cVh0leJwOXn92Kx8N5AWD/T3 gwH4Ok4K1dzZBIls7imM2AM/L1xcApcx3Dl/QpNcoePQtR4yLuPWMUbb3LM8pbUY 0B6ZVN4GOhtJ84z8HRKnh4uMnBYmhmky6laTlHVa6L+j1fv7aAPCdNbePjIt/Pvj HVYB1O/ht73yHw0zGfK6lhoGG8zlu+Q7sgiut9UsGZZfh34+BRKzNTypqJ3ezQo= =xc43 -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from Konrad Rzeszutek Wilk: "This has tons of fixes and two major features which are concentrated around the Xen SWIOTLB library. The short <blurb> is that the tracing facility (just one function) has been added to SWIOTLB to make it easier to track I/O progress. Additionally under Xen and ARM (32 & 64) the Xen-SWIOTLB driver "is used to translate physical to machine and machine to physical addresses of foreign[guest] pages for DMA operations" (Stefano) when booting under hardware without proper IOMMU. There are also bug-fixes, cleanups, compile warning fixes, etc. The commit times for some of the commits is a bit fresh - that is b/c we wanted to make sure we have the Ack's from the ARM folks - which with the string of back-to-back conferences took a bit of time. Rest assured - the code has been stewing in #linux-next for some time. Features: - SWIOTLB has tracing added when doing bounce buffer. - Xen ARM/ARM64 can use Xen-SWIOTLB. This work allows Linux to safely program real devices for DMA operations when running as a guest on Xen on ARM, without IOMMU support. [1] - xen_raw_printk works with PVHVM guests if needed. Bug-fixes: - Make memory ballooning work under HVM with large MMIO region. - Inform hypervisor of MCFG regions found in ACPI DSDT. - Remove deprecated IRQF_DISABLED. - Remove deprecated __cpuinit. [1]: "On arm and arm64 all Xen guests, including dom0, run with second stage translation enabled. As a consequence when dom0 programs a device for a DMA operation is going to use (pseudo) physical addresses instead machine addresses. This work introduces two trees to track physical to machine and machine to physical mappings of foreign pages. Local pages are assumed mapped 1:1 (physical address == machine address). It enables the SWIOTLB-Xen driver on ARM and ARM64, so that Linux can translate physical addresses to machine addresses for dma operations when necessary. " (Stefano)" * tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (32 commits) xen/arm: pfn_to_mfn and mfn_to_pfn return the argument if nothing is in the p2m arm,arm64/include/asm/io.h: define struct bio_vec swiotlb-xen: missing include dma-direction.h pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCI arm: make SWIOTLB available xen: delete new instances of added __cpuinit xen/balloon: Set balloon's initial state to number of existing RAM pages xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas. xen: remove deprecated IRQF_DISABLED x86/xen: remove deprecated IRQF_DISABLED swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary grant-table: call set_phys_to_machine after mapping grant refs arm,arm64: do not always merge biovec if we are running on Xen swiotlb: print a warning when the swiotlb is full swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device tracing/events: Fix swiotlb tracepoint creation swiotlb-xen: use xen_alloc/free_coherent_pages xen: introduce xen_alloc/free_coherent_pages ...	2013-11-15 13:34:37 +09:00
Kirill A. Shutemov	9491846fca	x86, mm: enable split page table lock for PMD level Enable PMD split page table lock for X86_64 and PAE. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-15 09:32:15 +09:00
Rafael J. Wysocki	7b1998116b	ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node Modify struct acpi_dev_node to contain a pointer to struct acpi_device associated with the given device object (that is, its ACPI companion device) instead of an ACPI handle corresponding to it. Introduce two new macros for manipulating that pointer in a CONFIG_ACPI-safe way, ACPI_COMPANION() and ACPI_COMPANION_SET(), and rework the ACPI_HANDLE() macro to take the above changes into account. Drop the ACPI_HANDLE_SET() macro entirely and rework its users to use ACPI_COMPANION_SET() instead. For some of them who used to pass the result of acpi_get_child() directly to ACPI_HANDLE_SET() introduce a helper routine acpi_preset_companion() doing an equivalent thing. The main motivation for doing this is that there are things represented by struct acpi_device objects that don't have valid ACPI handles (so called fixed ACPI hardware features, such as power and sleep buttons) and we would like to create platform device objects for them and "glue" them to their ACPI companions in the usual way (which currently is impossible due to the lack of valid ACPI handles). However, there are more reasons why it may be useful. First, struct acpi_device pointers allow of much better type checking than void pointers which are ACPI handles, so it should be more difficult to write buggy code using modified struct acpi_dev_node and the new macros. Second, the change should help to reduce (over time) the number of places in which the result of ACPI_HANDLE() is passed to acpi_bus_get_device() in order to obtain a pointer to the struct acpi_device associated with the given "physical" device, because now that pointer is returned by ACPI_COMPANION() directly. Finally, the change should make it easier to write generic code that will build both for CONFIG_ACPI set and unset without adding explicit compiler directives to it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> # on Haswell Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Aaron Lu <aaron.lu@intel.com> # for ATA and SDIO part	2013-11-14 23:14:43 +01:00
Linus Torvalds	d320e203ba	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull two x86 fixes from Ingo Molnar. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/amd: Tone down printk(), don't treat a missing firmware file as an error x86/dumpstack: Fix printk_address for direct addresses	2013-11-14 16:55:56 +09:00
Linus Torvalds	7971e23a66	Merge branch 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/trace changes from Ingo Molnar: "This adds page fault tracepoints which have zero runtime cost in the disabled case via IDT trickery (no NOPs in the page fault hotpath)" * 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, trace: Change user\|kernel_page_fault to page_fault_user\|kernel x86, trace: Add page fault tracepoints x86, trace: Delete __trace_alloc_intr_gate() x86, trace: Register exception handler to trace IDT x86, trace: Remove __alloc_intr_gate()	2013-11-14 16:25:10 +09:00
Linus Torvalds	2f466d33f5	PCI changes for the v3.13 merge window: Resource management - Fix host bridge window coalescing (Alexey Neyman) - Pass type, width, and prefetchability for window alignment (Wei Yang) PCI device hotplug - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu) Power management - Remove pci_pm_complete() (Liu Chuansheng) MSI - Fail initialization if device is not in PCI_D0 (Yijing Wang) MPS (Max Payload Size) - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang) - Use pcie_set_readrq() to simplify code (Yijing Wang) - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang) SR-IOV - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas) - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang) Virtualization - Add x86 MSI masking ops (Konrad Rzeszutek Wilk) Freescale i.MX6 - Support i.MX6 PCIe controller (Sean Cross) - Increase link startup timeout (Marek Vasut) - Probe PCIe in fs_initcall() (Marek Vasut) - Fix imprecise abort handler (Tim Harvey) - Remove redundant of_match_ptr (Sachin Kamat) Renesas R-Car - Support Gen2 internal PCIe controller (Valentine Barshak) Samsung Exynos - Add MSI support (Jingoo Han) - Turn off power when link fails (Jingoo Han) - Add Jingoo Han as maintainer (Jingoo Han) - Add clk_disable_unprepare() on error path (Wei Yongjun) - Remove redundant of_match_ptr (Sachin Kamat) Synopsys DesignWare - Add irq_create_mapping() (Pratyush Anand) - Add header guards (Seungwon Jeon) Miscellaneous - Enable native PCIe services by default on non-ACPI (Andrew Murray) - Cleanup _OSC usage and messages (Bjorn Helgaas) - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas) - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman) - Remove unused pci_mem_start (Myron Stowe) - Make sysfs functions static (Sachin Kamat) - Warn on invalid return from driver probe (Stephen M. Cameron) - Remove Intel Haswell D3 delays (Todd E Brandt) - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu) - Use pci_is_pcie() to simplify code (Yijing Wang) - Use PCIe capability accessors to simplify code (Yijing Wang) - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang) - Removed unused "is_pcie" from struct pci_dev (Yijing Wang) - Simplify sysfs CPU affinity implementation (Yijing Wang)) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJSgUzsAAoJEFmIoMA60/r8wmsQAJhwmtkUYR2L4T1g9smAyjJz bLm5zoC6WdywFcbTpTBfsTrS1CHIQG5akRgkEXGdr99epiho5F2lwmagWsUR4ijL 39Qn3knAUMgtNjoVXXI106h/DfTyxSmkZBfih2AQFyWobJq+0kg7hjQQA3+836b4 8ssWr1+NSl6JJTqYQ0Paw1kSqvvYoXsu5rWFEfCHk8D0s/1bvr5ldAUpk2jTg93I uo9/5+O264yt1YoKZOMqAMZLUfd5DaWY1mV3yeF0Uauy1pBmol5csE8ckqJPDrES PRdJT1+PhBeLYWcgXANOBZsW58ddxA0pQ5jQV6VJHQWsm5cE82OBpYJf6xUZ2moV o6DZ0KRnCPVA3NllYYR16H+wbMfADwwO83QoA+QTIZJy/WgpDH3Cst+m8KePGqbL uFgDdXSws9Bs1BCFs7bfYzAM3OdkBFnn+ac7JoPXKP5ibgAp9nDlurgK2r90zRnp j15vHMx0mV+e8B8/iwiW5eRtg7NoCHYiNfFy7JalOlsPmYr2KFazBVKclp13Hng7 fe/Jy6X4UhWoQPdqsy4ftvSQb0gm1MClxFJeZ3VAt6LY9j8OP6S/Vdf6lpAL85KR lAQoQzB+lOhTPdXxFY2xgGkITkqPDOQMjPfowYUYFwybqBuG6BHXZPJobL+niBlb Nh+M2WlUUA9Z3V6rWJB6 =CTPk -----END PGP SIGNATURE----- Merge tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Resource management - Fix host bridge window coalescing (Alexey Neyman) - Pass type, width, and prefetchability for window alignment (Wei Yang) PCI device hotplug - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu) Power management - Remove pci_pm_complete() (Liu Chuansheng) MSI - Fail initialization if device is not in PCI_D0 (Yijing Wang) MPS (Max Payload Size) - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang) - Use pcie_set_readrq() to simplify code (Yijing Wang) - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang) SR-IOV - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas) - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang) Virtualization - Add x86 MSI masking ops (Konrad Rzeszutek Wilk) Freescale i.MX6 - Support i.MX6 PCIe controller (Sean Cross) - Increase link startup timeout (Marek Vasut) - Probe PCIe in fs_initcall() (Marek Vasut) - Fix imprecise abort handler (Tim Harvey) - Remove redundant of_match_ptr (Sachin Kamat) Renesas R-Car - Support Gen2 internal PCIe controller (Valentine Barshak) Samsung Exynos - Add MSI support (Jingoo Han) - Turn off power when link fails (Jingoo Han) - Add Jingoo Han as maintainer (Jingoo Han) - Add clk_disable_unprepare() on error path (Wei Yongjun) - Remove redundant of_match_ptr (Sachin Kamat) Synopsys DesignWare - Add irq_create_mapping() (Pratyush Anand) - Add header guards (Seungwon Jeon) Miscellaneous - Enable native PCIe services by default on non-ACPI (Andrew Murray) - Cleanup _OSC usage and messages (Bjorn Helgaas) - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas) - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman) - Remove unused pci_mem_start (Myron Stowe) - Make sysfs functions static (Sachin Kamat) - Warn on invalid return from driver probe (Stephen M. Cameron) - Remove Intel Haswell D3 delays (Todd E Brandt) - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu) - Use pci_is_pcie() to simplify code (Yijing Wang) - Use PCIe capability accessors to simplify code (Yijing Wang) - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang) - Removed unused "is_pcie" from struct pci_dev (Yijing Wang) - Simplify sysfs CPU affinity implementation (Yijing Wang)" * tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (79 commits) PCI: Enable upstream bridges even for VFs on virtual buses PCI: Add pci_upstream_bridge() PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() PCI: Warn on driver probe return value greater than zero PCI: Drop warning about drivers that don't use pci_set_master() PCI: Workaround missing pci_set_master in pci drivers powerpc/pci: Use pci_is_pcie() to simplify code [fix] PCI: Update pcie_ports 'auto' behavior for non-ACPI platforms PCI: imx6: Probe the PCIe in fs_initcall() PCI: Add R-Car Gen2 internal PCI support PCI: imx6: Remove redundant of_match_ptr PCI: Report pci_pme_active() kmalloc failure mn10300/PCI: Remove useless pcibios_last_bus frv/PCI: Remove pcibios_last_bus PCI: imx6: Increase link startup timeout PCI: exynos: Remove redundant of_match_ptr PCI: imx6: Fix imprecise abort handler PCI: Fail MSI/MSI-X initialization if device is not in PCI_D0 PCI: imx6: Remove redundant dev_err() in imx6_pcie_probe() x86/PCI: Coalesce multiple overlapping host bridge windows ...	2013-11-14 14:02:00 +09:00
Linus Torvalds	f9300eaaac	ACPI and power management updates for 3.13-rc1 - New power capping framework and the the Intel Running Average Power Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan. - Addition of the in-kernel switching feature to the arm_big_little cpufreq driver from Viresh Kumar and Nicolas Pitre. - cpufreq support for iMac G5 from Aaro Koskinen. - Baytrail processors support for intel_pstate from Dirk Brandewie. - cpufreq support for Midway/ECX-2000 from Mark Langsdorf. - ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha. - ACPI power management support for the I2C and SPI bus types from Mika Westerberg and Lv Zheng. - cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat, Stratos Karafotis, Xiaoguang Chen, Lan Tianyu. - cpufreq drivers updates (mostly fixes and cleanups) from Viresh Kumar, Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz Majewski, Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev. - intel_pstate updates from Dirk Brandewie and Adrian Huang. - ACPICA update to version 20130927 includig fixes and cleanups and some reduction of divergences between the ACPICA code in the kernel and ACPICA upstream in order to improve the automatic ACPICA patch generation process. From Bob Moore, Lv Zheng, Tomasz Nowicki, Naresh Bhat, Bjorn Helgaas, David E Box. - ACPI IPMI driver fixes and cleanups from Lv Zheng. - ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani, Zhang Yanfei, Rafael J Wysocki. - Conversion of the ACPI AC driver to the platform bus type and multiple driver fixes and cleanups related to ACPI from Zhang Rui. - ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu, Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki. - Fixes and cleanups and new blacklist entries related to the ACPI video support from Aaron Lu, Felipe Contreras, Lennart Poettering, Kirill Tkhai. - cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi. - cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han, Bartlomiej Zolnierkiewicz, Prarit Bhargava. - devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe. - Operation Performance Points (OPP) core updates from Nishanth Menon. - Runtime power management core fix from Rafael J Wysocki and update from Ulf Hansson. - Hibernation fixes from Aaron Lu and Rafael J Wysocki. - Device suspend/resume lockup detection mechanism from Benoit Goby. - Removal of unused proc directories created for various ACPI drivers from Lan Tianyu. - ACPI LPSS driver fix and new device IDs for the ACPI platform scan handler from Heikki Krogerus and Jarkko Nikula. - New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa. - Assorted fixes and cleanups related to ACPI from Andy Shevchenko, Al Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter, Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause, Liu Chuansheng. - Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding, Jean-Christophe Plagniol-Villard. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABCAAGBQJSfPKLAAoJEILEb/54YlRxH6YQAJwDKi25RCZziFSIenXuqzC/ c6JxoH/tSnDHJHhcTgqh7H7Raa+zmatMDf0m2oEv2Wjfx4Lt4BQK4iefhe/zY4lX yJ8uXDg+U8DYhDX2XwbwnFpd1M1k/A+s2gIHDTHHGnE0kDngXdd8RAFFktBmooTZ l5LBQvOrTlgX/ZfqI/MNmQ6lfY6kbCABGSHV1tUUsDA6Kkvk/LAUTOMSmptv1q22 hcs6k55vR34qADPkUX5GghjmcYJv+gNtvbDEJUjcmCwVoPWouF415m7R5lJ8w3/M 49Q8Tbu5HELWLwca64OorS8qh/P7sgUOf1BX5IDzHnJT+TGeDfvcYbMv2Z275/WZ /bqhuLuKBpsHQ2wvEeT+lYV3FlifKeTf1FBxER3ApjzI3GfpmVVQ+dpEu8e9hcTh ZTPGzziGtoIsHQ0unxb+zQOyt1PmIk+cU4IsKazs5U20zsVDMcKzPrb19Od49vMX gCHvRzNyOTqKWpE83Ss4NGOVPAG02AXiXi/BpuYBHKDy6fTH/liKiCw5xlCDEtmt lQrEbupKpc/dhCLo5ws6w7MZzjWJs2eSEQcNR4DlR++pxIpYOOeoPTXXrghgZt2X mmxZI2qsJ7GAvPzII8OBeF3CRO3fabZ6Nez+M+oEZjGe05ZtpB3ccw410HwieqBn dYpJFt/BHK189odhV9CM =JCxk -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael J Wysocki: - New power capping framework and the the Intel Running Average Power Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan. - Addition of the in-kernel switching feature to the arm_big_little cpufreq driver from Viresh Kumar and Nicolas Pitre. - cpufreq support for iMac G5 from Aaro Koskinen. - Baytrail processors support for intel_pstate from Dirk Brandewie. - cpufreq support for Midway/ECX-2000 from Mark Langsdorf. - ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha. - ACPI power management support for the I2C and SPI bus types from Mika Westerberg and Lv Zheng. - cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat, Stratos Karafotis, Xiaoguang Chen, Lan Tianyu. - cpufreq drivers updates (mostly fixes and cleanups) from Viresh Kumar, Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz Majewski, Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev. - intel_pstate updates from Dirk Brandewie and Adrian Huang. - ACPICA update to version 20130927 includig fixes and cleanups and some reduction of divergences between the ACPICA code in the kernel and ACPICA upstream in order to improve the automatic ACPICA patch generation process. From Bob Moore, Lv Zheng, Tomasz Nowicki, Naresh Bhat, Bjorn Helgaas, David E Box. - ACPI IPMI driver fixes and cleanups from Lv Zheng. - ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani, Zhang Yanfei, Rafael J Wysocki. - Conversion of the ACPI AC driver to the platform bus type and multiple driver fixes and cleanups related to ACPI from Zhang Rui. - ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu, Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki. - Fixes and cleanups and new blacklist entries related to the ACPI video support from Aaron Lu, Felipe Contreras, Lennart Poettering, Kirill Tkhai. - cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi. - cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han, Bartlomiej Zolnierkiewicz, Prarit Bhargava. - devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe. - Operation Performance Points (OPP) core updates from Nishanth Menon. - Runtime power management core fix from Rafael J Wysocki and update from Ulf Hansson. - Hibernation fixes from Aaron Lu and Rafael J Wysocki. - Device suspend/resume lockup detection mechanism from Benoit Goby. - Removal of unused proc directories created for various ACPI drivers from Lan Tianyu. - ACPI LPSS driver fix and new device IDs for the ACPI platform scan handler from Heikki Krogerus and Jarkko Nikula. - New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa. - Assorted fixes and cleanups related to ACPI from Andy Shevchenko, Al Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter, Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause, Liu Chuansheng. - Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding, Jean-Christophe Plagniol-Villard. * tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (386 commits) cpufreq: conservative: fix requested_freq reduction issue ACPI / hotplug: Consolidate deferred execution of ACPI hotplug routines PM / runtime: Use pm_runtime_put_sync() in __device_release_driver() ACPI / event: remove unneeded NULL pointer check Revert "ACPI / video: Ignore BIOS initial backlight value for HP 250 G1" ACPI / video: Quirk initial backlight level 0 ACPI / video: Fix initial level validity test intel_pstate: skip the driver if ACPI has power mgmt option PM / hibernate: Avoid overflow in hibernate_preallocate_memory() ACPI / hotplug: Do not execute "insert in progress" _OST ACPI / hotplug: Carry out PCI root eject directly ACPI / hotplug: Merge device hot-removal routines ACPI / hotplug: Make acpi_bus_hot_remove_device() internal ACPI / hotplug: Simplify device ejection routines ACPI / hotplug: Fix handle_root_bridge_removal() ACPI / hotplug: Refuse to hot-remove all objects with disabled hotplug ACPI / scan: Start matching drivers after trying scan handlers ACPI: Remove acpi_pci_slot_init() headers from internal.h ACPI / blacklist: fix name of ThinkPad Edge E530 PowerCap: Fix build error with option -Werror=format-security ... Conflicts: arch/arm/mach-omap2/opp.c drivers/Kconfig drivers/spi/spi.c	2013-11-14 13:41:48 +09:00
Thomas Gleixner	00d1a39e69	preempt: Make PREEMPT_ACTIVE generic No point in having this bit defined by architecture. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130917183629.090698799@linutronix.de	2013-11-13 20:21:47 +01:00
Linus Torvalds	5cbb3d216e	Merge branch 'akpm' (patches from Andrew Morton) Merge first patch-bomb from Andrew Morton: "Quite a lot of other stuff is banked up awaiting further next->mainline merging, but this batch contains: - Lots of random misc patches - OCFS2 - Most of MM - backlight updates - lib/ updates - printk updates - checkpatch updates - epoll tweaking - rtc updates - hfs - hfsplus - documentation - procfs - update gcov to gcc-4.7 format - IPC" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits) ipc, msg: fix message length check for negative values ipc/util.c: remove unnecessary work pending test devpts: plug the memory leak in kill_sb ./Makefile: export initial ramdisk compression config option init/Kconfig: add option to disable kernel compression drivers: w1: make w1_slave::flags long to avoid memory corruption drivers/w1/masters/ds1wm.cuse dev_get_platdata() drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page() drivers/memstick/core/mspro_block.c: fix attributes array allocation drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr kernel/panic.c: reduce 1 byte usage for print tainted buffer gcov: reuse kbasename helper kernel/gcov/fs.c: use pr_warn() kernel/module.c: use pr_foo() gcov: compile specific gcov implementation based on gcc version gcov: add support for gcc 4.7 gcov format gcov: move gcov structs definitions to a gcc version specific file kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener() kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end() kernel/sysctl_binary.c: use scnprintf() instead of snprintf() ...	2013-11-13 15:45:43 +09:00
Linus Torvalds	c08acff054	Merge branch 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu changes from Tejun Heo: "Two smallish changes for percpu. Two patches to remove unused this_cpu_xor() and one to fix a bug in percpu init failure path so that it can reach the proper BUG() instead of oopsing earlier" * 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: x86: remove this_cpu_xor() implementation percpu: remove this_cpu_xor() implementation percpu: fix bootmem error handling in pcpu_page_first_chunk()	2013-11-13 15:17:16 +09:00
Vineet Gupta	c375f15a43	x86: move fpu_counter into ARCH specific thread_struct Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can be moved out into ARCH specific thread_struct, reducing the size of task_struct for other arches. Compile tested i386_defconfig + gcc 4.7.3 Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Paul Mundt <paul.mundt@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:13 +09:00
Len Brown	144b44b135	tools / power turbostat: Support Silvermont Support the next generation Intel Atom processor mirco-architecture, formerly called Silvermont. The server version, formerly called "Avoton", is named the "Intel(R) Atom(TM) Processor C2000 Product Family". The client version, formerly called "Bay Trail", is named the "Intel Atom Processor Z3000 Series", as well as various "Intel Pentium Processor" and "Intel Celeron Processor" brands, depending on form-factor. Silvermont has a set of MSRs not far off from NHM, but the RAPL register set is a sub-set of those previously supported. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-11-12 23:16:02 +01:00
Jiri Slaby	5f01c98859	x86/dumpstack: Fix printk_address for direct addresses Consider a kernel crash in a module, simulated the following way: static int my_init(void) { char map = (void )0x5; *map = 3; return 0; } module_init(my_init); When we turn off FRAME_POINTERs, the very first instruction in that function causes a BUG. The problem is that we print IP in the BUG report using %pB (from printk_address). And %pB decrements the pointer by one to fix printing addresses of functions with tail calls. This was added in commit `71f9e59800` ("x86, dumpstack: Use %pB format specifier for stack trace") to fix the call stack printouts. So instead of correct output: BUG: unable to handle kernel NULL pointer dereference at 0000000000000005 IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173] We get: BUG: unable to handle kernel NULL pointer dereference at 0000000000000005 IP: [<ffffffffa0152000>] 0xffffffffa0151fff To fix that, we use %pS only for stack addresses printouts (via newly added printk_stack_address) and %pB for regs->ip (via printk_address). I.e. we revert to the old behaviour for all except call stacks. And since from all those reliable is 1, we remove that parameter from printk_address. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: joe@perches.com Cc: jirislaby@gmail.com Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-12 21:06:06 +01:00
Linus Torvalds	10d0c9705e	DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0 R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20 PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN 2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8= =GCbY -----END PGP SIGNATURE----- Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates" * tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits) powerpc: add missing explicit OF includes for ppc dt/irq: add empty of_irq_count for !OF_IRQ dt: disable self-tests for !OF_IRQ of: irq: Fix interrupt-map entry matching MIPS: Netlogic: replace early_init_devtree() call of: Add Panasonic Corporation vendor prefix of: Add Chunghwa Picture Tubes Ltd. vendor prefix of: Add AU Optronics Corporation vendor prefix of/irq: Fix potential buffer overflow of/irq: Fix bug in interrupt parsing refactor. of: set dma_mask to point to coherent_dma_mask of: add vendor prefix for PHYTEC Messtechnik GmbH DT: sort vendor-prefixes.txt of: Add vendor prefix for Cadence of: Add empty for_each_available_child_of_node() macro definition arm/versatile: Fix versatile irq specifications. of/irq: create interrupts-extended property microblaze/pci: Drop PowerPC-ism from irq parsing of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code. of/irq: Use irq_of_parse_and_map() ...	2013-11-12 16:52:17 +09:00
Linus Torvalds	9b66bfb280	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 UV debug changes from Ingo Molnar: "Various SGI UV debuggability improvements, amongst them KDB support, with related core KDB enabling patches changing kernel/debug/kdb/" * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "x86/UV: Add uvtrace support" x86/UV: Add call to KGDB/KDB from NMI handler kdb: Add support for external NMI handler to call KGDB/KDB x86/UV: Check for alloc_cpumask_var() failures properly in uv_nmi_setup() x86/UV: Add uvtrace support x86/UV: Add kdump to UV NMI handler x86/UV: Add summary of cpu activity to UV NMI handler x86/UV: Update UV support for external NMI signals x86/UV: Move NMI support	2013-11-12 12:01:14 +09:00
Linus Torvalds	c2136301e4	Merge branch 'x86-uaccess-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 uaccess changes from Ingo Molnar: "A single change that micro-optimizes __copy__user_inatomic(), used by the futex code" 'x86-uaccess-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic	2013-11-12 11:46:06 +09:00
Linus Torvalds	340286cd4e	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: "The biggest change adds support for Intel 'CPER' (UEFI Common Platform Error Record) error logging, which builds upon an enhanced error logging mechanism available on Xeon processors. Full description is here: http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html This change provides a module (and support code) to check for an extended error log and prints extra details about the error on the console" * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ACPI, x86: Fix extended error log driver to depend on CONFIG_X86_LOCAL_APIC dmi: Avoid unaligned memory access in save_mem_devices() Move cper.c from drivers/acpi/apei to drivers/firmware/efi EDAC, GHES: Update ghes error record info ACPI, APEI, CPER: Cleanup CPER memory error output format ACPI, APEI, CPER: Enhance memory reporting capability ACPI, APEI, CPER: Add UEFI 2.4 support for memory error DMI: Parse memory device (type 17) in SMBIOS ACPI, x86: Extended error log driver for x86 platform bitops: Introduce a more generic BITMASK macro ACPI, CPER: Update cper info ACPI, APEI, CPER: Fix status check during error printing	2013-11-12 11:16:44 +09:00
Linus Torvalds	dba538ff56	Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/intel-mid changes from Ingo Molnar: "Update the 'intel mid' (mobile internet device) platform code as Intel is rolling out more SoC designs. This gets rid of most of the 'MRST' platform code in the process, mostly by renaming and shuffling code around into their respective 'intel-mid' platform drivers" * 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, intel-mid: Do not re-introduce usage of obsolete __cpuinit intel_mid: Move platform device setups to their own platform_<device>.* files x86: intel-mid: Add section for sfi device table intel-mid: sfi: Allow struct devs_id.get_platform_data to be NULL intel_mid: Moved SFI related code to sfi.c intel_mid: Added custom handler for ipc devices intel_mid: Added custom device_handler support intel_mid: Refactored sfi_parse_devs() function intel_mid: Renamed mrst to intel_mid pci: intel_mid: Return true/false in function returning bool intel_mid: Renamed mrst to intel_mid mrst: Fixed indentation issues mrst: Fixed printk/pr_* related issues	2013-11-12 11:12:22 +09:00
Linus Torvalds	2dc1733fd4	Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/hyperv changes from Ingo Molnar: "These changes enable Linux guests to boot as 'Modern VM' guest kernels on MS-Hyperv hosts" * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, hyperv: Move a variable to avoid an unused variable warning x86, hyperv: Fix build error due to missing <asm/apic.h> include x86, hyperv: Correctly guard the local APIC calibration code x86, hyperv: Get the local APIC timer frequency from the hypervisor	2013-11-12 11:07:49 +09:00
Linus Torvalds	69019d77c7	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI changes from Ingo Molnar: "Main changes: - Add support for earlyprintk=efi which uses the EFI framebuffer. Very useful for debugging boot problems. - EFI stub support for large memory maps (more than 128 entries) - EFI ARM support - this was mostly done by generalizing x86 <-> ARM platform differences, such as by moving x86 EFI code into drivers/firmware/efi/ and sharing it with ARM. - Documentation updates - misc fixes" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) x86/efi: Add EFI framebuffer earlyprintk support boot, efi: Remove redundant memset() x86/efi: Fix config_table_type array termination x86 efi: bugfix interrupt disabling sequence x86: EFI stub support for large memory maps efi: resolve warnings found on ARM compile efi: Fix types in EFI calls to match EFI function definitions. efi: Renames in handle_cmdline_files() to complete generalization. efi: Generalize handle_ramdisks() and rename to handle_cmdline_files(). efi: Allow efi_free() to be called with size of 0 efi: use efi_get_memory_map() to get final map for x86 efi: generalize efi_get_memory_map() efi: Rename __get_map() to efi_get_memory_map() efi: Move unicode to ASCII conversion to shared function. efi: Generalize relocate_kernel() for use by other architectures. efi: Move relocate_kernel() to shared file. efi: Enforce minimum alignment of 1 page on allocations. efi: Rename memory allocation/free functions efi: Add system table pointer argument to shared functions. efi: Move common EFI stub code from x86 arch code to common location ...	2013-11-12 10:48:30 +09:00
Linus Torvalds	014d595c23	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot changes from Ingo Molnar: "Two changes that prettify and compactify the SMP bootup output from: smpboot: Booting Node 0, Processors #1 #2 #3 OK smpboot: Booting Node 1, Processors #4 #5 #6 #7 OK smpboot: Booting Node 2, Processors #8 #9 #10 #11 OK smpboot: Booting Node 3, Processors #12 #13 #14 #15 OK Brought up 16 CPUs to something like: x86: Booting SMP configuration: .... node #0, CPUs: #1 #2 #3 .... node #1, CPUs: #4 #5 #6 #7 .... node #2, CPUs: #8 #9 #10 #11 .... node #3, CPUs: #12 #13 #14 #15 x86: Booted up 4 nodes, 16 CPUs" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Further compress CPUs bootup message x86: Improve the printout of the SMP bootup CPU table	2013-11-12 10:41:10 +09:00
Linus Torvalds	ae795fe760	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 user access changes from Ingo Molnar: "This tree contains two copy_[from/to]_user() build time checking changes/enhancements from Jan Beulich. The desired outcome is to get better compiler warnings with CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y, to keep people from introducing bugs such as overflows and information leaks" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Unify copy_to_user() and add size checking to it x86: Unify copy_from_user() size checking	2013-11-12 10:39:08 +09:00
Linus Torvalds	39cf275a1a	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle are: - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik van Riel, Peter Zijlstra et al. Yay! - optimize preemption counter handling: merge the NEED_RESCHED flag into the preempt_count variable, by Peter Zijlstra. - wait.h fixes and code reorganization from Peter Zijlstra - cfs_bandwidth fixes from Ben Segall - SMP load-balancer cleanups from Peter Zijstra - idle balancer improvements from Jason Low - other fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits) ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED stop_machine: Fix race between stop_two_cpus() and stop_cpus() sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus sched: Fix asymmetric scheduling for POWER7 sched: Move completion code from core.c to completion.c sched: Move wait code from core.c to wait.c sched: Move wait.c into kernel/sched/ sched/wait: Fix __wait_event_interruptible_lock_irq_timeout() sched: Avoid throttle_cfs_rq() racing with period_timer stopping sched: Guarantee new group-entities always have weight sched: Fix hrtimer_cancel()/rq->lock deadlock sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining sched: Fix race on toggling cfs_bandwidth_used sched: Remove extra put_online_cpus() inside sched_setaffinity() sched/rt: Fix task_tick_rt() comment sched/wait: Fix build breakage sched/wait: Introduce prepare_to_wait_event() sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too sched: Remove get_online_cpus() usage sched: Fix race in migrate_swap_stop() ...	2013-11-12 10:20:12 +09:00
Ingo Molnar	b5dfcb09de	Revert "x86/UV: Add uvtrace support" This reverts commit `8eba18428a`. uv_trace() is not used by anything, nor is uv_trace_nmi_func, nor uv_trace_func. That's not how we do instrumentation code in the kernel: we add tracepoints, printk()s, etc. so that everyone not just those with magic kernel modules can debug a system. So remove this unused (and misguied) piece of code. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Mike Travis <travis@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Hedi Berriche <hedi@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jason Wessel <jason.wessel@windriver.com> Link: http://lkml.kernel.org/n/tip-tumfBffmr4jmnt8Gyxanoblg@git.kernel.org	2013-11-11 19:53:42 +01:00
H. Peter Anvin	a4f61dec55	x86, trace: Change user\|kernel_page_fault to page_fault_user\|kernel Tracepoints are named hierachially, and it makes more sense to keep a general flow of information level from general to specific from left to right, i.e. x86_exceptions.page_fault_user\|kernel rather than x86_exceptions.user\|kernel_page_fault Suggested-by: Ingo Molnar <mingo@kernel.org> Acked-by: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/20131111082955.GB12405@gmail.com	2013-11-11 08:15:40 -08:00
Seiji Aguchi	d34603b07c	x86, trace: Add page fault tracepoints This patch introduces page fault tracepoints to x86 architecture by switching IDT. Two events, for user and kernel spaces, are introduced at the beginning of page fault handler for tracing. - User space event There is a request of page fault event for user space as below. https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com - Kernel space event: When we measure an overhead in kernel space for investigating performance issues, we can check if it comes from the page fault events. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/52716E67.6090705@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-11-08 14:15:49 -08:00
Seiji Aguchi	ac7956e269	x86, trace: Delete __trace_alloc_intr_gate() Currently irq vector handlers for tracing are registered in both set_intr_gate() and __trace_alloc_intr_gate() in alloc_intr_gate(). But, we don't need to do that twice. So, let's delete __trace_alloc_intr_gate(). Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/52716E1B.7090205@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-11-08 14:15:47 -08:00
Seiji Aguchi	25c74b10ba	x86, trace: Register exception handler to trace IDT This patch registers exception handlers for tracing to a trace IDT. To implemented it in set_intr_gate(), this patch does followings. - Register the exception handlers to the trace IDT by prepending "trace_" to the handler's names. - Also, newly introduce trace_page_fault() to add tracepoints in a subsequent patch. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/52716DEC.5050204@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-11-08 14:15:45 -08:00
Seiji Aguchi	959c071f09	x86, trace: Remove __alloc_intr_gate() Prepare to move set_intr_gate() into a macro by removing __alloc_intr_gate(). The purpose is to avoid failing a kernel build after applying a subsequent patch which changes set_intr_gate() into a macro. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/52716DB8.1080702@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-11-08 14:15:44 -08:00
Rob Herring	b5480950c6	Merge remote-tracking branch 'grant/devicetree/next' into for-next	2013-11-07 10:34:46 -06:00
Konrad Rzeszutek Wilk	0e4ccb1505	PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() Certain platforms do not allow writes in the MSI-X BARs to setup or tear down vector values. To combat against the generic code trying to write to that and either silently being ignored or crashing due to the pagetables being marked R/O this patch introduces a platform override. Note that we keep two separate, non-weak, functions default_mask_msi_irqs() and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs() and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI code. For Xen, which does not allow the guest to write to MSI-X tables - as the hypervisor is solely responsible for setting the vector values - we implement two nops. This fixes a Xen guest crash when passing a PCI device with MSI-X to the guest. See the bugzilla for more details. [bhelgaas: add bugzilla info] Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>	2013-11-06 16:32:19 -07:00
Oleg Nesterov	8a8de66c4f	uprobes: Introduce arch_uprobe->ixol Currently xol_get_insn_slot() assumes that we should simply copy arch_uprobe->insn[] which is (ignoring arch_uprobe_analyze_insn) just the copy of the original insn. This is not true for arm which needs to create another insn to execute it out-of-line. So this patch simply adds the new member, ->ixol into the union. This doesn't make any difference for x86 and powerpc, but arm can divorce insn/ixol and initialize the correct xol insn in arch_uprobe_analyze_insn(). Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2013-11-06 20:00:05 +01:00
David A. Long	3820b4d278	uprobes: Move function declarations out of arch Move the function declarations from the arch headers to the common header, since only the function bodies are architecture-specific. These changes are from Vincent Rabin's uprobes patch. [ oleg: update arch/powerpc/include/asm/uprobes.h ] Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: David A. Long <dave.long@linaro.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2013-11-06 19:59:37 +01:00
Josh Triplett	a890b6fefd	kvm: Delete prototype for non-existent function kvm_check_iopl The prototype for kvm_check_iopl appeared in commit `f850e2e603` ("KVM: x86 emulator: Check IOPL level during io instruction emulation"), but the function never actually existed. Remove the prototype. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-06 11:07:09 +02:00
Josh Triplett	35a5121b58	kvm: Delete prototype for non-existent function complete_pio complete_pio ceased to exist in commit `7972995b0c` ("KVM: x86 emulator: Move string pio emulation into emulator.c"), but the prototype remained. Remove its prototype. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-06 11:06:32 +02:00
Marcelo Tosatti	d63285e94a	pvclock: detect watchdog reset at pvclock read Implement reset of kernel watchdogs at pvclock read time. This avoids adding special code to every watchdog. This is possible for watchdogs which measure time based on sched_clock() or ktime_get() variants. Suggested by Don Zickus. Acked-by: Don Zickus <dzickus@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-06 09:48:43 +02:00
Borislav Petkov	d2f7cbe7b2	x86/efi: Runtime services virtual mapping We map the EFI regions needed for runtime services non-contiguously, with preserved alignment on virtual addresses starting from -4G down for a total max space of 64G. This way, we provide for stable runtime services addresses across kernels so that a kexec'd kernel can still use them. Thus, they're mapped in a separate pagetable so that we don't pollute the kernel namespace. Add an efi= kernel command line parameter for passing miscellaneous options and chicken bits from the command line. While at it, add a chicken bit called "efi=old_map" which can be used as a fallback to the old runtime services mapping method in case there's some b0rkage with a particular EFI implementation (haha, it is hard to hold up the sarcasm here...). Also, add the UEFI RT VA space to Documentation/x86/x86_64/mm.txt. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-11-02 11:09:36 +00:00
Ingo Molnar	fb10d5b7ef	Merge branch 'linus' into sched/core Resolve cherry-picking conflicts: Conflicts: mm/huge_memory.c mm/memory.c mm/mprotect.c See this upstream merge commit for more details: `52469b4fcd` Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-01 08:24:41 +01:00
Greg Thelen	bd09d9a351	percpu: fix this_cpu_sub() subtrahend casting for unsigneds this_cpu_sub() is implemented as negation and addition. This patch casts the adjustment to the counter type before negation to sign extend the adjustment. This helps in cases where the counter type is wider than an unsigned adjustment. An alternative to this patch is to declare such operations unsupported, but it seemed useful to avoid surprises. This patch specifically helps the following example: unsigned int delta = 1 preempt_disable() this_cpu_write(long_counter, 0) this_cpu_sub(long_counter, delta) preempt_enable() Before this change long_counter on a 64 bit machine ends with value 0xffffffff, rather than 0xffffffffffffffff. This is because this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta), which is basically: long_counter = 0 + 0xffffffff Also apply the same cast to: __this_cpu_sub() __this_cpu_sub_return() this_cpu_sub_return() All percpu_test.ko passes, especially the following cases which previously failed: l -= ui_one; __this_cpu_sub(long_counter, ui_one); CHECK(l, long_counter, -1); l -= ui_one; this_cpu_sub(long_counter, ui_one); CHECK(l, long_counter, -1); CHECK(l, long_counter, 0xffffffffffffffff); ul -= ui_one; __this_cpu_sub(ulong_counter, ui_one); CHECK(ul, ulong_counter, -1); CHECK(ul, ulong_counter, 0xffffffffffffffff); ul = this_cpu_sub_return(ulong_counter, ui_one); CHECK(ul, ulong_counter, 2); ul = __this_cpu_sub_return(ulong_counter, ui_one); CHECK(ul, ulong_counter, 1); Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-30 14:27:03 -07:00
Alex Williamson	e0f0bbc527	kvm: Create non-coherent DMA registeration We currently use some ad-hoc arch variables tied to legacy KVM device assignment to manage emulation of instructions that depend on whether non-coherent DMA is present. Create an interface for this, adapting legacy KVM device assignment and adding VFIO via the KVM-VFIO device. For now we assume that non-coherent DMA is possible any time we have a VFIO group. Eventually an interface can be developed as part of the VFIO external user interface to query the coherency of a group. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 19:02:23 +01:00
Alex Williamson	d96eb2c6f4	kvm/x86: Convert iommu_flags to iommu_noncoherent Default to operating in coherent mode. This simplifies the logic when we switch to a model of registering and unregistering noncoherent I/O with KVM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 19:02:13 +01:00
Borislav Petkov	b51e974fcd	kvm, emulator: Rename VendorSpecific flag Call it EmulateOnUD which is exactly what we're trying to do with vendor-specific instructions. Rename ->only_vendor_specific_insn to something shorter, while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 18:54:40 +01:00
Borislav Petkov	1ce19dc16c	kvm, emulator: Use opcode length Add a field to the current emulation context which contains the instruction opcode length. This will streamline handling of opcodes of different length. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 18:54:39 +01:00
Borislav Petkov	9c15bb1d0a	kvm: Add KVM_GET_EMULATED_CPUID Add a kvm ioctl which states which system functionality kvm emulates. The format used is that of CPUID and we return the corresponding CPUID bits set for which we do emulate functionality. Make sure ->padding is being passed on clean from userspace so that we can use it for something in the future, after the ioctl gets cast in stone. s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 18:54:39 +01:00
Matt Fleming	72548e836b	x86/efi: Add EFI framebuffer earlyprintk support It's incredibly difficult to diagnose early EFI boot issues without special hardware because earlyprintk=vga doesn't work on EFI systems. Add support for writing to the EFI framebuffer, via earlyprintk=efi, which will actually give users a chance of providing debug output. Cc: H. Peter Anvin <hpa@zytor.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Jones <pjones@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-10-28 18:09:58 +00:00
Rafael J. Wysocki	ce6bceabae	Merge branch 'powercap' * powercap: PowerCap: Convert class code to use dev_groups PowerCap: Introduce Intel RAPL power capping driver bitops: Introduce BIT_ULL x86 / msr: add 64bit _on_cpu access functions PowerCap: Add to drivers Kconfig and Makefile PowerCap: Add class driver PowerCap: Documentation	2013-10-28 01:21:49 +01:00
Rafael J. Wysocki	3f66c315f5	Merge branch 'acpi-tables' * acpi-tables: ACPI / x86: Increase override tables number limit	2013-10-28 01:13:29 +01:00
Rafael J. Wysocki	3fbc4d6374	Merge branch 'acpi-processor' * acpi-processor: ACPI / processor: fixed a brace coding style issue ACPI / processor: Remove outdated comments ACPI / processor: remove unnecessary if (!pr) check ACPI / processor: remove some dead code in acpi_processor_get_info() x86 / ACPI: simplify _acpi_map_lsapic() ACPI / processor: use apic_id and remove duplicated _MAT evaluation ACPI / processor: Introduce apic_id in struct processor to save parsed APIC id	2013-10-28 01:11:24 +01:00
Heiko Carstens	90f2492cf9	x86: remove this_cpu_xor() implementation Remove the unused x86 implementation of this_cpu_xor(). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-10-27 09:03:46 -04:00
Jan Beulich	7a3d9b0f3a	x86: Unify copy_to_user() and add size checking to it Similarly to copy_from_user(), where the range check is to protect against kernel memory corruption, copy_to_user() can benefit from such checking too: Here it protects against kernel information leaks. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <arjan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/5265059502000078000FC4F6@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com>	2013-10-26 12:27:37 +02:00
Jan Beulich	3df7b41aa5	x86: Unify copy_from_user() size checking Commits `4a31276930` ("x86: Turn the copy_from_user check into an (optional) compile time warning") and `63312b6a6f` ("x86: Add a Kconfig option to turn the copy_from_user warnings into errors") touched only the 32-bit variant of copy_from_user(), whereas the original commit `9f0cf4adb6` ("x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()") also added the same code to the 64-bit one. Further the earlier conversion from an inline WARN() to the call to copy_from_user_overflow() went a little too far: When the number of bytes to be copied is not a constant (e.g. [looking at 3.11] in drivers/net/tun.c:__tun_chr_ioctl() or drivers/pci/pcie/aer/aer_inject.c:aer_inject_write()), the compiler will always have to keep the funtion call, and hence there will always be a warning. By using __builtin_constant_p() we can avoid this. And then this slightly extends the effect of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS in that apart from converting warnings to errors in the constant size case, it retains the (possibly wrong) warnings in the non-constant size case, such that if someone is prepared to get a few false positives, (s)he'll be able to recover the current behavior (except that these diagnostics now will never be converted to errors). Since the 32-bit variant (intentionally) didn't call might_fault(), the unification results in this being called twice now. Adding a suitable #ifdef would be the alternative if that's a problem. I'd like to point out though that with __compiletime_object_size() being restricted to gcc before 4.6, the whole construct is going to become more and more pointless going forward. I would question however that commit `2fb0815c9e` ("gcc4: disable __compiletime_object_size for GCC 4.6+") was really necessary, and instead this should have been dealt with as is done here from the beginning. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5265056D02000078000FC4F3@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-26 12:27:36 +02:00
Stefano Stabellini	7100b077ab	xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device Introduce xen_dma_map_page, xen_dma_unmap_page, xen_dma_sync_single_for_cpu and xen_dma_sync_single_for_device. They have empty implementations on x86 and ia64 but they call the corresponding platform dma_ops function on arm and arm64. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Changes in v9: - xen_dma_map_page return void, avoid page_to_phys.	2013-10-25 10:39:49 +00:00
Chen, Gong	4b3db708b1	ACPI, x86: Extended error log driver for x86 platform This H/W error log driver (a.k.a eMCA driver) is implemented based on http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html After errors are captured, more detailed platform specific information can be got via this new enhanced H/W error log driver. Most notably we can track memory errors back to the DIMM slot silk screen label. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-10-23 10:09:07 -07:00
David Cohen	40a96d54ee	intel_mid: Move platform device setups to their own platform_<device>.* files As Intel rolling out more SoC's after Moorestown, we need to re-structure the code in a way that is backward compatible and easy to expand. This patch implements a flexible way to support multiple boards and devices. This patch does not add any new functional support. It just refactors the existing code to increase the modularity and decrease the code duplication for supporting multiple soc's and boards. Currently intel-mid.c has both board and soc related code in one file. This patch moves the board related code to new files and let linker script to create SFI devite table following this: 1. Move the SFI device specific code to arch/x86/platform/intel-mid/device-libs/platform_<device>.* A new device file is added for every supported device. This code will get conditionally compiled by using corresponding device driver CONFIG option. 2. Move the device_ids location to .x86_intel_mid_dev.init section by using new sfi_device() macro. This patch was based on previous code from Sathyanarayanan Kuppuswamy. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: http://lkml.kernel.org/r/1382049336-21316-13-git-send-email-david.a.cohen@linux.intel.com Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-10-17 16:41:50 -07:00
Kuppuswamy Sathyanarayanan	aeedb370e7	intel_mid: Moved SFI related code to sfi.c Moved SFI specific parsing/handling code to sfi.c. This will enable us to reuse our intel-mid code for platforms that supports firmware interfaces other than SFI (like ACPI). Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: http://lkml.kernel.org/r/1382049336-21316-10-git-send-email-david.a.cohen@linux.intel.com Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-10-17 16:40:51 -07:00
Kuppuswamy Sathyanarayanan	49c72a0a8a	intel_mid: Added custom handler for ipc devices Added a custom handler for medfield based ipc devices and moved devs_id structure defintion to header file. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: http://lkml.kernel.org/r/1382049336-21316-9-git-send-email-david.a.cohen@linux.intel.com Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-10-17 16:40:50 -07:00
Kuppuswamy Sathyanarayanan	712b6aa873	intel_mid: Renamed mrst to intel_mid mrst is used as common name to represent all intel_mid type soc's. But moorsetwon is just one of the intel_mid soc. So renamed them to use intel_mid. This patch mainly renames the variables and related functions that uses mrst prefix with intel_mid. To ensure that there are no functional changes, I have compared the objdump of related files before and after rename and found the only difference is symbol and name changes. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: http://lkml.kernel.org/r/1382049336-21316-6-git-send-email-david.a.cohen@linux.intel.com Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-10-17 16:40:47 -07:00
Kuppuswamy Sathyanarayanan	05454c26eb	intel_mid: Renamed mrst to intel_mid Following files contains code that is common to all intel mid soc's. So renamed them as below. mrst/mrst.c -> intel-mid/intel-mid.c mrst/vrtc.c -> intel-mid/intel_mid_vrtc.c mrst/early_printk_mrst.c -> intel-mid/intel_mid_vrtc.c pci/mrst.c -> pci/intel_mid_pci.c Also, renamed the corresponding header files and made changes to the driver files that included these header files. To ensure that there are no functional changes, I have compared the objdump of renamed files before and after rename and found that the only difference is file name change. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: http://lkml.kernel.org/r/1382049336-21316-4-git-send-email-david.a.cohen@linux.intel.com Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-10-17 16:40:36 -07:00
Gleb Natapov	13acfd5715	Powerpc KVM work is based on a commit after rc4. Merging master into next to satisfy the dependencies. Conflicts: arch/arm/kvm/reset.c	2013-10-17 17:41:49 +03:00
Jacob Pan	1a6b991a98	x86 / msr: add 64bit _on_cpu access functions Having 64-bit MSR access methods on given CPU can avoid shifting and simplify MSR content manipulation. We already have other combinations of rdmsrl_xxx and wrmsrl_xxx but missing the _on_cpu version. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-17 00:36:06 +02:00
Christoffer Dall	6d9d41e574	KVM: Move gfn_to_index to x86 specific code The gfn_to_index function relies on huge page defines which either may not make sense on systems that don't support huge pages or are defined in an unconvenient way for other architectures. Since this is x86-specific, move the function to arch/x86/include/asm/kvm_host.h. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-14 10:11:44 +03:00
Kees Cook	6145cfe394	x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64 On 64-bit, this raises the maximum location to -1 GiB (from -1.5 GiB), the upper limit currently, since the kernel fixmap page mappings need to be moved to use the other 1 GiB (which would be the theoretical limit when building with -mcmodel=kernel). Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/1381450698-28710-7-git-send-email-keescook@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-10-13 03:13:13 -07:00
Kees Cook	5bfce5ef55	x86, kaslr: Provide randomness functions Adds potential sources of randomness: RDRAND, RDTSC, or the i8254. This moves the pre-alternatives inline rdrand function into the header so both pieces of code can use it. Availability of RDRAND is then controlled by CONFIG_ARCH_RANDOM, if someone wants to disable it even for kASLR. Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/1381450698-28710-4-git-send-email-keescook@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-10-13 03:12:12 -07:00
Ingo Molnar	88f182dd77	x86: Apply the asm_volatile_goto() compiler quirk Apply the asm_volatile_goto() compiler quirk to the new rmwcc.h file as well, introduced in: `c2daa3bed5` sched, x86: Provide a per-cpu preempt_count implementation Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-11 07:41:43 +02:00
Ingo Molnar	ec0ad3d01f	Merge branch 'core/urgent' into sched/core Merge in asm goto fix, to be able to apply the asm/rmwcc.h fix. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-11 07:39:37 +02:00
Ingo Molnar	3f0116c323	compiler/gcc4: Add quirk for 'asm goto' miscompilation bug Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto' constructs, as outlined here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 Implement a workaround suggested by Jakub Jelinek. Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-11 07:39:14 +02:00
K. Y. Srinivasan	9e7827b5ea	x86, hyperv: Get the local APIC timer frequency from the hypervisor Hyper-V supports a mechanism for retrieving the local APIC frequency. Use this and bypass the calibration code in the kernel . This would allow us to boot the Linux kernel as a "modern VM" on Hyper-V where many of the legacy devices (such as PIT) are not emulated. I would like to thank Olaf Hering <olaf@aepfle.de>, Jan Beulich <JBeulich@suse.com> and H. Peter Anvin <h.peter.anvin@intel.com> for their help in this effort. In this version of the patch, I have addressed Jan's comments. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1380554932-9888-1-git-send-email-olaf@aepfle.de Tested-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-10-10 11:44:12 -07:00
Arthur Chunqi Li	7854cbca81	KVM: nVMX: Fully support nested VMX preemption timer This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of "Save VMX-preemption timer value" VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-10 18:22:54 +02:00
Rob Herring	32df8dca50	of: remove HAVE_ARCH_DEVTREE_FIXUPS HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc, but it is only used for /proc/device-teee and sparc does not enable /proc/device-tree. So this option is redundant. Remove the option and always enable it. This has the side effect of fixing /proc/device-tree on arches such as arm64 which failed to define this option. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com>	2013-10-09 20:04:08 -05:00
Rob Herring	25ff79443c	of: implement pci_address_to_pio as weak function Implement pci_address_to_pio as weak function to remove the dependency on asm/prom.h. This is in preparation to make prom.h optional. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Grant Likely <grant.likely@linaro.org>	2013-10-09 20:04:06 -05:00
Stefano Stabellini	d6fe76c58c	xen: introduce xen_alloc/free_coherent_pages xen_swiotlb_alloc_coherent needs to allocate a coherent buffer for cpu and devices. On native x86 is sufficient to call __get_free_pages in order to get a coherent buffer, while on ARM (and potentially ARM64) we need to call the native dma_ops->alloc implementation. Introduce xen_alloc_coherent_pages to abstract the arch specific buffer allocation. Similarly introduce xen_free_coherent_pages to free a coherent buffer: on x86 is simply a call to free_pages while on ARM and ARM64 is arm_dma_ops.free. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Changes in v7: - rename __get_dma_ops to __generic_dma_ops; - call __generic_dma_ops(hwdev)->alloc/free on arm64 too. Changes in v6: - call __get_dma_ops to get the native dma_ops pointer on arm.	2013-10-09 17:18:14 +00:00
Ingo Molnar	37bf06375c	Linux 3.12-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQEcBAABAgAGBQJSUc9zAAoJEHm+PkMAQRiG9DMH/AtpuAF6LlMRPjrCeuJQ1pyh T0IUO+CsLKO6qtM5IyweP8V6zaasNjIuW1+B6IwVIl8aOrM+M7CwRiKvpey26ldM I8G2ron7hqSOSQqSQs20jN2yGAqQGpYIbTmpdGLAjQ350NNNvEKthbP5SZR5PAmE UuIx5OGEkaOyZXvCZJXU9AZkCxbihlMSt2zFVxybq2pwnGezRUYgCigE81aeyE0I QLwzzMVdkCxtZEpkdJMpLILAz22jN4RoVDbXRa2XC7dA9I2PEEXI9CcLzqCsx2Ii 8eYS+no2K5N2rrpER7JFUB2B/2X8FaVDE+aJBCkfbtwaYTV9UYLq3a/sKVpo1Cs= =xSFJ -----END PGP SIGNATURE----- Merge tag 'v3.12-rc4' into sched/core Merge Linux v3.12-rc4 to fix a conflict and also to refresh the tree before applying more scheduler patches. Conflicts: arch/avr32/include/asm/Kbuild Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-09 12:36:13 +02:00
Paolo Bonzini	8a3c1a3347	KVM: mmu: change useless int return types to void kvm_mmu initialization is mostly filling in function pointers, there is no way for it to fail. Clean up unused return values. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 15:44:02 +03:00
Paolo Bonzini	d8d173dab2	KVM: mmu: remove uninteresting MMU "new_cr3" callbacks The new_cr3 MMU callback has been a wrapper for mmu_free_roots since commit `e676505` (KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation, 2012-07-08). The commit message mentioned that "mmu_free_roots() is somewhat of an overkill, but fixing that is more complicated and will be done after this minimal fix". One year has passed, and no one really felt the need to do a different fix. Wrap the call with a kvm_mmu_new_cr3 function for clarity, but remove the callback. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 15:43:59 +03:00
Paolo Bonzini	206260941f	KVM: mmu: remove uninteresting MMU "free" callbacks The free MMU callback has been a wrapper for mmu_free_roots since mmu_free_roots itself was introduced (commit `17ac10a`, [PATCH] KVM: MU: Special treatment for shadow pae root pages, 2007-01-05), and has always been the same for all MMU cases. Remove the indirection as it is useless. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 15:43:56 +03:00
Paolo Bonzini	4344ee981e	KVM: x86: only copy XSAVE state for the supported features This makes the interface more deterministic for userspace, which can expect (after configuring only the features it supports) to get exactly the same state from the kernel, independent of the host CPU and kernel version. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 12:29:09 +03:00
Paolo Bonzini	d7876f1be4	KVM: x86: prevent setting unsupported XSAVE states A guest can still attempt to save and restore XSAVE states even if they have been masked in CPUID leaf 0Dh. This usually is not visible to the guest, but is still wrong: "Any attempt to set a reserved bit (as determined by the contents of EAX and EDX after executing CPUID with EAX=0DH, ECX= 0H) in XCR0 for a given processor will result in a #GP exception". The patch also performs the same checks as __kvm_set_xcr in KVM_SET_XSAVE. This catches migration from newer to older kernel/processor before the guest starts running. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 12:29:07 +03:00
Borislav Petkov	646e29a178	x86: Improve the printout of the SMP bootup CPU table As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.644008] smpboot: Booting Node 1, Processors: #8 #9 #10 #11 #12 #13 #14 #15 OK [ 1.245006] smpboot: Booting Node 2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK [ 1.864005] smpboot: Booting Node 3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK [ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK [ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK [ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK [ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Libin <huawei.libin@huawei.com> Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-09-28 10:10:26 +02:00
Peter Zijlstra	75f93fed50	sched: Revert need_resched() to look at TIF_NEED_RESCHED Yuanhan reported a serious throughput regression in his pigz benchmark. Using the ftrace patch I found that several idle paths need more TLC before we can switch the generic need_resched() over to preempt_need_resched. The preemption paths benefit most from preempt_need_resched and do indeed use it; all other need_resched() users don't really care that much so reverting need_resched() back to tif_need_resched() is the simple and safe solution. Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: lkp@linux.intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20130927153003.GF15690@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-09-28 10:04:47 +02:00
Linus Torvalds	4b97280675	Bug-fixes: - Fix PV spinlocks triggering jump_label code bug - Remove extraneous code in the tpm front driver - Fix ballooning out of pages when non-preemptible - Fix deadlock when using a 32-bit initial domain with large amount of memory. - Add xen_nopvpsin parameter to the documentation -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQEcBAABAgAGBQJSQvzCAAoJEFjIrFwIi8fJyCIIAMENABapdLhrOiRdQ1Y7T5v1 4bogPDLwpVxHzwo/vnHcNpl35/dUZrC6wQa51Bkoqq0V8o1XmjFy3SY/EBGjEAvw hh4qxGY0p0NNi6hKrWC8mH9u2TcluZGm1uecabkXUhl9mrAB5oBsfJdbBZ5N69gO QXXt0j7Xwv1APwH86T0e1Lz+lulhdw2ItXP4osYkEbRYNSaaGnuwsd0Jxcb4DeMk qhKgP7QMn3C7zDDaapJo1axeYQRBNEtv5M8+0wwMleX4yX1+IBRZeQTsRfMr7RB/ 8FhssWiH15xU6Gmzgi/VR8xhTEIbQh5GWsVReGf6pqIYSxGSYTvvyhm0bVRH4JI= =c+7u -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from Konrad Rzeszutek Wilk: "Bug-fixes and one update to the kernel-paramters.txt documentation. - Fix PV spinlocks triggering jump_label code bug - Remove extraneous code in the tpm front driver - Fix ballooning out of pages when non-preemptible - Fix deadlock when using a 32-bit initial domain with large amount of memory - Add xen_nopvpsin parameter to the documentation" * tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/spinlock: Document the xen_nopvspin parameter. xen/p2m: check MFN is in range before using the m2p table xen/balloon: don't alloc page while non-preemptible xen: Do not enable spinlocks before jump_label_init() has executed tpm: xen-tpmfront: Remove the locality sysfs attribute tpm: xen-tpmfront: Fix default durations	2013-09-25 15:50:53 -07:00
Yinghai Lu	bee7f9c83e	ACPI / x86: Increase override tables number limit Current ACPI tables in initrd is limited to 10, that is too small. 64 should be good enough as we have 35 sigs and could have several SSDT. Two problems in current code prevent us from increasing limit: 1. The cpio file info array is put in stack, as every element is 32 bytes, could run out of stack if we have that array size to 64. We can move it out from stack, make it global and put it into the __initdata section. 2. early_ioremap() only can remap 256k one time. Current code maps 10 tables at a time. If we increased that limit, the whole size could be more than 256k, so early_ioremap() would fail with that. We can map chunks one by one during copying, instead of mapping all of them together. Signed-off-by: Yinghai <yinghai@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Tested-by: Thomas Renninger <trenn@suse.de> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com> Tested-by: Tang Chen <tangchen@cn.fujitsu.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-25 16:59:39 +02:00
David Vrabel	0160676bba	xen/p2m: check MFN is in range before using the m2p table On hosts with more than 168 GB of memory, a 32-bit guest may attempt to grant map an MFN that is error cannot lookup in its mapping of the m2p table. There is an m2p lookup as part of m2p_add_override() and m2p_remove_override(). The lookup falls off the end of the mapped portion of the m2p and (because the mapping is at the highest virtual address) wraps around and the lookup causes a fault on what appears to be a user space address. do_page_fault() (thinking it's a fault to a userspace address), tries to lock mm->mmap_sem. If the gntdev device is used for the grant map, m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem already locked. do_page_fault() then deadlocks. The deadlock would most commonly occur when a 64-bit guest is started and xenconsoled attempts to grant map its console ring. Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the mapped portion of the m2p table before accessing the table and use this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn() (which already had the correct range check). All faults caused by accessing the non-existant parts of the m2p are thus within the kernel address space and exception_fixup() is called without trying to lock mm->mmap_sem. This means that for MFNs that are outside the mapped range of the m2p then mfn_to_pfn() will always look in the m2p overrides. This is correct because it must be a foreign MFN (and the PFN in the m2p in this case is only relevant for the other domain). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@citrix.com> Cc: Jan Beulich <JBeulich@suse.com> -- v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides() v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of range as it's probably foreign. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2013-09-25 09:00:03 -04:00
Peter Zijlstra	1a338ac32c	sched, x86: Optimize the preempt_schedule() call Remove the bloat of the C calling convention out of the preempt_enable() sites by creating an ASM wrapper which allows us to do an asm("call ___preempt_schedule") instead. calling.h bits by Andi Kleen Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-tk7xdi1cvvxewixzke8t8le1@git.kernel.org [ Fixed build error. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-09-25 14:23:07 +02:00
Peter Zijlstra	c2daa3bed5	sched, x86: Provide a per-cpu preempt_count implementation Convert x86 to use a per-cpu preemption count. The reason for doing so is that accessing per-cpu variables is a lot cheaper than accessing thread_info variables. We still need to save/restore the actual preemption count due to PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the same cache-line as the other hot __switch_to() variables such as current_task. NOTE: this save/restore is required even for !PREEMPT kernels as cond_resched() also relies on preempt_count's PREEMPT_ACTIVE to ignore task_struct::state. Also rename thread_info::preempt_count to ensure nobody is 'accidentally' still poking at it. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-gzn5rfsf8trgjoqx8hyayy3q@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-09-25 14:07:57 +02:00
Peter Zijlstra	a787870924	sched, arch: Create asm/preempt.h In order to prepare to per-arch implementations of preempt_count move the required bits into an asm-generic header and use this for all archs. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-h5j0c1r3e3fk015m30h8f1zx@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-09-25 14:07:50 +02:00
Peter Zijlstra	0c44c2d0f4	x86: Use asm goto to implement better modify_and_test() functions Linus suggested using asm goto to get rid of the typical SETcc + TEST instruction pair -- which also clobbers an extra register -- for our typical modify_and_test() functions. Because asm goto doesn't allow output fields it has to include an unconditinal memory clobber when it changes a memory variable to force a reload. Luckily all atomic ops already imply a compiler barrier to go along with their memory barrier semantics. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-0mtn9siwbeo1d33bap1422se@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-09-25 13:53:08 +02:00
Mike Travis	8eba18428a	x86/UV: Add uvtrace support This patch adds support for the uvtrace module by providing a skeleton call to the registered trace function. It also provides another separate 'NMI' tracer that is triggered by the system wide 'power nmi' command. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Reviewed-by: Hedi Berriche <hedi@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jason Wessel <jason.wessel@windriver.com> Link: http://lkml.kernel.org/r/20130923212501.185052551@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-09-24 09:02:04 +02:00
Mike Travis	0d12ef0c90	x86/UV: Update UV support for external NMI signals The current UV NMI handler has not been updated for the changes in the system NMI handler and the perf operations. The UV NMI handler reads an MMR in the UV Hub to check to see if the NMI event was caused by the external 'system NMI' that the operator can initiate on the System Mgmt Controller. The problem arises when the perf tools are running, causing millions of perf events per second on very large CPU count systems. Previously this was okay because the perf NMI handler ran at a higher priority on the NMI call chain and if the NMI was a perf event, it would stop calling other NMI handlers remaining on the NMI call chain. Now the system NMI handler calls all the handlers on the NMI call chain including the UV NMI handler. This causes the UV NMI handler to read the MMRs at the same millions per second rate. This can lead to significant performance loss and possible system failures. It also can cause thousands of 'Dazed and Confused' messages being sent to the system console. This effectively makes perf tools unusable on UV systems. To avoid this excessive overhead when perf tools are running, this code has been optimized to minimize reading of the MMRs as much as possible, by moving to the NMI_UNKNOWN notifier chain. This chain is called only when all the users on the standard NMI_LOCAL call chain have been called and none of them have claimed this NMI. There is an exception where the NMI_LOCAL notifier chain is used. When the perf tools are in use, it's possible that the UV NMI was captured by some other NMI handler and then either ignored or mistakenly processed as a perf event. We set a per_cpu ('ping') flag for those CPUs that ignored the initial NMI, and then send them an IPI NMI signal. The NMI_LOCAL handler on each cpu does not need to read the MMR, but instead checks the in memory flag indicating it was pinged. There are two module variables, 'ping_count' indicating how many requested NMI events occurred, and 'ping_misses' indicating how many stray NMI events. These most likely are perf events so it shows the overhead of the perf NMI interrupts and how many MMR reads were avoided. This patch also minimizes the reads of the MMRs by having the first cpu entering the NMI handler on each node set a per HUB in-memory atomic value. (Having a per HUB value avoids sending lock traffic over NumaLink.) Both types of UV NMIs from the SMI layer are supported. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Reviewed-by: Hedi Berriche <hedi@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jason Wessel <jason.wessel@windriver.com> Link: http://lkml.kernel.org/r/20130923212500.353547733@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-09-24 09:02:02 +02:00
Mike Travis	1e019421bc	x86/UV: Move NMI support This patch moves the UV NMI support from the x2apic file to a new separate uv_nmi.c file in preparation for the next sequence of patches. It prevents upcoming bloat of the x2apic file, and has the added benefit of putting the upcoming /sys/module parameters under the name 'uv_nmi' instead of 'x2apic_uv_x', which was obscure. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Reviewed-by: Hedi Berriche <hedi@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jason Wessel <jason.wessel@windriver.com> Link: http://lkml.kernel.org/r/20130923212500.183295611@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-09-24 09:02:02 +02:00
Jiang Liu	7e1f85f96d	x86 / ACPI: simplify _acpi_map_lsapic() In acpi_register_lapic(), it will generates a new logical cpu number and maps to the local APIC id, this logical cpu number can be returned to simplify _acpi_map_lsapic() implementation. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-24 01:39:40 +02:00
Ard Biesheuvel	801201aa25	crypto: move x86 to the generic version of ablk_helper Move all users of ablk_helper under x86/ to the generic version and delete the x86 specific version. Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-09-24 06:02:24 +10:00
Cyrill Gorcunov	fa0f281cf9	mm: make sure _PAGE_SWP_SOFT_DIRTY bit is not set on present pte _PAGE_SOFT_DIRTY bit should never be set on present pte so add VM_BUG_ON to catch any potential future abuse. Also add a comment on _PAGE_SWP_SOFT_DIRTY definition explaining scope of its usage. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-11 15:58:06 -07:00
Dave Hansen	6df46865ff	mm: vmstats: track TLB flush stats on UP too The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled for SMP. UP systems do not do remote TLB flushes, so compile those counters out on UP. arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly. This is probably an optimization since both the mtrr code and __flush_tlb() write cr4. It would probably be safe to make that a flush_tlb_all() (and then get these statistics), but the mtrr code is ancient and I'm hesitant to touch it other than to just stick in the counters. [akpm@linux-foundation.org: tweak comments] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-11 15:57:09 -07:00
Linus Torvalds	442e0973e9	Merge branch 'x86/jumplabel' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 jumplabel changes from Peter Anvin: "One more x86 tree for this merge window. This tree improves the handling of jump labels, so that most of the time we don't have to do a massive initial patching run. Furthermore, we will error out of the jump label is not what is expected, eg if it has been corrupted or tampered with" * 'x86/jumplabel' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/jump-label: Show where and what was wrong on errors x86/jump-label: Add safety checks to jump label conversions x86/jump-label: Do not bother updating nops if they are correct x86/jump-label: Use best default nops for inital jump label calls	2013-09-10 19:43:23 -07:00
Andi Kleen	ff47ab4ff3	x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic The 64bit __copy_{from,to}_user_inatomic always called copy_from_user_generic, but skipped the special optimizations for 1/2/4/8 byte accesses. This especially hurts the futex call, which accesses the 4 byte futex user value with a complicated fast string operation in a function call, instead of a single movl. Use __copy_{from,to}_user for _inatomic instead to get the same optimizations. The only problem was the might_fault() in those functions. So move that to new wrapper and call __copy_{f,t}_user_nocheck() from *_inatomic directly. 32bit already did this correctly by duplicating the code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1376687844-19857-2-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-09-10 15:27:43 -07:00
Linus Torvalds	64c353864e	Merge branch 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull DMA mapping update from Marek Szyprowski: "This contains an addition of Device Tree support for reserved memory regions (Contiguous Memory Allocator is one of the drivers for it) and changes required by the KVM extensions for PowerPC architectue" * 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: ARM: init: add support for reserved memory defined by device tree drivers: of: add initialization code for dma reserved memory drivers: of: add function to scan fdt nodes given by path drivers: dma-contiguous: clean source code and prepare for device tree	2013-09-09 10:26:33 -07:00
Konrad Rzeszutek Wilk	65320fceda	Linux 3.11-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJSGqS5AAoJEHm+PkMAQRiGFxEH/3VrqF6WAkcviNiW/0DCdO8k v6Wi7Sp5LxVkwzmOCHCV1tTHwLRlH3cB9YmJlGQ0kHCREaAuEQAB0xJXIW7dnyYj Qq7KoRZEMe3wizmjEsj8qsrhfMLzHjBw67hBz2znwW/4P7YdgzwD7KRiEat+yRC9 ON3nNL2zIqpfk92RXvVrSVl4KMEM+WNbOfiffgBiEP24Ja1MJMFH1d4i6hNOaB0x 9Pb3Lw8let92x+8Ao5jnjKdKMgVsoZWbN/TgQR8zZOHM38AGGiDgk18vMz+L+hpS jqfjckxj1m30jGq0qZ9ZbMZx3IGif4KccVr30MqNHJpwi6Q24qXvT3YfA3HkstM= =nAab -----END PGP SIGNATURE----- Merge tag 'v3.11-rc7' into stable/for-linus-3.12 Linux 3.11-rc7 As we need the git commit 28817e9de4f039a1a8c1fe1df2fa2df524626b9e Author: Chuck Anderson <chuck.anderson@oracle.com> Date: Tue Aug 6 15:12:19 2013 -0700 xen/smp: initialize IPI vectors before marking CPU online * tag 'v3.11-rc7': (443 commits) Linux 3.11-rc7 ARC: [lib] strchr breakage in Big-endian configuration VFS: collect_mounts() should return an ERR_PTR bfs: iget_locked() doesn't return an ERR_PTR efs: iget_locked() doesn't return an ERR_PTR() proc: kill the extra proc_readfd_common()->dir_emit_dots() cope with potentially long ->d_dname() output for shmem/hugetlb usb: phy: fix build breakage USB: OHCI: add missing PCI PM callbacks to ohci-pci.c staging: comedi: bug-fix NULL pointer dereference on failed attach lib/lz4: correct the LZ4 license memcg: get rid of swapaccount leftovers nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error drivers/platform/olpc/olpc-ec.c: initialise earlier ipv4: expose IPV4_DEVCONF ipv6: handle Redirect ICMP Message with no Redirected Header option be2net: fix disabling TX in be_close() Revert "ACPI / video: Always call acpi_video_init_brightness() on init" Revert "genetlink: fix family dump race" ... Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-09-09 12:05:37 -04:00
Konrad Rzeszutek Wilk	c3f31f6a6f	Merge branch 'x86/spinlocks' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into stable/for-linus-3.12 * 'x86/spinlocks' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static" kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor kvm guest: Add configuration support to enable debug information for KVM Guests kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi xen, pvticketlock: Allow interrupts to be enabled while blocking x86, ticketlock: Add slowpath logic jump_label: Split jumplabel ratelimit x86, pvticketlock: When paravirtualizing ticket locks, increment by 2 x86, pvticketlock: Use callee-save for lock_spinning xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks xen, pvticketlock: Xen implementation for PV ticket locks xen: Defer spinlock setup until boot CPU setup x86, ticketlock: Collapse a layer of functions x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks x86, spinlock: Replace pv spinlocks with pv ticketlocks	2013-09-09 12:01:15 -04:00
Linus Torvalds	6be48f2940	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 3.12: - Added MODULE_SOFTDEP to allow pre-loading of modules. - Reinstated crct10dif driver using the module softdep feature. - Allow via rng driver to be auto-loaded. - Split large input data when necessary in nx. - Handle zero length messages correctly for GCM/XCBC in nx. - Handle SHA-2 chunks bigger than block size properly in nx. - Handle unaligned lengths in omap-aes. - Added SHA384/SHA512 to omap-sham. - Added OMAP5/AM43XX SHAM support. - Added OMAP4 TRNG support. - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (66 commits) Reinstate "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework" hwrng: via - Add MODULE_DEVICE_TABLE crypto: fcrypt - Fix bitoperation for compilation with clang crypto: nx - fix SHA-2 for chunks bigger than block size crypto: nx - fix GCM for zero length messages crypto: nx - fix XCBC for zero length messages crypto: nx - fix limits to sg lists for AES-CCM crypto: nx - fix limits to sg lists for AES-XCBC crypto: nx - fix limits to sg lists for AES-GCM crypto: nx - fix limits to sg lists for AES-CTR crypto: nx - fix limits to sg lists for AES-CBC crypto: nx - fix limits to sg lists for AES-ECB crypto: nx - add offset to nx_build_sg_lists() padata - Register hotcpu notifier after initialization padata - share code between CPU_ONLINE and CPU_DOWN_FAILED, same to CPU_DOWN_PREPARE and CPU_UP_CANCELED hwrng: omap - reorder OMAP TRNG driver code crypto: omap-sham - correct dma burst size crypto: omap-sham - Enable Polling mode if DMA fails crypto: tegra-aes - bitwise vs logical and crypto: sahara - checking the wrong variable ...	2013-09-07 14:31:18 -07:00
Herbert Xu	eeca9fad52	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux Merge upstream tree in order to reinstate crct10dif.	2013-09-07 12:53:35 +10:00
Linus Torvalds	b4b50fd78b	ARM: SoC platform changes for 3.12 This branch contains mostly additions and changes to platform enablement and SoC-level drivers. Since there's sometimes a dependency on device-tree changes, there's also a fair amount of those in this branch. Pieces worth mentioning are: - Mbus driver for Marvell platforms, allowing kernel configuration and resource allocation of on-chip peripherals. - Enablement of the mbus infrastructure from Marvell PCI-e drivers. - Preparation of MSI support for Marvell platforms. - Addition of new PCI-e host controller driver for Tegra platforms - Some churn caused by sharing of macro names between i.MX 6Q and 6DL platforms in the device tree sources and header files. - Various suspend/PM updates for Tegra, including LP1 support. - Versatile Express support for MCPM, part of big little support. - Allwinner platform support for A20 and A31 SoCs (dual and quad Cortex-A7) - OMAP2+ support for DRA7, a new Cortex-A15-based SoC. The code that touches other architectures are patches moving MSI arch-specific functions over to weak symbols and removal of ARCH_SUPPORTS_MSI, acked by PCI maintainers. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJSKhYmAAoJEIwa5zzehBx322AP/1ONYs8o8f7/Gzq6lZvTN6T3 0pBTApg6Jfioi3lwKvUAEIcsW82YKQ+UZkbW66GQH6+Ri4aZJKZHuz0+JPU67OJ4 LtSLuzVWrymy2VOOUvAnS/SXkOZw/pHhU4cLNHn1dMndhUL1Uqp9/XwuiHEQyFsP uOkpcBtIu0EWElov0PKKZ5SWBg8JJs2vy5ydiViGelWHCrZvDDZkWzIsDcBQxJLQ juzT4+JE+KOu7vKmfw78o6iHoCS2TBRAN9YUCajRb8Wl+out1hrTahHnDWaZ5Mce EskcQNkJROqFbjD4k3ABN4XGTv2VDmrztIwFe0SEQ7Dz/9ypCrBGT69uI9xIqTXr GwVRIwAUFTpMupK0gy93z1ajV3N0CXV79out9+jQNUQybYE+czp8QOyhmuc1tZx0 8fn9jlBQe9Vy6yrs39gEcE7nUwrayeyQ+6UvqqwsE2pWZabNAnCMSPX5+QIu+T/3 tQ7+jYmfFeserp1sIDOHOnxfhtW9EI6U9d1h/DUCwrsuFdkL9ha4M/vh9Pwgye98 tBdz0T4yE39AJQwwFWRkv1jcQKcGu6WqJanmvS4KRBksGwuLWxy+ewOnkz2ifS25 ZYSyxAryZRBvQRqlOK11rXPfRcbGcY0MG9lkKX96rGcyWEizgE1DdjxXD8HoIleN R8heV6GX5OzlFLGX2tKK =fJ5x -----END PGP SIGNATURE----- Merge tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform changes from Olof Johansson: "This branch contains mostly additions and changes to platform enablement and SoC-level drivers. Since there's sometimes a dependency on device-tree changes, there's also a fair amount of those in this branch. Pieces worth mentioning are: - Mbus driver for Marvell platforms, allowing kernel configuration and resource allocation of on-chip peripherals. - Enablement of the mbus infrastructure from Marvell PCI-e drivers. - Preparation of MSI support for Marvell platforms. - Addition of new PCI-e host controller driver for Tegra platforms - Some churn caused by sharing of macro names between i.MX 6Q and 6DL platforms in the device tree sources and header files. - Various suspend/PM updates for Tegra, including LP1 support. - Versatile Express support for MCPM, part of big little support. - Allwinner platform support for A20 and A31 SoCs (dual and quad Cortex-A7) - OMAP2+ support for DRA7, a new Cortex-A15-based SoC. The code that touches other architectures are patches moving MSI arch-specific functions over to weak symbols and removal of ARCH_SUPPORTS_MSI, acked by PCI maintainers" * tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (266 commits) tegra-cpuidle: provide stub when !CONFIG_CPU_IDLE PCI: tegra: replace devm_request_and_ioremap by devm_ioremap_resource ARM: tegra: Drop ARCH_SUPPORTS_MSI and sort list ARM: dts: vf610-twr: enable i2c0 device ARM: dts: i.MX51: Add one more I2C2 pinmux entry ARM: dts: i.MX51: Move pins configuration under "iomuxc" label ARM: dtsi: imx6qdl-sabresd: Add USB OTG vbus pin to pinctrl_hog ARM: dtsi: imx6qdl-sabresd: Add USB host 1 VBUS regulator ARM: dts: imx27-phytec-phycore-som: Enable AUDMUX ARM: dts: i.MX27: Disable AUDMUX in the template ARM: dts: wandboard: Add support for SDIO bcm4329 ARM: i.MX5 clocks: Remove optional clock setup (CKIH1) from i.MX51 template ARM: dts: imx53-qsb: Make USBH1 functional ARM i.MX6Q: dts: Enable I2C1 with EEPROM and PMIC on Phytec phyFLEX-i.MX6 Ouad module ARM i.MX6Q: dts: Enable SPI NOR flash on Phytec phyFLEX-i.MX6 Ouad module ARM: dts: imx6qdl-sabresd: Add touchscreen support ARM: imx: add ocram clock for imx53 ARM: dts: imx: ocram size is different between imx6q and imx6dl ARM: dts: imx27-phytec-phycore-som: Fix regulator settings ARM: dts: i.MX27: Remove clock name from CPU node ...	2013-09-06 13:30:06 -07:00
Linus Torvalds	ae7a835cc5	Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Gleb Natapov: "The highlights of the release are nested EPT and pv-ticketlocks support (hypervisor part, guest part, which is most of the code, goes through tip tree). Apart of that there are many fixes for all arches" Fix up semantic conflicts as discussed in the pull request thread.. * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (88 commits) ARM: KVM: Add newlines to panic strings ARM: KVM: Work around older compiler bug ARM: KVM: Simplify tracepoint text ARM: KVM: Fix kvm_set_pte assignment ARM: KVM: vgic: Bump VGIC_NR_IRQS to 256 ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs ARM: KVM: vgic: fix GICD_ICFGRn access ARM: KVM: vgic: simplify vgic_get_target_reg KVM: MMU: remove unused parameter KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX KVM: x86: update masterclock when kvmclock_offset is calculated (v2) KVM: PPC: Book3S: Fix compile error in XICS emulation KVM: PPC: Book3S PR: return appropriate error when allocation fails arch: powerpc: kvm: add signed type cast for comparation KVM: x86: add comments where MMIO does not return to the emulator KVM: vmx: count exits to userspace during invalid guest emulation KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmp kvm: optimize away THP checks in kvm_is_mmio_pfn() ...	2013-09-04 18:15:06 -07:00
Linus Torvalds	cf39c8e535	Features: - Xen Trusted Platform Module (TPM) frontend driver - with the backend in MiniOS. - Scalability improvements in event channel. - Two extra Xen co-maintainers (David, Boris) and one going away (Jeremy) Bug-fixes: - Make the 1:1 mapping work during early bootup on selective regions. - Add scratch page to balloon driver to deal with unexpected code still holding on stale pages. - Allow NMIs on PV guests (64-bit only) - Remove unnecessary TLB flush in M2P code. - Fixes duplicate callbacks in Xen granttable code. - Fixes in PRIVCMD_MMAPBATCH ioctls to allow retries - Fix for events being lost due to rescheduling on different VCPUs. - More documentation. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQEcBAABAgAGBQJSJgGgAAoJEFjIrFwIi8fJ4asH+gKp0aauPEdHtmn7rLfZUUJ5 uuvWBiXiVVYMFz81NXlZ1WoAMuDuVA45Eu785uPRb9oUHDi0W8LO4Dqr+9lJTrXJ KiMvTXmOLSfSdjRlDI4jCoxBdg8tpbT3oJkXsFcHnrd5d4oTFGb0uuo5nFYPDicZ BGogDclzcqtlYl/2LUb+6vUXUQd77n0oW7RQ4yAaw3Qdj381om3Dmoeat8QU9Kdo Q4dhsHS6YAGR5R+G0zPfVOoKvSGoGV0NUdXr19QpYArGxKXcmiPjrgAJ/NGLsxvm 8AbPjmQzOFJmUclHiiej6kvBsh2ZTYAesJMSAFLWD7EndXii7zljyJv0PIJ//uQ= =hNDW -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.12-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from Konrad Rzeszutek Wilk: "A couple of features and a ton of bug-fixes. There is also some maintership changes. Jeremy is enjoying the full-time work at the startup and as much as he would love to help - he can't find the time. I have a bunch of other things that I promised to work on - paravirt diet, get SWIOTLB working everywhere, etc, but haven't been able to find the time. As such both David Vrabel and Boris Ostrovsky have graciously volunteered to help with the maintership role. They will keep the lid on regressions, bug-fixes, etc. I will be in the background to help - but eventually there will be less of me doing the Xen GIT pulls and more of them. Stefano is still doing the ARM/ARM64 and will continue on doing so. Features: - Xen Trusted Platform Module (TPM) frontend driver - with the backend in MiniOS. - Scalability improvements in event channel. - Two extra Xen co-maintainers (David, Boris) and one going away (Jeremy) Bug-fixes: - Make the 1:1 mapping work during early bootup on selective regions. - Add scratch page to balloon driver to deal with unexpected code still holding on stale pages. - Allow NMIs on PV guests (64-bit only) - Remove unnecessary TLB flush in M2P code. - Fixes duplicate callbacks in Xen granttable code. - Fixes in PRIVCMD_MMAPBATCH ioctls to allow retries - Fix for events being lost due to rescheduling on different VCPUs. - More documentation" * tag 'stable/for-linus-3.12-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (23 commits) hvc_xen: Remove unnecessary __GFP_ZERO from kzalloc drivers/xen-tpmfront: Fix compile issue with missing option. xen/balloon: don't set P2M entry for auto translated guest xen/evtchn: double free on error Xen: Fix retry calls into PRIVCMD_MMAPBATCH*. xen/pvhvm: Initialize xen panic handler for PVHVM guests xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping xen: fix ARM build after `6efa20e4` MAINTAINERS: Remove Jeremy from the Xen subsystem. xen/events: document behaviour when scanning the start word for events x86/xen: during early setup, only 1:1 map the ISA region x86/xen: disable premption when enabling local irqs swiotlb-xen: replace dma_length with sg_dma_len() macro swiotlb: replace dma_length with sg_dma_len() macro xen/balloon: set a mapping for ballooned out pages xen/evtchn: improve scalability by using per-user locks xen/p2m: avoid unneccesary TLB flush in m2p_remove_override() MAINTAINERS: Add in two extra co-maintainers of the Xen tree. MAINTAINERS: Update the Xen subsystem's with proper mailing list. xen: replace strict_strtoul() with kstrtoul() ...	2013-09-04 17:45:39 -07:00
Linus Torvalds	816434ec4a	Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 spinlock changes from Ingo Molnar: "The biggest change here are paravirtualized ticket spinlocks (PV spinlocks), which bring a nice speedup on various benchmarks. The KVM host side will come to you via the KVM tree" * 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static" kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor kvm guest: Add configuration support to enable debug information for KVM Guests kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi xen, pvticketlock: Allow interrupts to be enabled while blocking x86, ticketlock: Add slowpath logic jump_label: Split jumplabel ratelimit x86, pvticketlock: When paravirtualizing ticket locks, increment by 2 x86, pvticketlock: Use callee-save for lock_spinning xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks xen, pvticketlock: Xen implementation for PV ticket locks xen: Defer spinlock setup until boot CPU setup x86, ticketlock: Collapse a layer of functions x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks x86, spinlock: Replace pv spinlocks with pv ticketlocks	2013-09-04 11:55:10 -07:00
Linus Torvalds	f357a82048	Merge branch 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SMAP fixes from Ingo Molnar: "Fixes for Intel SMAP support, to fix SIGSEGVs during bootup" * 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Introduce [compat_]save_altstack_ex() to unbreak x86 SMAP x86, smap: Handle csum_partial_copy_*_user()	2013-09-04 11:08:32 -07:00
Linus Torvalds	b20c99eb66	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: "[ The reason for drivers/ updates is that Boris asked for the drivers/edac/ changes to go via x86/ras in this cycle ] Main changes: - AMD CPUs: . Add ECC event decoding support for new F15h models . Various erratum fixes . Fix single-channel on dual-channel-controllers bug. - Intel CPUs: . UC uncorrectable memory error parsing fix . Add support for CMC (Corrected Machine Check) 'FF' (Firmware First) flag in the APEI HEST - Various cleanups and fixes" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: amd64_edac: Fix incorrect wraparounds amd64_edac: Correct erratum 505 range cpc925_edac: Use proper array termination x86/mce, acpi/apei: Only disable banks listed in HEST if mce is configured amd64_edac: Get rid of boot_cpu_data accesses amd64_edac: Add ECC decoding support for newer F15h models x86, amd_nb: Clarify F15h, model 30h GART and L3 support pci_ids: Add PCI device ID functions 3 and 4 for newer F15h models. x38_edac: Make a local function static i3200_edac: Make a local function static x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors APEI/ERST: Fix error message formatting amd64_edac: Fix single-channel setups EDAC: Replace strict_strtol() with kstrtol() mce: acpi/apei: Soft-offline a page on firmware GHES notification mce: acpi/apei: Add a boot option to disable ff mode for corrected errors mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC	2013-09-04 11:07:04 -07:00
Linus Torvalds	05eebfb26b	Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt changes from Ingo Molnar: "Hypervisor signature detection cleanup and fixes - the goal is to make KVM guests run better on MS/Hyperv and to generalize and factor out the code a bit" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Correctly detect hypervisor x86, kvm: Switch to use hypervisor_cpuid_base() xen: Switch to use hypervisor_cpuid_base() x86: Introduce hypervisor_cpuid_base()	2013-09-04 11:05:13 -07:00
Linus Torvalds	cb3e4330e6	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "Misc smaller fixes: - a parse_setup_data() boot crash fix - a memblock and an __early_ioremap cleanup - turn the always-on CONFIG_ARCH_MEMORY_PROBE=y into a configurable option and turn it off - it's an unrobust debug facility, it shouldn't be enabled by default" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: avoid remapping data in parse_setup_data() x86: Use memblock_set_current_limit() to set limit for memblock. mm: Remove unused variable idx0 in __early_ioremap() mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default	2013-09-04 09:39:26 -07:00
Linus Torvalds	aafcd5d757	Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 relocation changes from Ingo Molnar: "This tree contains a single change, ELF relocation handling in C - one of the kernel randomization patches that makes sense even without randomization present upstream" * 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, relocs: Move ELF relocation handling to C	2013-09-04 09:38:10 -07:00
Linus Torvalds	228abe73ad	Merge branch 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fb changes from Ingo Molnar: "This tree includes preparatory patches for SimpleDRM driver support, by David Herrmann. They clean up x86 framebuffer support by creating simplefb devices wherever possible. More background can be found at http://lwn.net/Articles/558104/" * 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fbdev: fbcon: select VT_HW_CONSOLE_BINDING fbdev: efifb: bind to efi-framebuffer fbdev: vesafb: bind to platform-framebuffer device fbdev: simplefb: add common x86 RGB formats x86: sysfb: move EFI quirks from efifb to sysfb x86: provide platform-devices for boot-framebuffers fbdev: simplefb: mark as fw and allocate apertures fbdev: simplefb: add init through platform_data	2013-09-04 09:12:17 -07:00
Linus Torvalds	1f9c52e16b	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu feature fixes from Ingo Molnar: "Two small cpufeature support updates" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix override new_cpu_data.x86 with 486 x86, cpufeature: Use new CC_HAVE_ASM_GOTO	2013-09-04 09:11:16 -07:00
Linus Torvalds	2a475501b8	Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asmlinkage changes from Ingo Molnar: "As a preparation for Andi Kleen's LTO patchset (link time optimizations using GCC's -flto which build time optimization has steadily increased in quality over the past few years and might eventually be usable for the kernel too) this tree includes a handful of preparatory patches that make function calling convention annotations consistent again: - Mark every function without arguments (or 64bit only) that is used by assembly code with asmlinkage() - Mark every function with parameters or variables that is used by assembly code as __visible. For the vanilla kernel this has documentation, consistency and debuggability advantages, for the time being" * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asmlinkage: Fix warning in xen asmlinkage change x86, asmlinkage, vdso: Mark vdso variables __visible x86, asmlinkage, power: Make various symbols used by the suspend asm code visible x86, asmlinkage: Make dump_stack visible x86, asmlinkage: Make 64bit checksum functions visible x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops x86, asmlinkage, apm: Make APM data structure used from assembler visible x86, asmlinkage: Make syscall tables visible x86, asmlinkage: Make several variables used from assembler/linker script visible x86, asmlinkage: Make kprobes code visible and fix assembler code x86, asmlinkage: Make various syscalls asmlinkage x86, asmlinkage: Make 32bit/64bit __switch_to visible x86, asmlinkage: Make _*_start_kernel visible x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible x86, asmlinkage: Change dotraplinkage into __visible on 32bit x86: Fix sys_call_table type in asm/syscall.h	2013-09-04 08:42:44 -07:00
Linus Torvalds	3d7e5fc37f	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar: "Main changes: - Apply low level mutex optimization on x86-64, by Wedson Almeida Filho. - Change bitops to be naturally 'long', by H Peter Anvin. - Add TSX-NI opcodes support to the x86 (instrumentation) decoder, by Masami Hiramatsu. - Add clang compatibility adjustments/workarounds, by Jan-Simon Möller" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, doc: Update uaccess.h comment to reflect clang changes x86, asm: Fix a compilation issue with clang x86, asm: Extend definitions of _ASM_* with a raw format x86, insn: Add new opcodes as of June, 2013 x86/ia32/asm: Remove unused argument in macro x86, bitops: Change bitops to be native operand size x86: Use asm-goto to implement mutex fast path on x86-64	2013-09-04 08:39:38 -07:00
Linus Torvalds	6924a4672d	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic changes from Ingo Molnar: "Smaller fixes" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Check attr against the previous setting when programmed more than once x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock x86/acpi: Fix incorrect sanity check in acpi_register_lapic()	2013-09-04 08:39:05 -07:00
Linus Torvalds	5e0b3a4e88	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Various optimizations, cleanups and smaller fixes - no major changes in scheduler behavior" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix the sd_parent_degenerate() code sched/fair: Rework and comment the group_imb code sched/fair: Optimize find_busiest_queue() sched/fair: Make group power more consistent sched/fair: Remove duplicate load_per_task computations sched/fair: Shrink sg_lb_stats and play memset games sched: Clean-up struct sd_lb_stat sched: Factor out code to should_we_balance() sched: Remove one division operation in find_busiest_queue() sched/cputime: Use this_cpu_add() in task_group_account_field() cpumask: Fix cpumask leak in partition_sched_domains() sched/x86: Optimize switch_mm() for multi-threaded workloads generic-ipi: Kill unnecessary variable - csd_flags numa: Mark __node_set() as __always_inline sched/fair: Cleanup: remove duplicate variable declaration sched/__wake_up_sync_key(): Fix nr_exclusive tasks which lead to WF_SYNC clearing	2013-09-04 08:36:35 -07:00
Linus Torvalds	0d99b70873	Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "As a first remark I'd like to point out that the obsolete '-f' (--force) option, which has not done anything for several releases, has been removed from 'perf record' and related utilities. Everyone please update muscle memory accordingly! :-) Main changes on the perf kernel side: - Performance optimizations: . for trace events, by Steve Rostedt. . for time values, by Peter Zijlstra - New hardware support: . for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan . for Intel SNB-EP uncore PMUs, by Zheng Yan - Enhanced hardware support: . for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan - Core perf events code enhancements and fixes: . for full-nohz feature handling, by Frederic Weisbecker . for group events, by Jiri Olsa . for call chains, by Frederic Weisbecker . for event stream parsing, by Adrian Hunter - New ABI details: . Add attr->mmap2 attribute, by Stephane Eranian . Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa . Export u64 time_zero on the mmap header page to allow TSC calculation, by Adrian Hunter . Add dummy software event, by Adrian Hunter. . Add a new PERF_SAMPLE_IDENTIFIER to make samples always parseable, by Adrian Hunter. . Make Power7 events available via sysfs, by Runzhen Wang. - Code cleanups and refactorings: . for nohz-full, by Frederic Weisbecker . for group events, by Jiri Olsa - Documentation updates: . for perf_event_type, by Peter Zijlstra Main changes on the perf tooling side (some of these tooling changes utilize the above kernel side changes): - Lots of 'perf trace' enhancements: . Make 'perf trace' command line arguments consistent with 'perf record', by David Ahern. . Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo. . Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo. . Support ! in -e expressions, to filter a list of syscalls, by Arnaldo Carvalho de Melo. . Arg formatting improvements to allow masking arguments in syscalls such as futex and open, where the some arguments are ignored and thus should not be printed depending on other args, by Arnaldo Carvalho de Melo. . Beautify futex open, openat, open_by_handle_at, lseek and futex syscalls, by Arnaldo Carvalho de Melo. . Add option to analyze events in a file versus live, so that one can do: [root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1 [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ] [root@zoo ~]# perf trace -i perf.data -e futex --duration 1 17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua 113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967 133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496 [root@zoo ~]# By David Ahern. . Honor target pid / tid options when analyzing a file, by David Ahern. . Introduce better formatting of syscall arguments, including so far beautifiers for mmap, madvise, syscall return values, by Arnaldo Carvalho de Melo. . Handle HUGEPAGE defines in the mmap beautifier, by David Ahern. - 'perf report/top' enhancements: . Do annotation using /proc/kcore and /proc/kallsyms when available, removing the forced need for a vmlinux file kernel assembly annotation. This also improves this use case because vmlinux has just the initial kernel image, not what is actually in use after various code patchings by things like alternatives. By Adrian Hunter. . Add --ignore-callees=<regex> option to collapse undesired parts of call graphs, by Greg Price. . Simplify symbol filtering by doing it at machine class level, by Adrian Hunter. . Add support for callchains in the gtk UI, by Namhyung Kim. . Add --objdump option to 'perf top', by Sukadev Bhattiprolu. - 'perf kvm' enhancements: . Add option to print only events that exceed a specified time duration, by David Ahern. . Improve stack trace printing, by David Ahern. . Update documentation of the live command, by David Ahern . Add perf kvm stat live mode that combines aspects of 'perf kvm stat' record and report, by David Ahern. . Add option to analyze specific VM in perf kvm stat report, by David Ahern. . Do not require /lib/modules/* on a guest, by Jason Wessel. - 'perf script' enhancements: . Fix symbol offset computation for some dsos, by David Ahern. . Fix named threads support, by David Ahern. . Don't install scripting files files when perl/python support is disabled, by Arnaldo Carvalho de Melo. - 'perf test' enhancements: . Add various improvements and fixes to the "vmlinux matches kallsyms" 'perf test' entry, related to the /proc/kcore annotation feature. By Adrian Hunter. . Add sample parsing test, by Adrian Hunter. . Add test for reading object code, by Adrian Hunter. . Add attr record group sampling test, by Jiri Olsa. . Misc testing infrastructure improvements and other details, by Jiri Olsa. - 'perf list' enhancements: . Skip unsupported hardware events, by Namhyung Kim. . List pmu events, by Andi Kleen. - 'perf diff' enhancements: . Add support for more than two files comparison, by Jiri Olsa. - 'perf sched' enhancements: . Various improvements, including removing reliance on some scheduler tracepoints that provide the same information as the PERF_RECORD_{FORK,EXIT} events. By David Ahern. . Remove odd build stall by moving a large struct initialization from a local variable to a global one, by Namhyung Kim. - 'perf stat' enhancements: . Add --initial-delay option to skip measuring for a defined startup phase, by Andi Kleen. - Generic perf tooling infrastructure/plumbing changes: . Tidy up sample parsing validation, by Adrian Hunter. . Fix up jobserver setup in libtraceevent Makefile. by Arnaldo Carvalho de Melo. . Debug improvements, by Adrian Hunter. . Fix correlation of samples coming after PERF_RECORD_EXIT event, by David Ahern. . Improve robustness of the topology parsing code, by Stephane Eranian. . Add group leader sampling, that allows just one event in a group to sample while the other events have just its values read, by Jiri Olsa. . Add support for a new modifier "D", which requests that the event, or group of events, be pinned to the PMU. By Michael Ellerman. . Support callchain sorting based on addresses, by Andi Kleen . Prep work for multi perf data file storage, by Jiri Olsa. . libtraceevent cleanups, by Namhyung Kim. And lots and lots of other fixes and code reorganizations that did not make it into the list, see the shortlog, diffstat and the Git log for details!" [ Also merge a leftover from the 3.11 cycle ] * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Prevent race in unthrottling code * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits) perf trace: Tell arg formatters the arg index perf trace: Add beautifier for open's flags arg perf trace: Add beautifier for lseek's whence arg perf tools: Fix symbol offset computation for some dsos perf list: Skip unsupported events perf tests: Add 'keep tracking' test perf tools: Add support for PERF_COUNT_SW_DUMMY perf: Add a dummy software event to keep tracking perf trace: Add beautifier for futex 'operation' parm perf trace: Allow syscall arg formatters to mask args perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node() perf: Export struct perf_branch_entry to userspace perf: Add attr->mmap2 attribute to an event perf/x86: Add Silvermont (22nm Atom) support perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X perf trace: Handle missing HUGEPAGE defines perf trace: Honor target pid / tid options when analyzing a file perf trace: Add option to analyze events in a file versus live perf evlist: Add tracepoint lookup by name perf tests: Add a sample parsing test ...	2013-09-04 08:25:35 -07:00
Linus Torvalds	40031da445	ACPI and power management updates for 3.12-rc1 1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction of Intel Thunderbolt support on systems that use ACPI for signalling Thunderbolt hotplug events. This also should make ACPIPHP work in some cases in which it was known to have problems. From Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov. 2) ACPI core code cleanups and dock station support cleanups from Jiang Liu and Rafael J Wysocki. 3) Fixes for locking problems related to ACPI device hotplug from Rafael J Wysocki. 4) ACPICA update to version 20130725 includig fixes, cleanups, support for more than 256 GPEs per GPE block and a change to make the ACPI PM Timer optional (we've seen systems without the PM Timer in the field already). One of the fixes, related to the DeRefOf operator, is necessary to prevent some Windows 8 oriented AML from causing problems to happen. From Bob Moore, Lv Zheng, and Jung-uk Kim. 5) Removal of the old and long deprecated /proc/acpi/event interface and related driver changes from Thomas Renninger. 6) ACPI and Xen changes to make the reduced hardware sleep work with the latter from Ben Guthro. 7) ACPI video driver cleanups and a blacklist of systems that should not tell the BIOS that they are compatible with Windows 8 (or ACPI backlight and possibly other things will not work on them). From Felipe Contreras. 8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo, Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen, Toshi Kani, and Wei Yongjun. 9) cpufreq ondemand governor target frequency selection change to reduce oscillations between min and max frequencies (essentially, it causes the governor to choose target frequencies proportional to load) from Stratos Karafotis. 10) cpufreq fixes allowing sysfs attributes file permissions to be preserved over suspend/resume cycles Srivatsa S Bhat. 11) Removal of Device Tree parsing for CPU device nodes from multiple cpufreq drivers that required some changes related to of_get_cpu_node() to be made in a few architectures and in the driver core. From Sudeep KarkadaNagesha. 12) cpufreq core fixes and cleanups related to mutual exclusion and driver module references from Viresh Kumar, Lukasz Majewski and Rafael J Wysocki. 13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap, Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo, Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd, Stratos Karafotis, and Viresh Kumar. 14) Fixes to prevent race conditions in coupled cpuidle from happening from Colin Cross. 15) cpuidle core fixes and cleanups from Daniel Lezcano and Tuukka Tikkanen. 16) Assorted cpuidle fixes and cleanups from Daniel Lezcano, Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij, and Sahara. 17) System sleep tracing changes from Todd E Brandt and Shuah Khan. 18) PNP subsystem conversion to using struct dev_pm_ops for power management from Shuah Khan. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABCAAGBQJSJcKhAAoJEKhOf7ml8uNsplIQAJSOshxhkkemvFOuHZ+0YIbh R9aufjXeDkMDBi8YtU+tB7ERth1j+0LUSM0NTnP51U7e+7eSGobA9s5jSZQj2l7r HFtnSOegLuKAfqwgfSLK91xa1rTFdfW0Kych9G2nuHtBIt6P0Oc59Cb5M0oy6QXs nVtaDEuU//tmO71+EF5HnMJHabRTrpvtn/7NbDUpU7LZYpWJrHJFT9xt1rXNab7H YRCATPm3kXGRg58Doc3EZE4G3D7DLvq74jWMaI089X/m5Pg1G6upqArypOy6oxdP p2FEzYVrb2bi8fakXp7BBeO1gCJTAqIgAkbSSZHLpGhFaeEMmb9/DWPXdm2TjzMV c1EEucvsqZWoprXgy12i5Hk814xN8d8nBBLg/UYiRJ44nc/hevXfyE9ZYj6bkseJ +GNHmZIa1QYC05nnGli4+W4kHns8EZf/gmvIxnPuco1RN2yMWagrud5/G6Dr9M2B hzJV6qauLVzgZso4oe79zv9aVxe/dPHKANLD/sg23WBiJJbJF1ocBlnj2Xlbpqze pmMUWGiO/gUiS0fmpW/lAJauza5jFmSCjE4E8R0Gyn0j4YXjmMhdEanaU6J3VuCi yVgEzYEth4sowq4AflMMLKYN+WmozDnK7taZRGmT0t+EKRFKLT6EgnNrkQgs1vKl oawD9LM4fZ8E0yroOEme =CgqW -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: 1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction of Intel Thunderbolt support on systems that use ACPI for signalling Thunderbolt hotplug events. This also should make ACPIPHP work in some cases in which it was known to have problems. From Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov. 2) ACPI core code cleanups and dock station support cleanups from Jiang Liu and Rafael J Wysocki. 3) Fixes for locking problems related to ACPI device hotplug from Rafael J Wysocki. 4) ACPICA update to version 20130725 includig fixes, cleanups, support for more than 256 GPEs per GPE block and a change to make the ACPI PM Timer optional (we've seen systems without the PM Timer in the field already). One of the fixes, related to the DeRefOf operator, is necessary to prevent some Windows 8 oriented AML from causing problems to happen. From Bob Moore, Lv Zheng, and Jung-uk Kim. 5) Removal of the old and long deprecated /proc/acpi/event interface and related driver changes from Thomas Renninger. 6) ACPI and Xen changes to make the reduced hardware sleep work with the latter from Ben Guthro. 7) ACPI video driver cleanups and a blacklist of systems that should not tell the BIOS that they are compatible with Windows 8 (or ACPI backlight and possibly other things will not work on them). From Felipe Contreras. 8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo, Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen, Toshi Kani, and Wei Yongjun. 9) cpufreq ondemand governor target frequency selection change to reduce oscillations between min and max frequencies (essentially, it causes the governor to choose target frequencies proportional to load) from Stratos Karafotis. 10) cpufreq fixes allowing sysfs attributes file permissions to be preserved over suspend/resume cycles Srivatsa S Bhat. 11) Removal of Device Tree parsing for CPU device nodes from multiple cpufreq drivers that required some changes related to of_get_cpu_node() to be made in a few architectures and in the driver core. From Sudeep KarkadaNagesha. 12) cpufreq core fixes and cleanups related to mutual exclusion and driver module references from Viresh Kumar, Lukasz Majewski and Rafael J Wysocki. 13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap, Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo, Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd, Stratos Karafotis, and Viresh Kumar. 14) Fixes to prevent race conditions in coupled cpuidle from happening from Colin Cross. 15) cpuidle core fixes and cleanups from Daniel Lezcano and Tuukka Tikkanen. 16) Assorted cpuidle fixes and cleanups from Daniel Lezcano, Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij, and Sahara. 17) System sleep tracing changes from Todd E Brandt and Shuah Khan. 18) PNP subsystem conversion to using struct dev_pm_ops for power management from Shuah Khan. * tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (217 commits) cpufreq: Don't use smp_processor_id() in preemptible context cpuidle: coupled: fix race condition between pokes and safe state cpuidle: coupled: abort idle if pokes are pending cpuidle: coupled: disable interrupts after entering safe state ACPI / hotplug: Remove containers synchronously driver core / ACPI: Avoid device hot remove locking issues cpufreq: governor: Fix typos in comments cpufreq: governors: Remove duplicate check of target freq in supported range cpufreq: Fix timer/workqueue corruption due to double queueing ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT ACPI / thermal: Add check of "_TZD" availability and evaluating result cpufreq: imx6q: Fix clock enable balance ACPI: blacklist win8 OSI for buggy laptops cpufreq: tegra: fix the wrong clock name cpuidle: Change struct menu_device field types cpuidle: Add a comment warning about possible overflow cpuidle: Fix variable domains in get_typical_interval() cpuidle: Fix menu_device->intervals type cpuidle: CodingStyle: Break up multiple assignments on single line cpuidle: Check called function parameter in get_typical_interval() ...	2013-09-03 15:59:39 -07:00
Linus Torvalds	542a086ac7	Driver core patches for 3.12-rc1 Here's the big driver core pull request for 3.12-rc1. Lots of tiny changes here fixing up the way sysfs attributes are created, to try to make drivers simpler, and fix a whole class race conditions with creations of device attributes after the device was announced to userspace. All the various pieces are acked by the different subsystem maintainers. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (GNU/Linux) iEYEABECAAYFAlIlIPcACgkQMUfUDdst+ynUMwCaAnITsxyDXYQ4DqEsz8EcOtMk 718AoLrgnUZs3B+70AT34DVktg4HSThk =USl9 -----END PGP SIGNATURE----- Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg KH: "Here's the big driver core pull request for 3.12-rc1. Lots of tiny changes here fixing up the way sysfs attributes are created, to try to make drivers simpler, and fix a whole class race conditions with creations of device attributes after the device was announced to userspace. All the various pieces are acked by the different subsystem maintainers" * tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits) firmware loader: fix pending_fw_head list corruption drivers/base/memory.c: introduce help macro to_memory_block dynamic debug: line queries failing due to uninitialized local variable sysfs: sysfs_create_groups returns a value. debugfs: provide debugfs_create_x64() when disabled rbd: convert bus code to use bus_groups firmware: dcdbas: use binary attribute groups sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled driver core: add #include <linux/sysfs.h> to core files. HID: convert bus code to use dev_groups Input: serio: convert bus code to use drv_groups Input: gameport: convert bus code to use drv_groups driver core: firmware: use __ATTR_RW() driver core: core: use DEVICE_ATTR_RO driver core: bus: use DRIVER_ATTR_WO() driver core: create write-only attribute macros for devices and drivers sysfs: create __ATTR_WO() driver-core: platform: convert bus code to use dev_groups workqueue: convert bus code to use dev_groups MEI: convert bus code to use dev_groups ...	2013-09-03 11:37:15 -07:00
Linus Torvalds	bc08b449ee	lockref: implement lockless reference count updates using cmpxchg() Instead of taking the spinlock, the lockless versions atomically check that the lock is not taken, and do the reference count update using a cmpxchg() loop. This is semantically identical to doing the reference count update protected by the lock, but avoids the "wait for lock" contention that you get when accesses to the reference count are contended. Note that a "lockref" is absolutely _not_ equivalent to an atomic_t. Even when the lockref reference counts are updated atomically with cmpxchg, the fact that they also verify the state of the spinlock means that the lockless updates can never happen while somebody else holds the spinlock. So while "lockref_put_or_lock()" looks a lot like just another name for "atomic_dec_and_lock()", and both optimize to lockless updates, they are fundamentally different: the decrement done by atomic_dec_and_lock() is truly independent of any lock (as long as it doesn't decrement to zero), so a locked region can still see the count change. The lockref structure, in contrast, really is a locked reference count. If you hold the spinlock, the reference count will be stable and you can modify the reference count without using atomics, because even the lockless updates will see and respect the state of the lock. In order to enable the cmpxchg lockless code, the architecture needs to do three things: (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit in an aligned u64, and have a "cmpxchg()" implementation that works on such a u64 data type. (2) define a helper function to test for a spinlock being unlocked ("arch_spin_value_unlocked()") (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its Kconfig file. This enables it for x86-64 (but not 32-bit, we'd need to make sure cmpxchg() turns into the proper cmpxchg8b in order to enable it for 32-bit mode). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-02 12:12:15 -07:00
H. Peter Anvin	7263dda41b	x86, smap: Handle csum_partial_copy_*_user() Add SMAP annotations to csum_partial_copy_to/from_user(). These functions legitimately access user space and thus need to set the AC flag. TODO: add explicit checks that the side with the kernel space pointer really points into kernel space. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-2aps0u00eer658fd5xyanan7@git.kernel.org Cc: <stable@vger.kernel.org> # v3.7+	2013-09-01 14:09:48 -07:00
H. Peter Anvin	f69fa9a91f	x86, doc: Update uaccess.h comment to reflect clang changes Update comment in uaccess.h to reflect the changes for clang support: gcc only cares about the base register (most architectures don't encode the size of the operation in the operands like x86 does, and so it is treated effectively like a register number), whereas clang tries to enforce the size -- but not for register pairs. Link: http://lkml.kernel.org/r/1377803585-5913-3-git-send-email-dl9pf@gmx.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jan-Simon Möller <dl9pf@gmx.de>	2013-08-29 13:34:50 -07:00
Jan-Simon Möller	bdfc017eea	x86, asm: Fix a compilation issue with clang Clang does not support the "shortcut" we're taking here for gcc (see below). The patch uses the macro _ASM_DX to do the job. From arch/x86/include/asm/uaccess.h: /* * Careful: we have to cast the result to the type of the pointer * for sign reasons. * * The use of %edx as the register specifier is a bit of a * simplification, as gcc only cares about it as the starting point * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits * (%ecx being the next register in gcc's x86 register sequence), and * %rdx on 64 bits. */ [ hpa: I consider this a compatibility bug in clang as this reflects a bit of a misunderstanding about how register strings are used by gcc, but the workaround is straightforward and there is no particular reason to not do it. ] Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de> Link: http://lkml.kernel.org/r/1377803585-5913-3-git-send-email-dl9pf@gmx.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-29 13:26:33 -07:00
Jan-Simon Möller	3e9b2327b5	x86, asm: Extend definitions of _ASM_* with a raw format The __ASM_* macros (e.g. __ASM_DX) are used to return the proper register name (e.g. edx for 32bit / rdx for 64bit). We want to use this also in arch/x86/include/asm/uaccess.h / get_user() . For this to work, we need a raw form as both gcc and clang choke on the whitespace in a register asm() statement, and the __ASM_FORM macro surrounds the argument with blanks. A new macro, __ASM_FORM_RAW was added and we change __ASM_REG to use the new RAW form. Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de> Link: http://lkml.kernel.org/r/1377803585-5913-2-git-send-email-dl9pf@gmx.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-29 13:26:32 -07:00
Ingo Molnar	aee2bce3cf	Merge branch 'linus' into perf/core Pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-08-29 12:02:08 +02:00
Marek Szyprowski	a254738039	drivers: dma-contiguous: clean source code and prepare for device tree This patch cleans the initialization of dma contiguous framework. The all-in-one dma_declare_contiguous() function is now separated into dma_contiguous_reserve_area() which only steals the the memory from memblock allocator and dma_contiguous_add_device() function, which assigns given device to the specified reserved memory area. This improves the flexibility in defining contiguous memory areas and assigning device to them, because now it is possible to assign more than one device to the given contiguous memory area. Such split in initialization procedure is also required for upcoming device tree support. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Tomasz Figa <t.figa@samsung.com>	2013-08-27 09:18:29 +02:00
Rafael J. Wysocki	7a330a5416	Merge branch 'pm-cpufreq' * pm-cpufreq: (60 commits) cpufreq: pmac32-cpufreq: remove device tree parsing for cpu nodes cpufreq: pmac64-cpufreq: remove device tree parsing for cpu nodes cpufreq: maple-cpufreq: remove device tree parsing for cpu nodes cpufreq: arm_big_little: remove device tree parsing for cpu nodes cpufreq: kirkwood-cpufreq: remove device tree parsing for cpu nodes cpufreq: spear-cpufreq: remove device tree parsing for cpu nodes cpufreq: highbank-cpufreq: remove device tree parsing for cpu nodes cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes cpufreq: imx6q-cpufreq: remove device tree parsing for cpu nodes drivers/bus: arm-cci: avoid parsing DT for cpu device nodes ARM: mvebu: remove device tree parsing for cpu nodes ARM: topology: remove hwid/MPIDR dependency from cpu_capacity of/device: add helper to get cpu device node from logical cpu index driver/core: cpu: initialize of_node in cpu's device struture ARM: DT/kernel: define ARM specific arch_match_cpu_phys_id of: move of_get_cpu_node implementation to DT core library powerpc: refactor of_get_cpu_node to support other architectures openrisc: remove undefined of_get_cpu_node declaration microblaze: remove undefined of_get_cpu_node declaration cpufreq: fix bad unlock balance on !CONFIG_SMP ...	2013-08-27 01:44:40 +02:00
Srivatsa Vaddagiri	6aef266c6e	kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks kvm_hc_kick_cpu allows the calling vcpu to kick another vcpu out of halt state. the presence of these hypercalls is indicated to guest via kvm_feature_pv_unhalt. Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration During migration, any vcpu that got kicked but did not become runnable (still in halted state) should be runnable after migration. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com> [Raghu: Apic related changes, folding pvunhalted into vcpu_runnable Added flags for future use (suggested by Gleb)] [ Raghu: fold pv_unhalt flag as suggested by Eric Northup] Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-26 12:47:09 +03:00
Raghavendra K T	4b0a867085	kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi this is needed by both guest and host. Originally-from: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-26 12:46:01 +03:00
Rafael J. Wysocki	4eb5178c9c	Merge back earlier 'pm-cpufreq' material.	2013-08-23 00:55:13 +02:00
Kevin Hilman	bfa664f21b	ARM: tegra: core SoC enhancements for 3.12 This branch includes a number of enhancements to core SoC support for Tegra devices. The major new features are: * Adds a new CPU-power-gated cpuidle state for Tegra114. * Adds initial system suspend support for Tegra114, initially supporting just CPU-power-gating during suspend. * Adds "LP1" suspend mode support for all of Tegra20/30/114. This mode both gates CPU power, and places the DRAM into self-refresh mode. * A new DT-driven PCIe driver to Tegra20/30. The driver is also moved from arch/arm/mach-tegra/ to drivers/pci/host/. The PCIe driver work depends on the following tag from Thomas Petazzoni: git://git.infradead.org/linux-mvebu.git mis-3.12.2 ... which is merged into the middle of this pull request. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJSDlwwAAoJEMzrak5tbycxR68QAJZ/Izc9Izj0JH8hmCEvMNfi ub1DQfWAy3oXk0ttkk+BMvuyD8JTvBr8LSK8GqjZs//rFGlW81A4NHTvCwoKZjKe hgrRgI2B1wj3Um1sp8le9D0klKrTcfmpXrOxH8ALgz0BIpMge8AGZHkV0SrfQa1z bKiISFVAw12WJCVrQ2nbzpZGU51lbyJ/+RghttM1a8LuS2P03CZgt2kqiytk3UVK uiGEy3sCkjXLFO3EsUvM6ha623S6BumCAYjNfgDowTVKaoEe1r2TD4bFeU6lGcXJ mlVTv0Kywazf4Q2gKzkbDz8UQMArW4hok2iILHzz+sf/Rn0hie5XVqhFlbBlcae8 vyWsHmqvmE9BJAK2G2RLs9cJCTzEpEyAjUWfE3sIIa3ztSguT5+PHndDLR/d76aS j8L3FYReICZ1NuNw1JSQPFs9g2EWJbNRiy+8o9O2elsJMpLDBj/FcV6TVpudbBTI z7hvN+XSVYUaCVD4e8ma9YoC3VGseiAZvd+Y8hPd2MFBECVPNpy2bOacieU6Bgxh zjSBXZ/URxN3rTkv9+F3BLWAOfVmJYN0rKV9YfM/rqpWjc9iQx30m1fRZDnXWhvd ps8eFIYsKqc6v9AAugl/RexFy4Laav9eREjb0k2LA8ClLhK/qLLuiisVmKWS/grh lX9tzPEG2nZcjxSYaEjz =ve9i -----END PGP SIGNATURE----- Merge tag 'tegra-for-3.12-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra into next/soc From: Stephen Warren: ARM: tegra: core SoC enhancements for 3.12 This branch includes a number of enhancements to core SoC support for Tegra devices. The major new features are: * Adds a new CPU-power-gated cpuidle state for Tegra114. * Adds initial system suspend support for Tegra114, initially supporting just CPU-power-gating during suspend. * Adds "LP1" suspend mode support for all of Tegra20/30/114. This mode both gates CPU power, and places the DRAM into self-refresh mode. * A new DT-driven PCIe driver to Tegra20/30. The driver is also moved from arch/arm/mach-tegra/ to drivers/pci/host/. The PCIe driver work depends on the following tag from Thomas Petazzoni: git://git.infradead.org/linux-mvebu.git mis-3.12.2 ... which is merged into the middle of this pull request. * tag 'tegra-for-3.12-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra: (33 commits) ARM: tegra: disable LP2 cpuidle state if PCIe is enabled MAINTAINERS: Add myself as Tegra PCIe maintainer PCI: tegra: set up PADS_REFCLK_CFG1 PCI: tegra: Add Tegra 30 PCIe support PCI: tegra: Move PCIe driver to drivers/pci/host PCI: msi: add default MSI operations for !HAVE_GENERIC_HARDIRQS platforms ARM: tegra: add LP1 suspend support for Tegra114 ARM: tegra: add LP1 suspend support for Tegra20 ARM: tegra: add LP1 suspend support for Tegra30 ARM: tegra: add common LP1 suspend support clk: tegra114: add LP1 suspend/resume support ARM: tegra: config the polarity of the request of sys clock ARM: tegra: add common resume handling code for LP1 resuming ARM: pci: add ->add_bus() and ->remove_bus() hooks to hw_pci of: pci: add registry of MSI chips PCI: Introduce new MSI chip infrastructure PCI: remove ARCH_SUPPORTS_MSI kconfig option PCI: use weak functions for MSI arch-specific functions ARM: tegra: unify Tegra's Kconfig a bit more ARM: tegra: remove the limitation that Tegra114 can't support suspend ... Signed-off-by: Kevin Hilman <khilman@linaro.org>	2013-08-21 10:17:18 -07:00
John Haxby	edb6f29464	crypto: xor - Check for osxsave as well as avx in crypto/xor This affects xen pv guests with sufficiently old versions of xen and sufficiently new hardware. On such a system, a guest with a btrfs root won't even boot. Signed-off-by: John Haxby <john.haxby@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-08-21 21:08:35 +10:00
Yoshihiro YUNOMAE	17405453f4	x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock Prevent crash_kexec() from deadlocking on ioapic_lock. When crash_kexec() is executed on a CPU, the CPU will take ioapic_lock in disable_IO_APIC(). So if the cpu gets an NMI while locking ioapic_lock, a deadlock will happen. In this patch, ioapic_lock is zapped/initialized before disable_IO_APIC(). You can reproduce this deadlock the following way: 1. Add mdelay(1000) after raw_spin_lock_irqsave() in native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c Although the deadlock can occur without this modification, it will increase the potential of the deadlock problem. 2. Build and install the kernel 3. Set up the OS which will run panic() and kexec when NMI is injected # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf # vim /etc/default/grub add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line # grub2-mkconfig 4. Reboot the OS 5. Run following command for each vcpu on the guest # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinitity; done; By running this command, cpus will get ioapic_lock for setting affinity. 6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you use VM). After injecting NMI, panic() is called in an nmi-handler context. Then, kexec will normally run in panic(), but the operation will be stopped by deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()-> native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()-> clear_IO_APIC_pin()->ioapic_read_entry(). Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-08-20 09:26:33 +02:00
Linus Torvalds	7067552dfb	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two AMD microcode loader fixes and an OLPC firmware support fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, AMD: Fix early microcode loading x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86 x86: Don't clear olpc_ofw_header when sentinel is detected	2013-08-19 09:18:29 -07:00
Linus Torvalds	f1d6e17f54	Merge branch 'akpm' (patches from Andrew Morton) Merge a bunch of fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: fs/proc/task_mmu.c: fix buffer overflow in add_page_map() arch: : Kconfig: add "kernel/Kconfig.freezer" to "arch//Kconfig" ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id() x86 get_unmapped_area(): use proper mmap base for bottom-up direction ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page ocfs2: Revert `40bd62e` to avoid regression in extended allocation drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit hugetlb: fix lockdep splat caused by pmd sharing aoe: adjust ref of head for compound page tails microblaze: fix clone syscall mm: save soft-dirty bits on file pages mm: save soft-dirty bits on swapped pages memcg: don't initialize kmem-cache destroying work for root caches	2013-08-14 10:04:43 -07:00
Srivatsa Vaddagiri	92b75202e5	kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor During smp_boot_cpus paravirtualied KVM guest detects if the hypervisor has required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops. Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130810193849.GA25260@linux.vnet.ibm.com Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com> [Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related changes, addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)] Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-14 13:12:35 +02:00
Ingo Molnar	ccb1f55e71	Those are basically two fixes which correct the AMD early ucode loader from accessing cpu_data too early, i.e. before smp_store_cpu_info() has copied the boot_cpu_data ontop and overwritten an already empty structure (which we shouldn't access that early in the first place anyway). The second patch is kinda largish for that late in the game but it shouldn't be problematic because we're simply switching from using cpu_data to use the CPU family number directly and thus again, not use uninitialized cpu_data structure. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJSCT0UAAoJEBLB8Bhh3lVKWNsQAJbu1y79mc2vHR15NqMGXVoP F6sUNC3iUnNjxlUxWn3PPynsJS9zFTBEo1S6ILnsfuMWliDoH+4psHLMmG5F20TV XJCSS9s3fdsaJZiNa7JXggdI80xU3y8uB/z3nacRDKgcdw3tX+ITP4SdPgNF5F3k B/XGAFxyXfl9M9SRFMIO4GS9UyFMsDQP0VWMRhcosY1/Fj5bc9Uds1ngP9E4XMe9 jB1pIzbvtDvtolYSdShwFE9M0os90TYPHVSgriYYXHLZPryZC8mWkc7GwSTl4bG9 EwBGXv79SJWics9vWM2n9EQbFt46fUAyhpjyOo/fqVp4L5XjKuV7UyeT7X+bsY0q b3z1jWzBxGZ5ltE2BmZ8tVnfKgwYJlKgAf8FbSrylhtBInP80Wau1aYTzw50h9D2 lmjO/rSOhm2FszdeFHG/IeVbSZJbSFrUdwm51nx2xF1qG0MXldy9ehJ4pO3Ui2XA d0z2y83ZxOYV723fTCa1/5C5xAes7LjvxftM32G9QTu7R5XnvVrfehVfPh9K1DJA nIEH6pbM358/PT+/q7BCPTnH7KVi+mE633YERDOfAgz/NFnVKfKzLBM0+AyFNHjk a4xFrOqVbTctW1Chv8Nh1457A2+gcT8yeju1XjbLE8IMqKOGyMt0gnRZlgMYj8BH WYKWi2q0LaDwsOcaQ9bm =cxbY -----END PGP SIGNATURE----- Merge tag 'amd_ucode_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Pull AMD microcode fixes from Borislav Petkov: " Those are basically two fixes which correct the AMD early ucode loader from accessing cpu_data too early, i.e. before smp_store_cpu_info() has copied the boot_cpu_data ontop and overwritten an already empty structure (which we shouldn't access that early in the first place anyway). The second patch is kinda largish for that late in the game but it shouldn't be problematic because we're simply switching from using cpu_data to use the CPU family number directly and thus again, not use uninitialized cpu_data structure. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-08-14 12:16:28 +02:00
Linn Crosetto	30e46b574a	x86: avoid remapping data in parse_setup_data() Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size larger than early_memremap() is able to handle, which is currently limited to 256kB. If this occurs it leads to a NULL dereference in parse_setup_data(). To avoid this, remap the setup_data header and allow parsing functions for individual types to handle their own data remapping. Signed-off-by: Linn Crosetto <linn@hp.com> Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.com Acked-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-13 23:29:19 -07:00
Cyrill Gorcunov	41bb3476b3	mm: save soft-dirty bits on file pages Andy reported that if file page get reclaimed we lose the soft-dirty bit if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get encoded into pte entry. Thus when #pf happens on such non-present pte we can restore it back. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 17:57:48 -07:00
Cyrill Gorcunov	179ef71cbc	mm: save soft-dirty bits on swapped pages Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set get swapped out, the bit is getting lost and no longer available when pte read back. To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in pte entry for the page being swapped out. When such page is to be read back from a swap cache we check for bit presence and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY bit back. One of the problem was to find a place in pte entry where we can save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was chosen for that, it doesn't intersect with swap entry format stored in pte. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 17:57:47 -07:00
Oleg Nesterov	e0acd0a68e	sched: fix the theoretical signal_wake_up() vs schedule() race This is only theoretical, but after try_to_wake_up(p) was changed to check p->state under p->pi_lock the code like __set_current_state(TASK_INTERRUPTIBLE); schedule(); can miss a signal. This is the special case of wait-for-condition, it relies on try_to_wake_up/schedule interaction and thus it does not need mb() between __set_current_state() and if(signal_pending). However, this __set_current_state() can move into the critical section protected by rq->lock, now that try_to_wake_up() takes another lock we need to ensure that it can't be reordered with "if (signal_pending(current))" check inside that section. The patch is actually one-liner, it simply adds smp_wmb() before spin_lock_irq(rq->lock). This is what try_to_wake_up() already does by the same reason. We turn this wmb() into the new helper, smp_mb__before_spinlock(), for better documentation and to allow the architectures to change the default implementation. While at it, kill smp_mb__after_lock(), it has no callers. Perhaps we can also add smp_mb__before/after_spinunlock() for prepare_to_wait(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 08:19:26 -07:00
Jan Kiszka	cc2df20c7c	KVM: x86: Update symbolic exit codes Add decoding for INVEPT and reorder the list according to the reason numbers. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-08-13 16:58:42 +02:00
Ingo Molnar	6356bb0ad6	Bit 12 may or may not be set in MCi_STATUS.MCACOD when an uncorrected error is reported. Ignore it when checking error signatures. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJR/912AAoJEKurIx+X31iBz/oP/javBL5Q098ir8qj2GdlkSV0 sPZkJ0Qd4AmmcvYgiYfE3wVL0F8mL2R3gFLcppN7wBTNqAjFOEOJb2wofwByiv+T 0eOM9OWxPZTsoxdiCHd5pvoNuw6lkegUfmxbb9vr9Xs0JaJuhjQY/bbbUbi2ue2V WoghLZvcSyZBGAaFJ3tWNdPYroHp4OulbvVdgbb4+iLgaLz73zzpv1HQvBnw6CLW x+P5guI5FawZnS6FjPwfjIs1mKqU5fdBxGl4vHB55IPyexWOVY6i+zc9VG0EuzDE za+FW2v8ohJzzxNP3/u4W3ousJoXkZSrwlZvDvuEkozrM8izibmDr2JglYqnQ2sK 5WMXL1aXHyTISWpHYiH0/218hljUhxaC5+TfNVeiZYytFULSyZOES8Em3fENx0gN HohTZkyBH0jNd2OZytSqS69/dvPgDIFD7qGR6KB2CaBAU+c/qH9M6g8vfaXt5Y/6 2a0oKrZ+WXiXYgEq3PynnTHTiaHYDo2rjBN+yQDrauomopZv0qpEMEicS66G0lE3 7+zh3CQOiv6WL9pYQxYjeIiP46H7tb+BSpbsDYDQ23++nLj61by1YXrLBJ2MsGep 2xEDkoVE7jW5+vskcSIUfavmp7pNNnIpRsU2cb7bem44iTc2DkuHYXZtHstvQIDN qIwoXyi3JE2siMTs4icc =LbVP -----END PGP SIGNATURE----- Merge tag 'please-pull-mce-f-bit' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE-uncorrected-error fix from Tony Luck: "Bit 12 may or may not be set in MCi_STATUS.MCACOD when an uncorrected error is reported. Ignore it when checking error signatures." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-08-12 19:51:43 +02:00
Torsten Kaiser	84516098b5	x86, microcode, AMD: Fix early microcode loading load_microcode_amd() (and the helper it is using) should not have an cpu parameter. The microcode loading does not depend on the CPU wrt the patches loaded since they will end up in a global list for all CPUs anyway. The change from cpu to x86family in load_microcode_amd() now allows to drop the code messing with cpu_data(cpu) from collect_cpu_info_amd_early(), which is wrong anyway because at that point the per-cpu cpu_info is not yet setup (These values would later be overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()). Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(), because its only used at one place and without the cpuinfo_x86 accesses it was not much left. Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com> [ Fengguang: build fix ] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> [ Boris: adapt it to current tree. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-08-12 18:32:45 +02:00
Ingo Molnar	0237d7f355	Merge branch 'x86/mce' into x86/ras Pursue a single RAS/MCE topic branch on x86. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-08-12 17:54:05 +02:00
Thomas Petazzoni	4287d824f2	PCI: use weak functions for MSI arch-specific functions Until now, the MSI architecture-specific functions could be overloaded using a fairly complex set of #define and compile-time conditionals. In order to prepare for the introduction of the msi_chip infrastructure, it is desirable to switch all those functions to use the 'weak' mechanism. This commit converts all the architectures that were overidding those MSI functions to use the new strategy. Note that we keep two separate, non-weak, functions default_teardown_msi_irqs() and default_restore_msi_irqs() for the default behavior of the arch_teardown_msi_irqs() and arch_restore_msi_irqs(), as the default behavior is needed by x86 PCI code. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Daniel Price <daniel.price@gmail.com> Tested-by: Thierry Reding <thierry.reding@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: linux-s390@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: linux-ia64@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: David S. Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Jason Cooper <jason@lakedaemon.net>	2013-08-12 15:26:39 +00:00
Daniel Drake	d55e37bb0f	x86: Don't clear olpc_ofw_header when sentinel is detected OpenFirmware wasn't quite following the protocol described in boot.txt and the kernel has detected this through use of the sentinel value in boot_params. OFW does zero out almost all of the stuff that it should do, but not the sentinel. This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC support. OpenFirmware has now been fixed. However, it would be nice if we could maintain Linux compatibility with old firmware versions. To do that, we just have to avoid zeroing out olpc_ofw_header. OFW does not write to any other parts of the header that are being zapped by the sentinel-detection code, and all users of olpc_ofw_header are somewhat protected through checking for the OLPC_OFW_SIG magic value before using it. So this should not cause any problems for anyone. Signed-off-by: Daniel Drake <dsd@laptop.org> Link: http://lkml.kernel.org/r/20130809221420.618E6FAB03@dev.laptop.org Acked-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.9+	2013-08-09 15:29:48 -07:00
Konrad Rzeszutek Wilk	6efa20e49b	xen: Support 64-bit PV guest receiving NMIs This is based on a patch that Zhenzhong Duan had sent - which was missing some of the remaining pieces. The kernel has the logic to handle Xen-type-exceptions using the paravirt interface in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME - pv_irq_ops.adjust_exception_frame and and INTERRUPT_RETURN - pv_cpu_ops.iret). That means the nmi handler (and other exception handlers) use the hypervisor iret. The other changes that would be neccessary for this would be to translate the NMI_VECTOR to one of the entries on the ipi_vector and make xen_send_IPI_mask_allbutself use different events. Fortunately for us commit `1db01b4903` (xen: Clean up apic ipi interface) implemented this and we piggyback on the cleanup such that the apic IPI interface will pass the right vector value for NMI. With this patch we can trigger NMIs within a PV guest (only tested x86_64). For this to work with normal PV guests (not initial domain) we need the domain to be able to use the APIC ops - they are already implemented to use the Xen event channels. For that to be turned on in a PV domU we need to remove the masking of X86_FEATURE_APIC. Incidentally that means kgdb will also now work within a PV guest without using the 'nokgdbroundup' workaround. Note that the 32-bit version is different and this patch does not enable that. CC: Lisa Nguyen <lisa@xenapiadmin.com> CC: Ben Guthro <benjamin.guthro@citrix.com> CC: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up per David Vrabel comments] Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>	2013-08-09 10:55:47 -04:00
Raghavendra K T	3a3bb00d5c	kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi These are needed by both guest and host. Originally-from: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1376058122-8248-13-git-send-email-raghavendra.kt@linux.vnet.ibm.com Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-09 07:54:12 -07:00
Jeremy Fitzhardinge	96f853eaa8	x86, ticketlock: Add slowpath logic Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention). In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker: Unlocker Locker test for lock pickup -> fail unlock test slowpath -> false set slowpath flags block Whereas this works in any ordering: Unlocker Locker set slowpath flags test for lock pickup -> fail block unlock test slowpath -> true, kick If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag. The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier so that its safe to subsequently read back the slowflag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock. Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked "add" is the only unlocking code. Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't the generated code isn't too bad, but its definitely suboptimal. Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in. Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-09 07:54:00 -07:00
Jeremy Fitzhardinge	4a1ed4ca68	x86, pvticketlock: When paravirtualizing ticket locks, increment by 2 Increment ticket head/tails by 2 rather than 1 to leave the LSB free to store a "is in slowpath state" bit. This halves the number of possible CPUs for a given ticket size, but this shouldn't matter in practice - kernels built for 32k+ CPU systems are probably specially built for the hardware rather than a generic distro kernel. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Attilio Rao <attilio.rao@citrix.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-09 07:53:50 -07:00
Jeremy Fitzhardinge	354714dd26	x86, pvticketlock: Use callee-save for lock_spinning Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Attilio Rao <attilio.rao@citrix.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-09 07:53:44 -07:00
Jeremy Fitzhardinge	b798df09f9	x86, ticketlock: Collapse a layer of functions Now that the paravirtualization layer doesn't exist at the spinlock level any more, we can collapse the __ticket_ functions into the arch_ functions. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-4-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Attilio Rao <attilio.rao@citrix.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-09 07:53:14 -07:00
Jeremy Fitzhardinge	545ac13892	x86, spinlock: Replace pv spinlocks with pv ticketlocks Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow patch (long spin on lock, unlocking a contended lock). Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) To address this, we add two hooks: - __ticket_spin_lock which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder. - __ticket_spin_unlock, which on releasing a contended lock (there are more cpus with tail tickets), it looks to see if the next cpu is blocked and wakes it if so. When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away. Results: ======= setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM base = 3.11-rc patched = base + pvspinlock V12 +-----------------+----------------+--------+ dbench (Throughput in MB/sec. Higher is better) +-----------------+----------------+--------+ \| base (stdev %)\|patched(stdev%) \| %gain \| +-----------------+----------------+--------+ \| 15035.3 (0.3) \|15150.0 (0.6) \| 0.8 \| \| 1470.0 (2.2) \| 1713.7 (1.9) \| 16.6 \| \| 848.6 (4.3) \| 967.8 (4.3) \| 14.0 \| \| 652.9 (3.5) \| 685.3 (3.7) \| 5.0 \| +-----------------+----------------+--------+ pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases, and undercommits results are flat Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Attilio Rao <attilio.rao@citrix.com> [ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t] Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-09 07:53:05 -07:00
Kees Cook	a021506107	x86, relocs: Move ELF relocation handling to C Moves the relocation handling into C, after decompression. This requires that the decompressed size is passed to the decompression routine as well so that relocations can be found. Only kernels that need relocation support will use the code (currently just x86_32), but this is laying the ground work for 64-bit using it in support of KASLR. Based on work by Neill Clift and Michael Davidson. Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20130708161517.GA4832@www.outflux.net Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-07 21:00:04 -07:00
Nadav Har'El	bfd0a56b90	nEPT: Nested INVEPT If we let L1 use EPT, we should probably also support the INVEPT instruction. In our current nested EPT implementation, when L1 changes its EPT table for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in the course of this modification already calls INVEPT. But if last level of shadow page is unsync not all L1's changes to EPT12 are intercepted, which means roots need to be synced when L1 calls INVEPT. Global INVEPT should not be different since roots are synced by kvm_mmu_load() each time EPTP02 changes. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-08-07 15:57:42 +02:00
Yang Zhang	25d92081ae	nEPT: Add nEPT violation/misconfigration support Inject nEPT fault to L1 guest. This patch is original from Xinhao. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-08-07 15:57:40 +02:00
Steven Rostedt	c3c7f14a11	x86/jump-label: Use best default nops for inital jump label calls As specified by H. Peter Anvin, the best nops for x86 without knowing the running computer is: 32bit: 0x3e, 0x8d, 0x74, 0x26, 0x00 also known as GENERIC_NOP5_ATOMIC 64bit: 0x0f, 0x1f, 0x44, 0x00, 0x00 also known as P6_NOP5_ATOMIC Currently the default nop that is used by jump label is: 0xe9 0x00 0x00 0x00 0x00 Which is really a 5byte jump to the next position. It's better to use a real nop than a jmp. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-08-06 21:16:33 -04:00
Andi Kleen	28596b6a87	x86, asmlinkage, vdso: Mark vdso variables __visible Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-17-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:21:08 -07:00
Andi Kleen	4a335c0695	x86, asmlinkage: Make 64bit checksum functions visible They are implemented in assembler. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-14-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:20:59 -07:00
Andi Kleen	9a55fdbe94	x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-13-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:20:56 -07:00
Andi Kleen	277d5b40b7	x86, asmlinkage: Make several variables used from assembler/linker script visible Plus one function, load_gs_index(). Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-10-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:20:13 -07:00
Andi Kleen	04bb591ca7	x86, asmlinkage: Make kprobes code visible and fix assembler code - Make all the external assembler template symbols __visible - Move the templates inline assembler code into a top level assembler statement, not inside a function. This avoids it being optimized away or cloned. Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-8-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:19:48 -07:00
Andi Kleen	ff49103fdb	x86, asmlinkage: Make various syscalls asmlinkage FWIW I suspect sys_rt_sigreturn/sys_sigreturn should use standard SYSCALL wrappers. But I didn't do that change in this patch. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-7-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:18:33 -07:00
Andi Kleen	35ea7903b8	x86, asmlinkage: Make 32bit/64bit __switch_to visible This function is called from inline assembler, so has to be visible. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-6-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:18:30 -07:00
Andi Kleen	a1ed4ddfb7	x86, asmlinkage: Make _*_start_kernel visible Obviously these functions have to be visible, otherwise the whole kernel could be optimized away. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-5-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:18:26 -07:00
Andi Kleen	1d9090e2fb	x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible These handlers are all referenced from assembler stubs, so need to be visible. The handlers without arguments become asmlinkage, the others __visible to not force regparms(0) on x86-32. I put it all into a single patch, please let me know if you want it it split up. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-4-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:18:23 -07:00
Andi Kleen	9e1a431de0	x86, asmlinkage: Change dotraplinkage into __visible on 32bit Mark 32bit dotraplinkage functions as __visible for LTO. 64bit already is using asmlinkage which includes it. v2: Clean up (M.Marek) Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-06 14:18:17 -07:00
Andi Kleen	1599e8fc84	x86: Fix sys_call_table type in asm/syscall.h Make the sys_call_table type defined in asm/syscall.h match the definition in syscall_64.c v2: include asm/syscall.h in syscall_64.c too. I left uml alone because it doesn't have an syscall.h on its own and including the native one leads to other errors. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-2-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Richard Weinberger <richard@nod.at>	2013-08-06 14:18:08 -07:00
Tony Luck	0ca06c0857	x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors The 0x1000 bit of the MCACOD field of machine check MCi_STATUS registers is only defined for corrected errors (where it means that hardware may be filtering errors see SDM section 15.9.2.1). For uncorrected errors it may, or may not be set - so we should mask it out when checking for the architecturaly defined recoverable error signatures (see SDM 15.9.3.1 and 15.9.3.2) Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-08-05 10:09:40 -07:00
Jason Wang	9df56f19a5	x86: Correctly detect hypervisor We try to handle the hypervisor compatibility mode by detecting hypervisor through a specific order. This is not robust, since hypervisors may implement each others features. This patch tries to handle this situation by always choosing the last one in the CPUID leaves. This is done by letting .detect() return a priority instead of true/false and just re-using the CPUID leaf where the signature were found as the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who has the highest priority. Other sophisticated detection method could also be implemented on top. Suggested by H. Peter Anvin and Paolo Bonzini. Acked-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Doug Covelli <dcovelli@vmware.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-05 06:35:33 -07:00
Jason Wang	1085ba7f55	x86, kvm: Switch to use hypervisor_cpuid_base() Switch to use hypervisor_cpuid_base() to detect KVM. Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-3-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-05 06:34:33 -07:00
Jason Wang	448ac44d56	xen: Switch to use hypervisor_cpuid_base() Switch to use hypervisor_cpuid_base() to detect Xen. Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-2-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-05 06:34:09 -07:00
Jason Wang	96e39ac0e9	x86: Introduce hypervisor_cpuid_base() This patch introduce hypervisor_cpuid_base() which loop test the hypervisor existence function until the signature match and check the number of leaves if required. This could be used by Xen/KVM guest to detect the existence of hypervisor. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-1-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-05 06:33:54 -07:00
David Herrmann	2995e50627	x86: sysfb: move EFI quirks from efifb to sysfb The EFI FB quirks from efifb.c are useful for simple-framebuffer devices as well. Apply them by default so we can convert efifb.c to use efi-framebuffer platform devices. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Link: http://lkml.kernel.org/r/1375445127-15480-5-git-send-email-dh.herrmann@gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-02 16:17:47 -07:00
David Herrmann	e3263ab389	x86: provide platform-devices for boot-framebuffers The current situation regarding boot-framebuffers (VGA, VESA/VBE, EFI) on x86 causes troubles when loading multiple fbdev drivers. The global "struct screen_info" does not provide any state-tracking about which drivers use the FBs. request_mem_region() theoretically works, but unfortunately vesafb/efifb ignore it due to quirks for broken boards. Avoid this by creating a platform framebuffer devices with a pointer to the "struct screen_info" as platform-data. Drivers can now create platform-drivers and the driver-core will refuse multiple drivers being active simultaneously. We keep the screen_info available for backwards-compatibility. Drivers can be converted in follow-up patches. Different devices are created for VGA/VESA/EFI FBs to allow multiple drivers to be loaded on distro kernels. We create: - "vesa-framebuffer" for VBE/VESA graphics FBs - "efi-framebuffer" for EFI FBs - "platform-framebuffer" for everything else This allows to load vesafb, efifb and others simultaneously and each picks up only the supported FB types. Apart from platform-framebuffer devices, this also introduces a compatibility option for "simple-framebuffer" drivers which recently got introduced for OF based systems. If CONFIG_X86_SYSFB is selected, we try to match the screen_info against a simple-framebuffer supported format. If we succeed, we create a "simple-framebuffer" device instead of a platform-framebuffer. This allows to reuse the simplefb.c driver across architectures and also to introduce a SimpleDRM driver. There is no need to have vesafb.c, efifb.c, simplefb.c and more just to have architecture specific quirks in their setup-routines. Instead, we now move the architecture specific quirks into x86-setup and provide a generic simple-framebuffer. For backwards-compatibility (if strange formats are used), we still allow vesafb/efifb to be loaded simultaneously and pick up all remaining devices. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Link: http://lkml.kernel.org/r/1375445127-15480-4-git-send-email-dh.herrmann@gmail.com Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-08-02 16:17:46 -07:00
Rik van Riel	8f898fbbe5	sched/x86: Optimize switch_mm() for multi-threaded workloads Dick Fowles, Don Zickus and Joe Mario have been working on improvements to perf, and noticed heavy cache line contention on the mm_cpumask, running linpack on a 60 core / 120 thread system. The cause turned out to be unnecessary atomic accesses to the mm_cpumask. When in lazy TLB mode, the CPU is only removed from the mm_cpumask if there is a TLB flush event. Most of the time, no such TLB flush happens, and the kernel skips the TLB reload. It can also skip the atomic memory set & test. Here is a summary of Joe's test results: * The __schedule function dropped from 24% of all program cycles down to 5.5%. * The cacheline contention/hotness for accesses to that bitmask went from being the 1st/2nd hottest - down to the 84th hottest (0.3% of all shared misses which is now quite cold) * The average load latency for the bit-test-n-set instruction in __schedule dropped from 10k-15k cycles down to an average of 600 cycles. * The linpack program results improved from 133 GFlops to 144 GFlops. Peak GFlops rose from 133 to 153. Reported-by: Don Zickus <dzickus@redhat.com> Reported-by: Joe Mario <jmario@redhat.com> Tested-by: Joe Mario <jmario@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Paul Turner <pjt@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20130731221421.616d3d20@annuminas.surriel.com [ Made the comments consistent around the modified code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-08-01 09:10:26 +02:00
Hanjun Guo	76f411fb3a	x86 / cpu topology: remove the stale macro arch_provides_topology_pointers Macro arch_provides_topology_pointers is pointless now, remove it. Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-07-29 13:12:45 -07:00
Paolo Bonzini	ac0a48c39a	KVM: x86: rename EMULATE_DO_MMIO The next patch will reuse it for other userspace exits than MMIO, namely debug events. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-29 09:01:14 +02:00
Stratos Karafotis	61c63e5ed3	cpufreq: Remove unused APERF/MPERF support The target frequency calculation method in the ondemand governor has changed and it is now independent of the measured average frequency. Consequently, the APERF/MPERF support in cpufreq is not used any more, so drop it. [rjw: Changelog] Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-07-26 01:06:43 +02:00
Adrian Hunter	c73deb6aec	perf/x86: Add ability to calculate TSC from perf sample timestamps For modern CPUs, perf clock is directly related to TSC. TSC can be calculated from perf clock and vice versa using a simple calculation. Two of the three componenets of that calculation are already exported in struct perf_event_mmap_page. This patch exports the third. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-07-23 12:17:45 +02:00
Jiri Kosina	17f41571bb	kprobes/x86: Call out into INT3 handler directly instead of using notifier In `fd4363fff3` ("x86: Introduce int3 (breakpoint)-based instruction patching"), the mechanism that was introduced for notifying alternatives code from int3 exception handler that and exception occured was die_notifier. This is however problematic, as early code might be using jump labels even before the notifier registration has been performed, which will then lead to an oops due to unhandled exception. One of such occurences has been encountered by Fengguang: int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8 task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000 RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225 RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006 RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40 RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002 R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0 R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8 FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0 Stack: ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48 ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68 ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98 Call Trace: <IRQ> [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79 [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84 [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0 [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41 [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80 <EOI> [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1 [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16 [<ffffffff81015f10>] default_idle+0x147/0x282 [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84 [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5 [...] Fix this by directly calling poke_int3_handler() from the int3 exception handler (analogically to what ftrace has been doing already), instead of relying on notifier, registration of which might not have yet been finalized by the time of the first trap. Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-07-23 10:12:57 +02:00
Andi Kleen	103af0a987	perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5 [KVM maintainers: The underlying support for this is in perf/core now. So please merge this patch into the KVM tree.] This is not arch perfmon, but older CPUs will just ignore it. This makes it possible to do at least some TSX measurements from a KVM guest v2: Various fixes to address review feedback v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits. v4: Use reserved bits for #GP v5: Remove obsolete argument Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-19 18:24:45 +02:00
Masami Hiramatsu	ea8596bb2d	kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions Since introducing the text_poke_bp() for all text_poke_smp() callers, text_poke_smp() are now unused. This patch basically reverts: `3d55cc8a05` ("x86: Add text_poke_smp for SMP cross modifying code") `7deb18dcf0` ("x86: Introduce text_poke_smp_batch() for batch-code modifying") and related commits. This patch also fixes a Kconfig dependency issue on STOP_MACHINE in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@akamai.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Borislav Petkov <bpetkov@suse.de> Link: http://lkml.kernel.org/r/20130718114753.26675.18714.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-07-19 09:57:04 +02:00
Ingo Molnar	9bb15425c3	Merge branch 'x86/jumplabel' into perf/core Upcoming kprobes patches rely on the int3 code-patching machinery introduced by: `fd4363fff3` x86: Introduce int3 (breakpoint)-based instruction patching Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-07-19 09:55:00 +02:00
Marcelo Tosatti	e04c5d76b0	remove sched notifier for cross-cpu migrations Linux as a guest on KVM hypervisor, the only user of the pvclock vsyscall interface, does not require notification on task migration because: 1. cpu ID number maps 1:1 to per-CPU pvclock time info. 2. per-CPU pvclock time info is updated if the underlying CPU changes. 3. that version is increased whenever underlying CPU changes. Which is sufficient to guarantee nanoseconds counter is calculated properly. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-07-18 12:29:30 +02:00
Jiri Kosina	fd4363fff3	x86: Introduce int3 (breakpoint)-based instruction patching Introduce a method for run-time instruction patching on a live SMP kernel based on int3 breakpoint, completely avoiding the need for stop_machine(). The way this is achieved: - add a int3 trap to the address that will be patched - sync cores - update all but the first byte of the patched range - sync cores - replace the first byte (int3) by the first byte of replacing opcode - sync cores According to http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html synchronization after replacing "all but first" instructions should not be necessary (on Intel hardware), as the syncing after the subsequent patching of the first byte provides enough safety. But there's not only Intel HW out there, and we'd rather be on a safe side. If any CPU instruction execution would collide with the patching, it'd be trapped by the int3 breakpoint and redirected to the provided "handler" (which would typically mean just skipping over the patched region, acting as "nop" has been there, in case we are doing nop -> jump and jump -> nop transitions). Ftrace has been using this very technique since `08d636b` ("ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine") for ages already, and jump labels are another obvious potential user of this. Based on activities of Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> a few years ago. Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307121102440.29788@pobox.suse.cz Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-07-16 17:55:29 -07:00
H. Peter Anvin	9b710506a0	x86, bitops: Change bitops to be native operand size Change the bitops operation to be naturally "long", i.e. 63 bits on the 64-bit kernel. Additional bugs are likely to crop up in the future. We already have bugs which machines with > 16 TiB of memory in a single node, as can happen if memory is interleaved. The x86 bitop operations take a signed index, so using an unsigned type is not an option. Jim Kukunas measured the effect of this patch on kernel size: it adds 2779 bytes to the allyesconfig kernel. Some of that probably could be elided by replacing the inline functions with macros which select the 32-bit type if the index is a 32-bit value, something like: In that case we could also use "Jr" constraints for the 64-bit version. However, this would more than double the amount of code for a relatively small gain. Note that we can't use ilog2() for _BITOPS_LONG_SHIFT, as that causes a recursive header inclusion problem. The change to constant_test_bit() should both generate better code and give correct result for negative bit indicies. As previously written the compiler had to generate extra code to create the proper wrong result for negative values. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jim Kukunas <james.t.kukunas@intel.com> Link: http://lkml.kernel.org/n/tip-z61ofiwe90xeyb461o72h8ya@git.kernel.org	2013-07-16 15:24:04 -07:00
Paul Gortmaker	148f9bb877	x86: delete __cpuinit usage from all x86 files The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit `5e427ec2d0` ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2013-07-14 19:36:56 -04:00
Linus Torvalds	8cbd0eefca	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: "There are not too many changes this time, except two new platform thermal drivers, ti-soc-thermal driver and x86_pkg_temp_thermal driver, and a couple of small fixes. Highlights: - move the ti-soc-thermal driver out of the staging tree to the thermal tree. - introduce the x86_pkg_temp_thermal driver. This driver registers CPU digital temperature package level sensor as a thermal zone. - small fixes/cleanups including removing redundant use of platform_set_drvdata() and of_match_ptr for all platform thermal drivers" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (34 commits) thermal: cpu_cooling: fix stub function thermal: ti-soc-thermal: use standard GPIO DT bindings thermal: MAINTAINERS: Add git tree path for SoC specific updates thermal: fix x86_pkg_temp_thermal.c build and Kconfig Thermal: Documentation for x86 package temperature thermal driver Thermal: CPU Package temperature thermal thermal: consider emul_temperature while computing trend thermal: ti-soc-thermal: add DT example for DRA752 chip thermal: ti-soc-thermal: add dra752 chip to device table thermal: ti-soc-thermal: add thermal data for DRA752 chips thermal: ti-soc-thermal: remove usage of IS_ERR_OR_NULL thermal: ti-soc-thermal: freeze FSM while computing trend thermal: ti-soc-thermal: remove external heat while extrapolating hotspot thermal: ti-soc-thermal: update DT reference for OMAP5430 x86, mcheck, therm_throt: Process package thresholds thermal: cpu_cooling: fix 'descend' check in get_property() Thermal: spear: Remove redundant use of of_match_ptr Thermal: kirkwood: Remove redundant use of of_match_ptr Thermal: dove: Remove redundant use of of_match_ptr Thermal: armada: Remove redundant use of of_match_ptr ...	2013-07-11 12:26:08 -07:00
Linus Torvalds	2e17c5a97e	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "Okay this is the big one, I was stalled on the fbdev pull req as I stupidly let fbdev guys merge a patch I required to fix a warning with some patches I had, they ended up merging the patch from the wrong place, but the warning should be fixed. In future I'll just take the patch myself! Outside drm: There are some snd changes for the HDMI audio interactions on haswell, they've been acked for inclusion via my tree. This relies on the wound/wait tree from Ingo which is already merged. Major changes: AMD finally released the dynamic power management code for all their GPUs from r600->present day, this is great, off by default for now but also a huge amount of code, in fact it is most of this pull request. Since it landed there has been a lot of community testing and Alex has sent a lot of fixes for any bugs found so far. I suspect radeon might now be the biggest kernel driver ever :-P p.s. radeon.dpm=1 to enable dynamic powermanagement for anyone. New drivers: Renesas r-car display unit. Other highlights: - core: GEM CMA prime support, use new w/w mutexs for TTM reservations, cursor hotspot, doc updates - dvo chips: chrontel 7010B support - i915: Haswell (fbc, ips, vecs, watermarks, audio powerwell), Valleyview (enabled by default, rc6), lots of pll reworking, 30bpp support (this time for sure) - nouveau: async buffer object deletion, context/register init updates, kernel vp2 engine support, GF117 support, GK110 accel support (with external nvidia ucode), context cleanups. - exynos: memory leak fixes, Add S3C64XX SoC series support, device tree updates, common clock framework support, - qxl: cursor hotspot support, multi-monitor support, suspend/resume support - mgag200: hw cursor support, g200 mode limiting - shmobile: prime support - tegra: fixes mostly I've been banging on this quite a lot due to the size of it, and it seems to okay on everything I've tested it on." * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (811 commits) drm/radeon/dpm: implement vblank_too_short callback for si drm/radeon/dpm: implement vblank_too_short callback for cayman drm/radeon/dpm: implement vblank_too_short callback for btc drm/radeon/dpm: implement vblank_too_short callback for evergreen drm/radeon/dpm: implement vblank_too_short callback for 7xx drm/radeon/dpm: add checks against vblank time drm/radeon/dpm: add helper to calculate vblank time drm/radeon: remove stray line in old pm code drm/radeon/dpm: fix display_gap programming on rv7xx drm/nvc0/gr: fix gpc firmware regression drm/nouveau: fix minor thinko causing bo moves to not be async on kepler drm/radeon/dpm: implement force performance level for TN drm/radeon/dpm: implement force performance level for ON/LN drm/radeon/dpm: implement force performance level for SI drm/radeon/dpm: implement force performance level for cayman drm/radeon/dpm: implement force performance levels for 7xx/eg/btc drm/radeon/dpm: add infrastructure to force performance levels drm/radeon: fix surface setup on r1xx drm/radeon: add support for 3d perf states on older asics drm/radeon: set default clocks for SI when DPM is disabled ...	2013-07-09 16:04:31 -07:00
Robin Holt	1b3a5d02ee	reboot: move arch/x86 reboot= handling to generic kernel Merge together the unicore32, arm, and x86 reboot= command line parameter handling. Signed-off-by: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-09 10:33:29 -07:00
Naveen N. Rao	9ad95879cd	mce: acpi/apei: Add a boot option to disable ff mode for corrected errors Add a boot option to disable firmware first mode for corrected errors. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-07-08 11:54:28 -07:00
Naveen N. Rao	c3d1fb567a	mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first. In this scenario, the OS is expected to not monitor for corrected errors (through CMCI/polling). Instead, the firmware notifies the OS on corrected error events through GHES. Linux already has support for GHES. This patch adds support for parsing CMC structure and to disable CMCI/polling if the firmware first flag is set. Further, the list of machine check bank structures at the end of CMC is used to determine which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-07-08 11:53:01 -07:00
Linus Torvalds	21884a83b2	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: "The timer changes contain: - posix timer code consolidation and fixes for odd corner cases - sched_clock implementation moved from ARM to core code to avoid duplication by other architectures - alarm timer updates - clocksource and clockevents unregistration facilities - clocksource/events support for new hardware - precise nanoseconds RTC readout (Xen feature) - generic support for Xen suspend/resume oddities - the usual lot of fixes and cleanups all over the place The parts which touch other areas (ARM/XEN) have been coordinated with the relevant maintainers. Though this results in an handful of trivial to solve merge conflicts, which we preferred over nasty cross tree merge dependencies. The patches which have been committed in the last few days are bug fixes plus the posix timer lot. The latter was in akpms queue and next for quite some time; they just got forgotten and Frederic collected them last minute." * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits) hrtimer: Remove unused variable hrtimers: Move SMP function call to thread context clocksource: Reselect clocksource when watchdog validated high-res capability posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting posix_timers: fix racy timer delta caching on task exit posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule() selftests: add basic posix timers selftests posix_cpu_timers: consolidate expired timers check posix_cpu_timers: consolidate timer list cleanups posix_cpu_timer: consolidate expiry time type tick: Sanitize broadcast control logic tick: Prevent uncontrolled switch to oneshot mode tick: Make oneshot broadcast robust vs. CPU offlining x86: xen: Sync the CMOS RTC as well as the Xen wallclock x86: xen: Sync the wallclock when the system time is set timekeeping: Indicate that clock was set in the pvclock gtod notifier timekeeping: Pass flags instead of multiple bools to timekeeping_update() xen: Remove clock_was_set() call in the resume path hrtimers: Support resuming with two or more CPUs online (but stopped) timer: Fix jiffies wrap behavior of round_jiffies_common() ...	2013-07-06 14:09:38 -07:00
Linus Torvalds	b2c311075d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: - Do not idle omap device between crypto operations in one session. - Added sha224/sha384 shims for SSSE3. - More optimisations for camellia-aesni-avx2. - Removed defunct blowfish/twofish AVX2 implementations. - Added unaligned buffer self-tests. - Added PCLMULQDQ optimisation for CRCT10DIF. - Added support for Freescale's DCP co-processor - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (44 commits) crypto: testmgr - test hash implementations with unaligned buffers crypto: testmgr - test AEADs with unaligned buffers crypto: testmgr - test skciphers with unaligned buffers crypto: testmgr - check that entries in alg_test_descs are in correct order Revert "crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher" Revert "crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher" crypto: camellia-aesni-avx2 - tune assembly code for more performance hwrng: bcm2835 - fix MODULE_LICENSE tag hwrng: nomadik - use clk_prepare_enable() crypto: picoxcell - replace strict_strtoul() with kstrtoul() crypto: dcp - Staticize local symbols crypto: dcp - Use NULL instead of 0 crypto: dcp - Use devm_* APIs crypto: dcp - Remove redundant platform_set_drvdata() hwrng: use platform_{get,set}_drvdata() crypto: omap-aes - Don't idle/start AES device between Encrypt operations crypto: crct10dif - Use PTR_RET crypto: ux500 - Cocci spatch "resource_size.spatch" crypto: sha256_ssse3 - add sha224 support crypto: sha512_ssse3 - add sha384 support ...	2013-07-05 12:12:33 -07:00
Thomas Gleixner	2b0f89317e	Merge branch 'timers/posix-cpu-timers-for-tglx' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core Frederic sayed: "Most of these patches have been hanging around for several month now, in -mmotm for a significant chunk. They already missed a few releases." Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-07-04 23:11:22 +02:00
Linus Torvalds	7f0ef0267e	Merge branch 'akpm' (updates from Andrew Morton) Merge first patch-bomb from Andrew Morton: - various misc bits - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been distracted. There has been quite a bit of activity. - About half the MM queue - Some backlight bits - Various lib/ updates - checkpatch updates - zillions more little rtc patches - ptrace - signals - exec - procfs - rapidio - nbd - aoe - pps - memstick - tools/testing/selftests updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits) tools/testing/selftests: don't assume the x bit is set on scripts selftests: add .gitignore for kcmp selftests: fix clean target in kcmp Makefile selftests: add .gitignore for vm selftests: add hugetlbfstest self-test: fix make clean selftests: exit 1 on failure kernel/resource.c: remove the unneeded assignment in function __find_resource aio: fix wrong comment in aio_complete() drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode drivers/memstick/host/r592.c: convert to module_pci_driver drivers/memstick/host/jmb38x_ms: convert to module_pci_driver pps-gpio: add device-tree binding and support drivers/pps/clients/pps-gpio.c: convert to module_platform_driver drivers/pps/clients/pps-gpio.c: convert to devm_* helpers drivers/parport/share.c: use kzalloc Documentation/accounting/getdelays.c: avoid strncpy in accounting tool aoe: update internal version number to v83 aoe: update copyright date aoe: perform I/O completions in parallel ...	2013-07-03 17:12:13 -07:00
Oleg Nesterov	37f0765552	x86: kill TIF_DEBUG Because it is not used. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:08:01 -07:00
Pavel Emelyanov	0f8975ec4d	mm: soft-dirty bits for user memory changes tracking The soft-dirty is a bit on a PTE which helps to track which pages a task writes to. In order to do this tracking one should 1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs) 2. Wait some time. 3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries) To do this tracking, the writable bit is cleared from PTEs when the soft-dirty bit is. Thus, after this, when the task tries to modify a page at some virtual address the #PF occurs and the kernel sets the soft-dirty bit on the respective PTE. Note, that although all the task's address space is marked as r/o after the soft-dirty bits clear, the #PF-s that occur after that are processed fast. This is so, since the pages are still mapped to physical memory, and thus all the kernel does is finds this fact out and puts back writable, dirty and soft-dirty bits on the PTE. Another thing to note, is that when mremap moves PTEs they are marked with soft-dirty as well, since from the user perspective mremap modifies the virtual memory at mremap's new address. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:26 -07:00
Linus Torvalds	f991fae5c6	Power management and ACPI updates for 3.11-rc1 - Hotplug changes allowing device hot-removal operations to fail gracefully (instead of crashing the kernel) if they cannot be carried out completely. From Rafael J Wysocki and Toshi Kani. - Freezer update from Colin Cross and Mandeep Singh Baines targeted at making the freezing of tasks a bit less heavy weight operation. - cpufreq resume fix from Srivatsa S Bhat for a regression introduced during the 3.10 cycle causing some cpufreq sysfs attributes to return wrong values to user space after resume. - New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to provide information previously available via related_cpus from Lan Tianyu. - cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and Tang Yuantian. - Fix for an ACPICA regression causing suspend/resume issues to appear on some systems introduced during the 3.4 development cycle from Lv Zheng. - ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and Zhang Rui. - New cupidle driver for Xilinx Zynq processors from Michal Simek. - cpuidle fixes and cleanups from Daniel Lezcano. - Changes to make suspend/resume work correctly in Xen guests from Konrad Rzeszutek Wilk. - ACPI device power management fixes and cleanups from Fengguang Wu and Rafael J Wysocki. - ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo. - Fix for the IA-64 issue that was the reason for reverting commit `9f29ab1` and updates of the ACPI scan code from Rafael J Wysocki. - Mechanism for adding CMOS RTC address space handlers from Lan Tianyu (to allow some EC-related breakage to be fixed on some systems). - Spec-compliant implementation of acpi_os_get_timer() from Mika Westerberg. - Modification of do_acpi_find_child() to execute _STA in order to to avoid situations in which a pointer to a disabled device object is returned instead of an enabled one with the same _ADR value. From Jeff Wu. - Intel BayTrail PCH (Platform Controller Hub) support for the ACPI Intel Low-Power Subsystems (LPSS) driver and modificaions of that driver to work around a couple of known BIOS issues from Mika Westerberg and Heikki Krogerus. - EC driver fix from Vasiliy Kulikov to make it use get_user() and put_user() instead of dereferencing user space pointers blindly. - Assorted ACPI code cleanups from Bjorn Helgaas, Nicholas Mazzuca and Toshi Kani. - Modification of the "runtime idle" helper routine to take the return values of the callbacks executed by it into account and to call rpm_suspend() if they return 0, which allows some code bloat reduction to be done, from Rafael J Wysocki and Alan Stern. - New trace points for PM QoS from Sahara <keun-o.park@windriver.com>. - PM QoS documentation update from Lan Tianyu. - Assorted core PM code cleanups and changes from Bernie Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan. - New devfreq driver for the Exynos5-bus device from Abhilash Kesavan. - Minor devfreq cleanups, fixes and MAINTAINERS update from MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun. - OMAP Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver updates from Andrii Tseglytskyi and Nishanth Menon. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJR0ZNOAAoJEKhOf7ml8uNsDLYP/0EU4rmvw0TWTITfp6RS1KDE 9GwBn96ZR4Q5bJd9gBCTPSqhHOYMqxWEUp99sn/M2wehG1pk/jw5LO56+2IhM3UZ g1HDcJ7te2nVT/iXsKiAGTVhU9Rk0aYwoVSknwk27qpIBGxW9w/s5tLX8pY3Q3Zq wL/7aTPjyL+PFFFEaxgH7qLqsl3DhbtYW5AriUBTkXout/tJ4eO1b7MNBncLDh8X VQ/0DNCKE95VEJfkO4rk9RKUyVp9GDn0i+HXCD/FS4IA5oYzePdVdNDmXf7g+swe CGlTZq8pB+oBpDiHl4lxzbNrKQjRNbGnDUkoRcWqn0nAw56xK+vmYnWJhW99gQ/I fKnvxeLca5po1aiqmC4VSJxZIatFZqLrZAI4dzoCLWY+bGeTnCKmj0/F8ytFnZA2 8IuLLs7/dFOaHXV/pKmpg6FAlFa9CPxoqRFoyqb4M0GjEarADyalXUWsPtG+6xCp R/p0CISpwk+guKZR/qPhL7M654S7SHrPwd2DPF0KgGsvk+G2GhoB8EzvD8BVp98Z 9siCGCdgKQfJQVI6R0k9aFmn/4gRQIAgyPhkhv9tqULUUkiaXki+/t8kPfnb8O/d zep+CA57E2G8MYLkDJfpFeKS7GpPD6TIdgFdGmOUC0Y6sl9iTdiw4yTx8O2JM37z rHBZfYGkJBrbGRu+Q1gs =VBBq -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "This time the total number of ACPI commits is slightly greater than the number of cpufreq commits, but Viresh Kumar (who works on cpufreq) remains the most active patch submitter. To me, the most significant change is the addition of offline/online device operations to the driver core (with the Greg's blessing) and the related modifications of the ACPI core hotplug code. Next are the freezer updates from Colin Cross that should make the freezing of tasks a bit less heavy weight. We also have a couple of regression fixes, a number of fixes for issues that have not been identified as regressions, two new drivers and a bunch of cleanups all over. Highlights: - Hotplug changes to support graceful hot-removal failures. It sometimes is necessary to fail device hot-removal operations gracefully if they cannot be carried out completely. For example, if memory from a memory module being hot-removed has been allocated for the kernel's own use and cannot be moved elsewhere, it's desirable to fail the hot-removal operation in a graceful way rather than to crash the kernel, but currenty a success or a kernel crash are the only possible outcomes of an attempted memory hot-removal. Needless to say, that is not a very attractive alternative and it had to be addressed. However, in order to make it work for memory, I first had to make it work for CPUs and for this purpose I needed to modify the ACPI processor driver. It's been split into two parts, a resident one handling the low-level initialization/cleanup and a modular one playing the actual driver's role (but it binds to the CPU system device objects rather than to the ACPI device objects representing processors). That's been sort of like a live brain surgery on a patient who's riding a bike. So this is a little scary, but since we found and fixed a couple of regressions it caused to happen during the early linux-next testing (a month ago), nobody has complained. As a bonus we remove some duplicated ACPI hotplug code, because the ACPI-based CPU hotplug is now going to use the common ACPI hotplug code. - Lighter weight freezing of tasks. These changes from Colin Cross and Mandeep Singh Baines are targeted at making the freezing of tasks a bit less heavy weight operation. They reduce the number of tasks woken up every time during the freezing, by using the observation that the freezer simply doesn't need to wake up some of them and wait for them all to call refrigerator(). The time needed for the freezer to decide to report a failure is reduced too. Also reintroduced is the check causing a lockdep warining to trigger when try_to_freeze() is called with locks held (which is generally unsafe and shouldn't happen). - cpufreq updates First off, a commit from Srivatsa S Bhat fixes a resume regression introduced during the 3.10 cycle causing some cpufreq sysfs attributes to return wrong values to user space after resume. The fix is kind of fresh, but also it's pretty obvious once Srivatsa has identified the root cause. Second, we have a new freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to provide information previously available via related_cpus. From Lan Tianyu. Finally, we fix a number of issues, mostly related to the CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean up some code. The majority of changes from Viresh Kumar with bits from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and Tang Yuantian. - ACPICA update A usual bunch of updates from the ACPICA upstream. During the 3.4 cycle we introduced support for ACPI 5 extended sleep registers, but they are only supposed to be used if the HW-reduced mode bit is set in the FADT flags and the code attempted to use them without checking that bit. That caused suspend/resume regressions to happen on some systems. Fix from Lv Zheng causes those registers to be used only if the HW-reduced mode bit is set. Apart from this some other ACPICA bugs are fixed and code cleanups are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and Zhang Rui. - cpuidle updates New driver for Xilinx Zynq processors is added by Michal Simek. Multidriver support simplification, addition of some missing kerneldoc comments and Kconfig-related fixes come from Daniel Lezcano. - ACPI power management updates Changes to make suspend/resume work correctly in Xen guests from Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and cleanups and fixes of the ACPI device power state selection routine. - ACPI documentation updates Some previously missing pieces of ACPI documentation are added by Lv Zheng and Aaron Lu (hopefully, that will help people to uderstand how the ACPI subsystem works) and one outdated doc is updated by Hanjun Guo. - Assorted ACPI updates We finally nailed down the IA-64 issue that was the reason for reverting commit `9f29ab11dd` ("ACPI / scan: do not match drivers against objects having scan handlers"), so we can fix it and move the ACPI scan handler check added to the ACPI video driver back to the core. A mechanism for adding CMOS RTC address space handlers is introduced by Lan Tianyu to allow some EC-related breakage to be fixed on some systems. A spec-compliant implementation of acpi_os_get_timer() is added by Mika Westerberg. The evaluation of _STA is added to do_acpi_find_child() to avoid situations in which a pointer to a disabled device object is returned instead of an enabled one with the same _ADR value. From Jeff Wu. Intel BayTrail PCH (Platform Controller Hub) support is added to the ACPI driver for Intel Low-Power Subsystems (LPSS) and that driver is modified to work around a couple of known BIOS issues. Changes from Mika Westerberg and Heikki Krogerus. The EC driver is fixed by Vasiliy Kulikov to use get_user() and put_user() instead of dereferencing user space pointers blindly. Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi Kani. - Assorted power management updates The "runtime idle" helper routine is changed to take the return values of the callbacks executed by it into account and to call rpm_suspend() if they return 0, which allows us to reduce the overall code bloat a bit (by dropping some code that's not necessary any more after that modification). The runtime PM documentation is updated by Alan Stern (to reflect the "runtime idle" behavior change). New trace points for PM QoS are added by Sahara (<keun-o.park@windriver.com>). PM QoS documentation is updated by Lan Tianyu. Code cleanups are made and minor issues are addressed by Bernie Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan. - devfreq updates New driver for the Exynos5-bus device from Abhilash Kesavan. Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun. - OMAP power management updates Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver updates from Andrii Tseglytskyi and Nishanth Menon." * tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits) cpufreq: Fix cpufreq regression after suspend/resume ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state() PM / Sleep: Warn about system time after resume with pm_trace cpufreq: don't leave stale policy pointer in cdbs->cur_policy acpi-cpufreq: Add new sysfs attribute freqdomain_cpus cpufreq: make sure frequency transitions are serialized ACPI: implement acpi_os_get_timer() according the spec ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan ACPI: Add CMOS RTC Operation Region handler support ACPI / processor: Drop unused variable from processor_perflib.c cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases ...	2013-07-03 14:35:40 -07:00
Linus Torvalds	fe489bf450	KVM fixes for 3.11 On the x86 side, there are some optimizations and documentation updates. The big ARM/KVM change for 3.11, support for AArch64, will come through Catalin Marinas's tree. s390 and PPC have misc cleanups and bugfixes. There is a conflict due to "s390/pgtable: fix ipte notify bit" having entered 3.10 through Martin Schwidefsky's s390 tree. This pull request has additional changes on top, so this tree's version is the correct one. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJR0oU6AAoJEBvWZb6bTYbynnsP/RSUrrHrA8Wu1tqVfAKu+1y5 6OIihqZ9x11/YMaNofAfv86jqxFu0/j7CzMGphNdjzujqKI+Q1tGe7oiVCmKzoG+ UvSctWsz0lpllgBtnnrm5tcfmG6rrddhLtpA7m320+xCVx8KV5P4VfyHZEU+Ho8h ziPmb2mAQ65gBNX6nLHEJ3ITTgad6gt4NNbrKIYpyXuWZQJypzaRqT/vpc4md+Ed dCebMXsL1xgyb98EcnOdrWH1wV30MfucR7IpObOhXnnMKeeltqAQPvaOlKzZh4dK +QfxJfdRZVS0cepcxzx1Q2X3dgjoKQsHq1nlIyz3qu1vhtfaqBlixLZk0SguZ/R9 1S1YqucZiLRO57RD4q0Ak5oxwobu18ZoqJZ6nledNdWwDe8bz/W2wGAeVty19ky0 qstBdM9jnwXrc0qrVgZp3+s5dsx3NAm/KKZBoq4sXiDLd/yBzdEdWIVkIrU3X9wU 3X26wOmBxtsB7so/JR7ciTsQHelmLicnVeXohAEP9CjIJffB81xVXnXs0P0SYuiQ RzbSCwjPzET4JBOaHWT0Dhv0DTS/EaI97KzlN32US3Bn3WiLlS1oDCoPFoaLqd2K LxQMsXS8anAWxFvexfSuUpbJGPnKSidSQoQmJeMGBa9QhmZCht3IL16/Fb641ToN xBohzi49L9FDbpOnTYfz =1zpG -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "On the x86 side, there are some optimizations and documentation updates. The big ARM/KVM change for 3.11, support for AArch64, will come through Catalin Marinas's tree. s390 and PPC have misc cleanups and bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits) KVM: PPC: Ignore PIR writes KVM: PPC: Book3S PR: Invalidate SLB entries properly KVM: PPC: Book3S PR: Allow guest to use 1TB segments KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry KVM: PPC: Book3S PR: Fix proto-VSID calculations KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL KVM: Fix RTC interrupt coalescing tracking kvm: Add a tracepoint write_tsc_offset KVM: MMU: Inform users of mmio generation wraparound KVM: MMU: document fast invalidate all mmio sptes KVM: MMU: document fast invalidate all pages KVM: MMU: document fast page fault KVM: MMU: document mmio page fault KVM: MMU: document write_flooding_count KVM: MMU: document clear_spte_count KVM: MMU: drop kvm_mmu_zap_mmio_sptes KVM: MMU: init kvm generation close to mmio wrap-around value KVM: MMU: add tracepoint for check_mmio_spte KVM: MMU: fast invalidate all mmio sptes ...	2013-07-03 13:21:40 -07:00
Linus Torvalds	96a3d998fb	Merge branch 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tracing updates from Ingo Molnar: "This tree adds IRQ vector tracepoints that are named after the handler and which output the vector #, based on a zero-overhead approach that relies on changing the IDT entries, by Seiji Aguchi. The new tracepoints look like this: # perf list \| grep -i irq_vector irq_vectors:local_timer_entry [Tracepoint event] irq_vectors:local_timer_exit [Tracepoint event] irq_vectors:reschedule_entry [Tracepoint event] irq_vectors:reschedule_exit [Tracepoint event] irq_vectors:spurious_apic_entry [Tracepoint event] irq_vectors:spurious_apic_exit [Tracepoint event] irq_vectors:error_apic_entry [Tracepoint event] irq_vectors:error_apic_exit [Tracepoint event] [...]" * 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tracing: Add config option checking to the definitions of mce handlers trace,x86: Do not call local_irq_save() in load_current_idt() trace,x86: Move creation of irq tracepoints from apic.c to irq.c x86, trace: Add irq vector tracepoints x86: Rename variables for debugging x86, trace: Introduce entering/exiting_irq() tracing: Add DEFINE_EVENT_FN() macro	2013-07-02 16:31:49 -07:00
Linus Torvalds	3045f94a20	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "The changes in this tree are: - ACPI APEI (ACPI Platform Error Interface) improvements, by Chen Gong - misc MCE fixes/cleanups" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Update MCE severity condition check mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem ACPI/APEI: Update einj documentation for param1/param2 ACPI/APEI: Add parameter check before error injection ACPI, APEI, EINJ: Fix error return code in einj_init() x86, mce: Fix "braodcast" typo	2013-07-02 16:30:46 -07:00
Linus Torvalds	1982269a5c	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "Misc improvements: - Fix /proc/mtrr reporting - Fix ioremap printout - Remove the unused pvclock fixmap entry on 32-bit - misc cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioremap: Correct function name output x86: Fix /proc/mtrr with base/size more than 44bits ix86: Don't waste fixmap entries x86/mm: Drop unneeded include <asm/pgtable, page_types.h> x86_64: Correct phys_addr in cleanup_highmap comment	2013-07-02 16:29:05 -07:00
Linus Torvalds	fdd78889aa	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading update from Ingo Molnar: "Two main changes that improve microcode loading on AMD CPUs: - Add support for all-in-one binary microcode files that concatenate the microcode images of multiple processor families, by Jacob Shin - Add early microcode loading (embedded in the initrd) support, also by Jacob Shin" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, amd: Another early loading fixup x86, microcode, amd: Allow multiple families' bin files appended together x86, microcode, amd: Make find_ucode_in_initrd() __init x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m x86, microcode, amd: Early microcode patch loading support for AMD x86, microcode, amd: Refactor functions to prepare for early loading x86, microcode: Vendor abstract out save_microcode_in_initrd() x86, microcode, intel: Correct typo in printk	2013-07-02 16:28:10 -07:00
Linus Torvalds	d652df0b2f	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU changes from Ingo Molnar: "There are two bigger changes in this tree: - Add an [early-use-]safe static_cpu_has() variant and other robustness improvements, including the new X86_DEBUG_STATIC_CPU_HAS configurable debugging facility, motivated by recent obscure FPU code bugs, by Borislav Petkov - Reimplement FPU detection code in C and drop the old asm code, by Peter Anvin." * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu: Use static_cpu_has_safe before alternatives x86: Add a static_cpu_has_safe variant x86: Sanity-check static_cpu_has usage x86, cpu: Add a synthetic, always true, cpu feature x86: Get rid of ->hard_math and all the FPU asm fu	2013-07-02 16:26:44 -07:00
Linus Torvalds	4d6f843a38	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI changes from Ingo Molnar: "Two fixes that should in principle increase robustness of our interaction with the EFI firmware, and a cleanup" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: retry ExitBootServices() on failure efi: Convert runtime services function ptrs UEFI: Don't pass boot services regions to SetVirtualAddressMap()	2013-07-02 16:25:50 -07:00
Linus Torvalds	35c23d5d79	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: "Two changes: - Extend 32-bit double fault debugging aid to 64-bit - Fix a build warning" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel/cacheinfo: Shut up last long-standing warning x86: Extend #DF debugging aid to 64-bit	2013-07-02 16:24:24 -07:00
Linus Torvalds	57935b262c	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc x86 cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S x86, cleanups: Remove extra tab in __flush_tlb_one() x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL	2013-07-02 16:23:50 -07:00
Linus Torvalds	002e44bfb5	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull asm/x86 changes from Ingo Molnar: "Misc changes, with a bigger processor-flags cleanup/reorganization by Peter Anvin" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, asm, cleanup: Replace open-coded control register values with symbolic x86, processor-flags: Fix the datatypes and add bit number defines x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED linux/const.h: Add _BITUL() and _BITULL() x86/vdso: Convert use of typedef ctl_table to struct ctl_table x86: __force_order doesn't need to be an actual variable	2013-07-02 16:21:45 -07:00
Linus Torvalds	e13053f506	Merge branch 'sched-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull voluntary preemption fixes from Ingo Molnar: "This tree contains a speedup which is achieved through better might_sleep()/might_fault() preemption point annotations for uaccess functions, by Michael S Tsirkin: 1. The only reason uaccess routines might sleep is if they fault. Make this explicit for all architectures. 2. A voluntary preemption point in uaccess functions means compiler can't inline them efficiently, this breaks assumptions that they are very fast and small that e.g. net code seems to make. Remove this preemption point so behaviour matches with what callers assume. 3. Accesses (e.g through socket ops) to kernel memory with KERNEL_DS like net/sunrpc does will never sleep. Remove an unconditinal might_sleep() in the might_fault() inline in kernel.h (used when PROVE_LOCKING is not set). 4. Accesses with pagefault_disable() return EFAULT but won't cause caller to sleep. Check for that and thus avoid might_sleep() when PROVE_LOCKING is set. These changes offer a nice speedup for CONFIG_PREEMPT_VOLUNTARY=y kernels, here's a network bandwidth measurement between a virtual machine and the host: before: incoming: 7122.77 Mb/s outgoing: 8480.37 Mb/s after: incoming: 8619.24 Mb/s [ +21.0% ] outgoing: 9455.42 Mb/s [ +11.5% ] I kept these changes in a separate tree, separate from scheduler changes, because it's a mixed MM and scheduler topic" * 'sched-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mm, sched: Allow uaccess in atomic with pagefault_disable() mm, sched: Drop voluntary schedule from might_fault() x86: uaccess s/might_sleep/might_fault/ tile: uaccess s/might_sleep/might_fault/ powerpc: uaccess s/might_sleep/might_fault/ mn10300: uaccess s/might_sleep/might_fault/ microblaze: uaccess s/might_sleep/might_fault/ m32r: uaccess s/might_sleep/might_fault/ frv: uaccess s/might_sleep/might_fault/ arm64: uaccess s/might_sleep/might_fault/ asm-generic: uaccess s/might_sleep/might_fault/	2013-07-02 16:19:24 -07:00
Linus Torvalds	f0bb4c0ab0	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel improvements: - watchdog driver improvements by Li Zefan - Power7 CPI stack events related improvements by Sukadev Bhattiprolu - event multiplexing via hrtimers and other improvements by Stephane Eranian - kernel stack use optimization by Andrew Hunter - AMD IOMMU uncore PMU support by Suravee Suthikulpanit - NMI handling rate-limits by Dave Hansen - various hw_breakpoint fixes by Oleg Nesterov - hw_breakpoint overflow period sampling and related signal handling fixes by Jiri Olsa - Intel Haswell PMU support by Andi Kleen Tooling improvements: - Reset SIGTERM handler in workload child process, fix from David Ahern. - Makefile reorganization, prep work for Kconfig patches, from Jiri Olsa. - Add automated make test suite, from Jiri Olsa. - Add --percent-limit option to 'top' and 'report', from Namhyung Kim. - Sorting improvements, from Namhyung Kim. - Expand definition of sysfs format attribute, from Michael Ellerman. Tooling fixes: - 'perf tests' fixes from Jiri Olsa. - Make Power7 CPI stack events available in sysfs, from Sukadev Bhattiprolu. - Handle death by SIGTERM in 'perf record', fix from David Ahern. - Fix printing of perf_event_paranoid message, from David Ahern. - Handle realloc failures in 'perf kvm', from David Ahern. - Fix divide by 0 in variance, from David Ahern. - Save parent pid in thread struct, from David Ahern. - Handle JITed code in shared memory, from Andi Kleen. - Fixes for 'perf diff', from Jiri Olsa. - Remove some unused struct members, from Jiri Olsa. - Add missing liblk.a dependency for python/perf.so, fix from Jiri Olsa. - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent. - No need to do locking when adding hists in perf report, only 'top' needs that, from Namhyung Kim. - Fix alignment of symbol column in in the hists browser (top, report) when -v is given, from NAmhyung Kim. - Fix 'perf top' -E option behavior, from Namhyung Kim. - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu. - Fix compile errors in bp_signal 'perf test', from Sukadev Bhattiprolu. ... and more things" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits) perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable() perf/x86: Fix shared register mutual exclusion enforcement perf/x86/intel: Support full width counting x86: Add NMI duration tracepoints perf: Drop sample rate when sampling is too slow x86: Warn when NMI handlers take large amounts of time hw_breakpoint: Introduce "struct bp_cpuinfo" hw_breakpoint: Simplify *register_wide_hw_breakpoint() hw_breakpoint: Introduce cpumask_of_bp() hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths perf/x86/intel: Add mem-loads/stores support for Haswell perf/x86/intel: Support Haswell/v4 LBR format perf/x86/intel: Move NMI clearing to end of PMI handler perf/x86/intel: Add Haswell PEBS support perf/x86/intel: Add simple Haswell PMU support perf/x86/intel: Add Haswell PEBS record support perf/x86/intel: Fix sparse warning perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation perf/x86/amd: Add IOMMU Performance Counter resource management ...	2013-07-02 16:15:23 -07:00
Linus Torvalds	0c46d68d19	Merge branch 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull WW mutex support from Ingo Molnar: "This tree adds support for wound/wait style locks, which the graphics guys would like to make use of in the TTM graphics subsystem. Wound/wait mutexes are used when other multiple lock acquisitions of a similar type can be done in an arbitrary order. The deadlock handling used here is called wait/wound in the RDBMS literature: The older tasks waits until it can acquire the contended lock. The younger tasks needs to back off and drop all the locks it is currently holding, ie the younger task is wounded. See this LWN.net description of W/W mutexes: https://lwn.net/Articles/548909/ The comments there outline specific usecases for this facility (which have already been implemented for the DRM tree). Also see Documentation/ww-mutex-design.txt for more details" * 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking-selftests: Handle unexpected failures more strictly mutex: Add more w/w tests to test EDEADLK path handling mutex: Add more tests to lib/locking-selftest.c mutex: Add w/w tests to lib/locking-selftest.c mutex: Add w/w mutex slowpath debugging mutex: Add support for wound/wait style locks arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not	2013-07-02 16:09:13 -07:00
Linus Torvalds	63580e51bb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS patches (part 1) from Al Viro: "The major change in this pile is ->readdir() replacement with ->iterate(), dealing with ->f_pos races in ->readdir() instances for good. There's a lot more, but I'd prefer to split the pull request into several stages and this is the first obvious cutoff point." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits) [readdir] constify ->actor [readdir] ->readdir() is gone [readdir] convert ecryptfs [readdir] convert coda [readdir] convert ocfs2 [readdir] convert fatfs [readdir] convert xfs [readdir] convert btrfs [readdir] convert hostfs [readdir] convert afs [readdir] convert ncpfs [readdir] convert hfsplus [readdir] convert hfs [readdir] convert befs [readdir] convert cifs [readdir] convert freevxfs [readdir] convert fuse [readdir] convert hpfs reiserfs: switch reiserfs_readdir_dentry to inode reiserfs: is_privroot_deh() needs only directory inode, actually ...	2013-07-02 09:28:37 -07:00
Al Viro	40d158e618	consolidate io_remap_pfn_range definitions Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:46:35 +04:00
Borislav Petkov	62122fd7da	x86, cpufeature: Use new CC_HAVE_ASM_GOTO ... for checking for "asm goto" compiler support. It is more explicit this way and we cover the cases where distros have backported that support even to gcc versions < 4.5. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1372437701-13351-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-28 15:26:48 -07:00
H. Peter Anvin	9f84b6267c	Merge remote-tracking branch 'origin/x86/fpu' into queue/x86/cpu Use the union of 3.10 x86/cpu and x86/fpu as baseline. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-28 15:26:17 -07:00
Wedson Almeida Filho	00e55a7907	x86: Use asm-goto to implement mutex fast path on x86-64 The new implementation allows the compiler to better optimize the code; the original implementation is still used when the kernel is compiled with older versions of gcc that don't support asm-goto. Compiling with gcc 4.7.3, the original mutex_lock() is 60 bytes with the fast path taking 16 instructions; the new mutex_lock() is 42 bytes, with the fast path taking 12 instructions. The original mutex_unlock() is 24 bytes with the fast path taking 7 instructions; the new mutex_unlock() is 25 bytes (because the compiler used a 2-byte ret) with the fast path taking 4 instructions. The two versions of the functions are included below for reference. Old: ffffffff817742a0 <mutex_lock>: ffffffff817742a0: 55 push %rbp ffffffff817742a1: 48 89 e5 mov %rsp,%rbp ffffffff817742a4: 48 83 ec 10 sub $0x10,%rsp ffffffff817742a8: 48 89 5d f0 mov %rbx,-0x10(%rbp) ffffffff817742ac: 48 89 fb mov %rdi,%rbx ffffffff817742af: 4c 89 65 f8 mov %r12,-0x8(%rbp) ffffffff817742b3: e8 28 15 00 00 callq ffffffff817757e0 <_cond_resched> ffffffff817742b8: 48 89 df mov %rbx,%rdi ffffffff817742bb: f0 ff 0f lock decl (%rdi) ffffffff817742be: 79 05 jns ffffffff817742c5 <mutex_lock+0x25> ffffffff817742c0: e8 cb 04 00 00 callq ffffffff81774790 <__mutex_lock_slowpath> ffffffff817742c5: 65 48 8b 04 25 c0 b7 mov %gs:0xb7c0,%rax ffffffff817742cc: 00 00 ffffffff817742ce: 4c 8b 65 f8 mov -0x8(%rbp),%r12 ffffffff817742d2: 48 89 43 18 mov %rax,0x18(%rbx) ffffffff817742d6: 48 8b 5d f0 mov -0x10(%rbp),%rbx ffffffff817742da: c9 leaveq ffffffff817742db: c3 retq ffffffff81774250 <mutex_unlock>: ffffffff81774250: 55 push %rbp ffffffff81774251: 48 c7 47 18 00 00 00 movq $0x0,0x18(%rdi) ffffffff81774258: 00 ffffffff81774259: 48 89 e5 mov %rsp,%rbp ffffffff8177425c: f0 ff 07 lock incl (%rdi) ffffffff8177425f: 7f 05 jg ffffffff81774266 <mutex_unlock+0x16> ffffffff81774261: e8 ea 04 00 00 callq ffffffff81774750 <__mutex_unlock_slowpath> ffffffff81774266: 5d pop %rbp ffffffff81774267: c3 retq New: ffffffff81774920 <mutex_lock>: ffffffff81774920: 55 push %rbp ffffffff81774921: 48 89 e5 mov %rsp,%rbp ffffffff81774924: 53 push %rbx ffffffff81774925: 48 89 fb mov %rdi,%rbx ffffffff81774928: e8 a3 0e 00 00 callq ffffffff817757d0 <_cond_resched> ffffffff8177492d: f0 ff 0b lock decl (%rbx) ffffffff81774930: 79 08 jns ffffffff8177493a <mutex_lock+0x1a> ffffffff81774932: 48 89 df mov %rbx,%rdi ffffffff81774935: e8 16 fe ff ff callq ffffffff81774750 <__mutex_lock_slowpath> ffffffff8177493a: 65 48 8b 04 25 c0 b7 mov %gs:0xb7c0,%rax ffffffff81774941: 00 00 ffffffff81774943: 48 89 43 18 mov %rax,0x18(%rbx) ffffffff81774947: 5b pop %rbx ffffffff81774948: 5d pop %rbp ffffffff81774949: c3 retq ffffffff81774730 <mutex_unlock>: ffffffff81774730: 48 c7 47 18 00 00 00 movq $0x0,0x18(%rdi) ffffffff81774737: 00 ffffffff81774738: f0 ff 07 lock incl (%rdi) ffffffff8177473b: 7f 0a jg ffffffff81774747 <mutex_unlock+0x17> ffffffff8177473d: 55 push %rbp ffffffff8177473e: 48 89 e5 mov %rsp,%rbp ffffffff81774741: e8 aa ff ff ff callq ffffffff817746f0 <__mutex_unlock_slowpath> ffffffff81774746: 5d pop %rbp ffffffff81774747: f3 c3 repz retq Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com> Link: http://lkml.kernel.org/r/1372420245-60021-1-git-send-email-wedsonaf@gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-28 15:22:18 -07:00
Rafael J. Wysocki	3b4550e0e0	Merge branch 'acpi-pm' * acpi-pm: ACPI / PM: Rework and clean up acpi_dev_pm_get_state() ACPI / PM: Replace ACPI_STATE_D3 with ACPI_STATE_D3_COLD in device_pm.c ACPI / PM: Rename function acpi_device_power_state() and make it static ACPI / PM: acpi_processor_suspend() can be static xen / ACPI / sleep: Register an acpi_suspend_lowlevel callback. x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.	2013-06-28 12:58:30 +02:00
Xiao Guangrong	f6f8adeef5	KVM: MMU: document fast invalidate all pages Document it to Documentation/virtual/kvm/mmu.txt Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-27 14:20:47 +03:00
Xiao Guangrong	0cbf8e437b	KVM: MMU: document write_flooding_count Document write_flooding_count to Documentation/virtual/kvm/mmu.txt Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-27 14:20:43 +03:00
Xiao Guangrong	accaefe07d	KVM: MMU: document clear_spte_count Document it to Documentation/virtual/kvm/mmu.txt Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-27 14:20:42 +03:00
Xiao Guangrong	a8eca9dcc6	KVM: MMU: drop kvm_mmu_zap_mmio_sptes Drop kvm_mmu_zap_mmio_sptes and use kvm_mmu_invalidate_zap_all_pages instead to handle mmio generation number overflow Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-27 14:20:40 +03:00
Xiao Guangrong	f8f559422b	KVM: MMU: fast invalidate all mmio sptes This patch tries to introduce a very simple and scale way to invalidate all mmio sptes - it need not walk any shadow pages and hold mmu-lock KVM maintains a global mmio valid generation-number which is stored in kvm->memslots.generation and every mmio spte stores the current global generation-number into his available bits when it is created When KVM need zap all mmio sptes, it just simply increase the global generation-number. When guests do mmio access, KVM intercepts a MMIO #PF then it walks the shadow page table and get the mmio spte. If the generation-number on the spte does not equal the global generation-number, it will go to the normal #PF handler to update the mmio spte Since 19 bits are used to store generation-number on mmio spte, we zap all mmio sptes when the number is round Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-27 14:20:36 +03:00
Dave Airlie	dc0216445c	Merge branch 'core/mutexes' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into drm-next Merge in the tip core/mutexes branch for future GPU driver use. Ingo will send this branch to Linus prior to drm-next. * 'core/mutexes' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) locking-selftests: Handle unexpected failures more strictly mutex: Add more w/w tests to test EDEADLK path handling mutex: Add more tests to lib/locking-selftest.c mutex: Add w/w tests to lib/locking-selftest.c mutex: Add w/w mutex slowpath debugging mutex: Add support for wound/wait style locks arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not powerpc/pci: Fix boot panic on mpc83xx (regression) s390/ipl: Fix FCP WWPN and LUN format strings for read fs: fix new splice.c kernel-doc warning spi/pxa2xx: fix memory corruption due to wrong size used in devm_kzalloc() s390/mem_detect: fix memory hole handling s390/dma: support debug_dma_mapping_error s390/dma: fix mapping_error detection s390/irq: Only define synchronize_irq() on SMP Input: xpad - fix for "Mad Catz Street Fighter IV FightPad" controllers Input: wacom - add a new stylus (0x100802) for Intuos5 and Cintiqs spi/pxa2xx: use GFP_ATOMIC in sg table allocation fuse: hold i_mutex in fuse_file_fallocate() Input: add missing dependencies on CONFIG_HAS_IOMEM ...	2013-06-27 20:42:09 +10:00
Dave Airlie	4300a0f8bd	Linux 3.10-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEbBAABAgAGBQJRxf9cAAoJEHm+PkMAQRiGMWkH911xM4gRmFgE7SqVW4F4AWBm ngcqMqNy9IdqKfibORUUDvVfEa5gjD5ai2quIKpfQiaukbpQJ696H90ijuAkajLn DQBrN243s0pzhhc/quWINnWxsFQ613JjdUMUMaD7e9A1aKjYzWrPGt/tSjrFXGCP tArTupVzc/iOmnEQDKiROI/Nokq44QJ36aTGPM7n08xMtpKmkCXM+9/UosBteB0O HVI33dmjwz7i55fI53XAWyuZCE+gSEnA4z8spJ9LfXso2W14V+roc+GuL6OyeeTI pCn/+4niVPb4B0ROZlpyVmdZjbPPcMMEK5o+BSJI68SH6LHZTQh2iVuqYfpSyA== =uUH5 -----END PGP SIGNATURE----- Merge tag 'v3.10-rc7' into drm-next Linux 3.10-rc7 The sdvo lvds fix in this -fixes pull commit `c3456fb3e4` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Mon Jun 10 09:47:58 2013 +0200 drm/i915: prefer VBT modes for SVDO-LVDS over EDID has a silent functional conflict with commit `990256aec2` Author: Ville Syrjälä <ville.syrjala@linux.intel.com> Date: Fri May 31 12:17:07 2013 +0000 drm: Add probed modes in probe order in drm-next. W simply need to add the vbt modes before edid modes, i.e. the other way round than now. Conflicts: drivers/gpu/drm/drm_prime.c drivers/gpu/drm/i915/intel_sdvo.c	2013-06-27 20:40:44 +10:00
Maarten Lankhorst	a41b56efa7	arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not This will allow me to call functions that have multiple arguments if fastpath fails. This is required to support ticket mutexes, because they need to be able to pass an extra argument to the fail function. Originally I duplicated the functions, by adding __mutex_fastpath_lock_retval_arg. This ended up being just a duplication of the existing function, so a way to test if fastpath was called ended up being better. This also cleaned up the reservation mutex patch some by being able to call an atomic_set instead of atomic_xchg, and making it easier to detect if the wrong unlock function was previously used. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: robclark@gmail.com Cc: rostedt@goodmis.org Cc: daniel@ffwll.ch Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-26 12:10:55 +02:00
Andi Kleen	069e0c3c40	perf/x86/intel: Support full width counting Recent Intel CPUs like Haswell and IvyBridge have a new alternative MSR range for perfctrs that allows writing the full counter width. Enable this range if the hardware reports it using a new capability bit. Currently the perf code queries CPUID to get the counter width, and sign extends the counter values as needed. The traditional PERFCTR MSRs always limit to 32bit, even though the counter internally is larger (usually 48 bits on recent CPUs) When the new capability is set use the alternative range which do not have these restrictions. This lowers the overhead of perf stat slightly because it has to do less interrupts to accumulate the counter value. On Haswell it also avoids some problems with TSX aborting when the end of the counter range is reached. ( See the patch "perf/x86/intel: Avoid checkpointed counters causing excessive TSX aborts" for more details. ) Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-26 11:59:25 +02:00
Ingo Molnar	ca02c21674	Better comments so we understand our existing machine check bank bitmaps - prelude to adding another bitmap soon. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRygQkAAoJEKurIx+X31iB5RsP/R6w/X4wbKlXD8K8e2qhSXU2 oYVtaMN46M+xLRdNDdE1Y+clVKDQTuapfrlOtBLZVHIkNfqF5jAtu2O1wTERpx+F 7lDHuWuFbddbqV2vqE6RJ6ZqsUTsrlzrqLSLWGSoHn36DlnkSm+L5vDDG75QzV+w fPpGpJTM3kRxPUHnmwvfniJxTiBjsli6FDZ1vGq4Ingu+xxOJDU6+FdWts08hn4Y +KLjs1JMJSdo3di05T6t4ARKBgW3B1lJnrIS6fGYMmax+kHQqO/zvzHs3A86LZyN 7L0toEoZHa8VRMN6C1mw0XsFwPmyMOsrHrRpBlFgHpC7QLAzx6LkxchyDBqPL0mo W4bnoyu1QZUvblq1mrsnJa9yeyMjSHAeq2XTBj8Pbv20AKzmCbNl3aJ5tCDhPj4Y A+e8vk/tq8zWCjdf3SV/D+wjgEtkeWALZZj70maufu9AKtKXy5repa6DZCKEhyIC yh72c3gZ25IkxjwcHmJh+CYaKll9MgdW5toZkftBin2iOdWSaHS9QX2r4I5Awvbi m0Iwq5Fg8DSs3AhBLxiJi8L7N+RNa+7GoTH0z6UDtfAY3ExvepfwjPzLxAfTbPw6 oK2wLfuPiNizl2DX1yNlizGD2YSRiWKoap3Iog84A3IMTIe1nJ+97GHvsVcGna8I zXjnZDNQThAgBswtNY1s =U/Kg -----END PGP SIGNATURE----- Merge tag 'please-pull-mce-bitmap-comment' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE updates from Tony Luck: "Better comments so we understand our existing machine check bank bitmaps - prelude to adding another bitmap soon." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-26 10:53:45 +02:00
H. Peter Anvin	d1fbefcb3a	x86, processor-flags: Fix the datatypes and add bit number defines The control registers are unsigned long (32 bits on i386, 64 bits on x86-64), and so make that manifest in the data type for the various constants. Add defines with a _BIT suffix which defines the bit number, as opposed to the bit mask. This should resolve some issues with ~bitmask that Linus discovered. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-cwckhbrib2aux1qbteaebij0@git.kernel.org	2013-06-25 16:26:06 -07:00
H. Peter Anvin	afcbf13fa6	x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE to match the SDM. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Link: http://lkml.kernel.org/n/tip-buq1evi5dpykxx7ak6amaam0@git.kernel.org	2013-06-25 16:26:06 -07:00
H. Peter Anvin	1adfa76a95	x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED Bit 1 in the x86 EFLAGS is always set. Name the macro something that actually tries to explain what it is all about, rather than being a tautology. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org	2013-06-25 16:25:32 -07:00
Steven Rostedt (Red Hat)	2b4bc78956	trace,x86: Do not call local_irq_save() in load_current_idt() As load_current_idt() is now what is used to update the IDT for the switches needed for NMI, lockdep debug, and for tracing, it must not call local_irq_save(). This is because one of the users of this is lockdep, which does tracing of local_irq_save() and when the debug trap is hit, we need to update the IDT before tracing interrupts being disabled. As load_current_idt() is used to do this, calling local_irq_save() which lockdep traces, defeats the point of calling load_current_idt(). As interrupts are already disabled when used by lockdep and NMI, the only other user is tracing that can disable interrupts itself. Simply have the tracing update disable interrupts before calling load_current_idt() instead of breaking the other users. Here's the dump that happened: ------------[ cut here ]------------ WARNING: at /work/autotest/nobackup/linux-test.git/kernel/fork.c:1196 copy_process+0x2c3/0x1398() DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled) Modules linked in: CPU: 1 PID: 4570 Comm: gdm-simple-gree Not tainted 3.10.0-rc3-test+ #5 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 ffffffff81d2a7a5 ffff88006ed13d50 ffffffff8192822b ffff88006ed13d90 ffffffff81035f25 ffff8800721c6000 ffff88006ed13da0 0000000001200011 0000000000000000 ffff88006ed5e000 ffff8800721c6000 ffff88006ed13df0 Call Trace: [<ffffffff8192822b>] dump_stack+0x19/0x1b [<ffffffff81035f25>] warn_slowpath_common+0x67/0x80 [<ffffffff81035fe1>] warn_slowpath_fmt+0x46/0x48 [<ffffffff812bfc5d>] ? __raw_spin_lock_init+0x31/0x52 [<ffffffff810341f7>] copy_process+0x2c3/0x1398 [<ffffffff8103539d>] do_fork+0xa8/0x260 [<ffffffff810ca7b1>] ? trace_preempt_on+0x2a/0x2f [<ffffffff812afb3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56 [<ffffffff810355cf>] SyS_clone+0x16/0x18 [<ffffffff81938369>] stub_clone+0x69/0x90 [<ffffffff81937fc2>] ? system_call_fastpath+0x16/0x1b ---[ end trace 8b157a9d20ca1aa2 ]--- in fork.c: #ifdef CONFIG_PROVE_LOCKING DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); <-- bug here DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled); #endif Cc: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-06-22 13:16:19 -04:00
Jussi Kivilinna	99f42f937a	Revert "crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher" This reverts commit `cf1521a1a5`. Instruction (vpgatherdd) that this implementation relied on turned out to be slow performer on real hardware (i5-4570). The previous 8-way twofish/AVX implementation is therefore faster and this implementation should be removed. Converting this implementation to use the same method as in twofish/AVX for table look-ups would give additional ~3% speed up vs twofish/AVX, but would hardly be worth of the added code and binary size. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-06-21 14:44:29 +08:00
Jussi Kivilinna	3d387ef08c	Revert "crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher" This reverts commit `6048801070`. Instruction (vpgatherdd) that this implementation relied on turned out to be slow performer on real hardware (i5-4570). The previous 4-way blowfish implementation is therefore faster and this implementation should be removed. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-06-21 14:44:28 +08:00
Seiji Aguchi	cf910e83ae	x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2013-06-20 22:25:34 -07:00
Seiji Aguchi	629f4f9d59	x86: Rename variables for debugging Rename variables for debugging to describe meaning of them precisely. Also, introduce a generic way to switch IDT by checking a current state, debug on/off. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2013-06-20 22:25:13 -07:00
Seiji Aguchi	eddc0e922a	x86, trace: Introduce entering/exiting_irq() When implementing tracepoints in interrupt handers, if the tracepoints are simply added in the performance sensitive path of interrupt handers, it may cause potential performance problem due to the time penalty. To solve the problem, an idea is to prepare non-trace/trace irq handers and switch their IDTs at the enabling/disabling time. So, let's introduce entering_irq()/exiting_irq() for pre/post- processing of each irq handler. A way to use them is as follows. Non-trace irq handler: smp_irq_handler() { entering_irq(); /* pre-processing of this handler / __smp_irq_handler(); / * common logic between non-trace and trace handlers * in a vector. / exiting_irq(); / post-processing of this handler / } Trace irq_handler: smp_trace_irq_handler() { entering_irq(); / pre-processing of this handler / trace_irq_entry(); / tracepoint for irq entry / __smp_irq_handler(); / * common logic between non-trace and trace handlers * in a vector. / trace_irq_exit(); / tracepoint for irq exit / exiting_irq(); / post-processing of this handler */ } If tracepoints can place outside entering_irq()/exiting_irq() as follows, it looks cleaner. smp_trace_irq_handler() { trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); } But it doesn't work. The problem is with irq_enter/exit() being called. They must be called before trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before any tracepoints are used, as tracepoints use rcu to synchronize. As a possible alternative, we may be able to call irq_enter() first as follows if irq_enter() can nest. smp_trace_irq_hander() { irq_entry(); trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); irq_exit(); } But it doesn't work, either. If irq_enter() is nested, it may have a time penalty because it has to check if it was already called or not. The time penalty is not desired in performance sensitive paths even if it is tiny. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2013-06-20 22:25:01 -07:00
H. Peter Anvin	e6bca5a6a8	Linux 3.10-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRvOHmAAoJEHm+PkMAQRiGpOYIAI/OMDjCMro8S1FjDPGr+wsz r/u4h4DiHQTdAUSBYhksgXVBSm02hGhDF3tz7u9oAXuv+4zlGUMZNjmEmLOFfdyd Ve9vqMrUkdwA0jt0d7AYVjtCy/j/yVe5szXZCksHXOZiyb78SEZPQgHc13CJ4lKI K2jPk0sMko7d5ptALPIX+ome48M+3w3k1HuQ62Znm4pFSADcFGpGdf40HP/5LL+I Llp3XHJ5OZrcHRal0t39oHJOCWW61bnYXTWel9J7XoUztebF2cRgFydAhtrto42z x7ELOKVY197WH8l56cnVHrISneQd4kgWrooxLYy6QsJl73/qvQBX+7nmkoes7so= =qeV0 -----END PGP SIGNATURE----- Merge tag 'v3.10-rc6' into x86/cleanups Linux 3.10-rc6 We need a change that is the mainline tree for further work. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-20 21:13:55 -07:00
Borislav Petkov	5f8c421814	x86, fpu: Use static_cpu_has_safe before alternatives The call stack below shows how this happens: basically eager_fpu_init() calls __thread_fpu_begin(current) which then does if (!use_eager_fpu()), which, in turn, uses static_cpu_has. And we're executing before alternatives so static_cpu_has doesn't work there yet. Use the safe variant in this path which becomes optimal after alternatives have run. WARNING: at arch/x86/kernel/cpu/common.c:1368 warn_pre_alternatives+0x1e/0x20() You're using static_cpu_has before alternatives have run! Modules linked in: Pid: 0, comm: swapper Not tainted 3.9.0-rc8+ #1 Call Trace: warn_slowpath_common warn_slowpath_fmt ? fpu_finit warn_pre_alternatives eager_fpu_init fpu_init cpu_init trap_init start_kernel ? repair_env_string x86_64_start_reservations x86_64_start_kernel Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1370772454-6106-6-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-20 17:38:22 -07:00
Borislav Petkov	4a90a99c4f	x86: Add a static_cpu_has_safe variant We want to use this in early code where alternatives might not have run yet and for that case we fall back to the dynamic boot_cpu_has. For that, force a 5-byte jump since the compiler could be generating differently sized jumps for each label. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1370772454-6106-5-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-20 17:38:14 -07:00
Borislav Petkov	5700f743b5	x86: Sanity-check static_cpu_has usage static_cpu_has may be used only after alternatives have run. Before that it always returns false if constant folding with __builtin_constant_p() doesn't happen. And you don't want that. This patch is the result of me debugging an issue where I overzealously put static_cpu_has in code which executed before alternatives have run and had to spend some time with scratching head and cursing at the monitor. So add a jump to a warning which screams loudly when we use this function too early. The alternatives patch that check away in conjunction with patching the rest of the kernel image. [ hpa: factored this into its own configuration option. If we want to have an overarching option, it should be an option which selects other options, not as a group option in the source code. ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1370772454-6106-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-20 17:37:19 -07:00
Borislav Petkov	c3b83598c1	x86, cpu: Add a synthetic, always true, cpu feature This will be used in alternatives later as an always-replace flag. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1370772454-6106-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-20 17:06:07 -07:00
Michel Lespinasse	b52e0a7c4e	x86: Fix trigger_all_cpu_backtrace() implementation The following change fixes the x86 implementation of trigger_all_cpu_backtrace(), which was previously (accidentally, as far as I can tell) disabled to always return false as on architectures that do not implement this function. trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h, should call arch_trigger_all_cpu_backtrace() if available, or return false if the underlying arch doesn't implement this function. x86 did provide a suitable arch_trigger_all_cpu_backtrace() implementation, but it wasn't actually being used because it was declared in asm/nmi.h, which linux/nmi.h doesn't include. Also, linux/nmi.h couldn't easily be fixed by including asm/nmi.h, because that file is not available on all architectures. I am proposing to fix this by moving the x86 definition of arch_trigger_all_cpu_backtrace() to asm/irq.h. Tested via: echo l > /proc/sysrq-trigger Before the change, this uses a fallback implementation which shows backtraces on active CPUs (using smp_call_function_interrupt() ) After the change, this shows NMI backtraces on all CPUs Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1370518875-1346-1-git-send-email-walken@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-20 14:00:21 +02:00
Paul Gortmaker	949785996e	x86: Fix section mismatch on load_ucode_ap We are in the process of removing all the __cpuinit annotations. While working on making that change, an existing problem was made evident: WARNING: arch/x86/kernel/built-in.o(.text+0x198f2): Section mismatch in reference from the function cpu_init() to the function .init.text:load_ucode_ap() The function cpu_init() references the function __init load_ucode_ap(). This is often because cpu_init lacks a __init annotation or the annotation of load_ucode_ap is wrong. This now appears because in my working tree, cpu_init() is no longer tagged as __cpuinit, and so the audit picks up the mismatch. The 2nd hypothesis from the audit is the correct one, as there was an incorrect __init tag on the prototype in the header (but __cpuinit was used on the function itself.) The audit is telling us that the prototype's __init annotation took effect and the function did land in the .init.text section. Checking with objdump on a mainline tree that still has __cpuinit shows that the __cpuinit on the function takes precedence over the __init on the prototype, but that won't be true once we make __cpuinit a no-op. Even though we are removing __cpuinit, we temporarily align both the function and the prototype on __cpuinit so that the changeset can be applied to stable trees if desired. [ hpa: build fix only, no object code change ] Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: stable <stable@vger.kernel.org> # 3.9+ Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1371654926-11729-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-19 14:43:59 -07:00
Konrad Rzeszutek Wilk	d6a77ead21	x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel. Which by default will be x86_acpi_suspend_lowlevel. This registration allows us to register another callback if there is a need to use another platform specific callback. Signed-off-by: Liang Tang <liang.tang@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Ben Guthro <benjamin.guthro@citrix.com> Acked-by: "H. Peter Anvin" <hpa@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-06-19 23:36:30 +02:00
Andi Kleen	3a632cb229	perf/x86/intel: Add simple Haswell PMU support Similar to SandyBridge, but has a few new events and two new counter bits. There are some new counter flags that need to be prevented from being set on fixed counters, and allowed to be set for generic counters. Also we add support for the counter 2 constraint to handle all raw events. (Contains fixes from Stephane Eranian.) Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andi Kleen <ak@linux.jf.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1371515812-9646-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-19 14:43:33 +02:00
Ingo Molnar	2e7e98b85d	Fix typo in define -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRsuoUAAoJEBLB8Bhh3lVKpFUP/j6763r5WPtrzUx0sAZBt7SN T9VAdqWd3SvxP+lDnRoU1luGQc0j+05njUnc74e2k2rGu2+WDonkxJIv4xMHsuG/ cItdK+ThmxOQa4ijm3uJ0N8I16k15cZkW0H5hyHSVqbawPaAjCt25DEXexTSgBQM wnQSsDNXRjtU/LSBPAZX4t9ePdawy/EwLGO+YYJ1ZKck4oesCwp0j/PVqQG+IGTM QUbB3mh0GvdHW5pPgVH6n+yqQ7puN92WHzJicYsYCtJDHa1Yf5D6/C+ZBq8GJ1V0 rCdxQ8uI5tcn9WOJ1gZrUGPmjEMotnLrZX26b/cTEPzBP0mjf+Kd/Ew+gNmdmvX8 5jv1Agzb1GQVXesPpv+N3hw5HNqLEQCBdlLQ5kiX63wYwF4vck+JaAGDWUILO6qp TtdEiotBHDKpk9YuWeepmCUDB2u/uyMrHSA0HGLBh6ACQXsl5/RnQj3KnGFlbvo6 FbMvoOEXbNIOB92OEnC9kZteXGlIBtsK0sHmRXfOsckbH/gyYyBcFwqmeHFpzuKV N5f3NV8GTrUyrNdn4S28S/wlnx+KOqpqFhllVkgp/OcW2/Ry2eEZYM7ypU8bXMpo +HOFwQZimmQonVqRzGrQQI9w5+D5xGJ3gI/0lcjkX2r2Njowx8q13wfKKmqE8EDB 7a+BsNdTBUt01dskFlSe =8FzZ -----END PGP SIGNATURE----- Merge tag 'ras_fixlet_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull "Fix typo in define" change from Borislav Petkov. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-19 13:51:54 +02:00
H. Peter Anvin	45df901cc8	* More tweaking to the EFI variable anti-bricking algorithm. Quite a few users were reporting boot regressions in v3.9. This has now been fixed with a more accurate "minimum storage requirement to avoid bricking" value from Samsung (5K instead of 50%) and code to trigger garbage collection when we near our limit - Matthew Garrett. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJRtkY2AAoJEC84WcCNIz1VJOsP/00xwiY4VKh2RfqNkYKSl/w5 gEshIHFEAXHX5X8C4ReocZVywvdjTgbJoKBbBy3FePYRzLddrmavvjen17hk7BzS /cO8/eXForkNWCGR1kLagA6HLpgKP5DPayKizoMb4Mg6muzfT1SCcN6Pzh8cDMWe btcq/l9JZejXdJ4Wfoq1My+WdXs19OT/BNeD3y65K4x29vNUjop6oaIdDJWLlH/S aeLHh8d4xbSHNWzK1fBP7CnFTYU27xxs1BFNAReU6McxeQCYZAIaRovYnjTZEvfJ twd2tLrOn9HBVTbWa8T4XGNSr+QcT4XGMadLvdwuqltmKDfH6Onm8aWQM3IqA7gy Qimbcv2B7HrITgXWTzp3DPkXF1LA8/8QHSBXVMUU9Rl6QOLy18vIdKiQy3M1Ng9Z 0q+Ow93JtnL11zf9wLDMdKaKcA9HOxbG/wRTK6XO4vGaWj9brFv3n5Ib7OreHH6D GP58zDEnThFuj97K/NKREBZZFcFOMZpKk5MAipVkzltihUQmNeTF/dAtBJ3Ncu/A PqQE6uuKVXjASJR8Gy0bI3WHtSTZK4L/sg9c2MF3bdJa9BswN+m8IEbls+S+iFOx +sYPQx7Zw6SFENxDw8cDYNzC14yfr60qyOxTWfkHH7l/FnvhOgwHzqPsLcXx0ouR C6k1yPYSTgiqFdWC2sjn =TZuM -----END PGP SIGNATURE----- Merge tag 'efi-urgent' into x86/urgent * More tweaking to the EFI variable anti-bricking algorithm. Quite a few users were reporting boot regressions in v3.9. This has now been fixed with a more accurate "minimum storage requirement to avoid bricking" value from Samsung (5K instead of 50%) and code to trigger garbage collection when we near our limit - Matthew Garrett. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-13 08:59:23 -07:00
Srinivas Pandruvada	25cdce170d	x86, mcheck, therm_throt: Process package thresholds Added callback registration for package threshold reports. Also added a callback to check the rate control implemented in callback or not. If there is no rate control implemented, then there is a default rate control similar to core threshold notification by delaying for CHECK_INTERVAL (5 minutes) between reports. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2013-06-13 09:59:14 +08:00
Borislav Petkov	43ab0476a6	efi: Convert runtime services function ptrs ... to void * like the boot services and lose all the void * casts. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-06-11 07:39:26 +01:00
Matthew Garrett	f8b8404337	Modify UEFI anti-bricking code This patch reworks the UEFI anti-bricking code, including an effective reversion of `cc5a080c` and `31ff2f20`. It turns out that calling QueryVariableInfo() from boot services results in some firmware implementations jumping to physical addresses even after entering virtual mode, so until we have 1:1 mappings for UEFI runtime space this isn't going to work so well. Reverting these gets us back to the situation where we'd refuse to create variables on some systems because they classify deleted variables as "used" until the firmware triggers a garbage collection run, which they won't do until they reach a lower threshold. This results in it being impossible to install a bootloader, which is unhelpful. Feedback from Samsung indicates that the firmware doesn't need more than 5KB of storage space for its own purposes, so that seems like a reasonable threshold. However, there's still no guarantee that a platform will attempt garbage collection merely because it drops below this threshold. It seems that this is often only triggered if an attempt to write generates a genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to create a variable larger than the remaining space. This should fail, but if it somehow succeeds we can then immediately delete it. I've tested this on the UEFI machines I have available, but I don't have a Samsung and so can't verify that it avoids the bricking problem. Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Lee, Chun-Y <jlee@suse.com> [ dummy variable cleanup ] Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-06-10 21:59:37 +01:00
H. Peter Anvin	60e019eb37	x86: Get rid of ->hard_math and all the FPU asm fu Reimplement FPU detection code in C and drop old, not-so-recommended detection method in asm. Move all the relevant stuff into i387.c where it conceptually belongs. Finally drop cpuinfo_x86.hard_math. [ hpa: huge thanks to Borislav for taking my original concept patch and productizing it ] [ Boris, note to self: do not use static_cpu_has before alternatives! ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-06 14:32:04 -07:00
Mathias Krause	a90936845d	x86, mce: Fix "braodcast" typo Fix the typo in MCJ_IRQ_BRAODCAST. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-06-05 11:59:17 +02:00
Xiao Guangrong	365c886860	KVM: MMU: reclaim the zapped-obsolete page first As Marcelo pointed out that \| "(retention of large number of pages while zapping) \| can be fatal, it can lead to OOM and host crash" We introduce a list, kvm->arch.zapped_obsolete_pages, to link all the pages which are deleted from the mmu cache but not actually freed. When page reclaiming is needed, we always zap this kind of pages first. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:33:33 +03:00
Xiao Guangrong	5304b8d37c	KVM: MMU: fast invalidate all pages The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to walk and zap all shadow pages one by one, also it need to zap all guest page's rmap and all shadow page's parent spte list. Particularly, things become worse if guest uses more memory or vcpus. It is not good for scalability In this patch, we introduce a faster way to invalidate all shadow pages. KVM maintains a global mmu invalid generation-number which is stored in kvm->arch.mmu_valid_gen and every shadow page stores the current global generation-number into sp->mmu_valid_gen when it is created When KVM need zap all shadow pages sptes, it just simply increase the global generation-number then reload root shadow pages on all vcpus. Vcpu will create a new shadow page table according to current kvm's generation-number. It ensures the old pages are not used any more. Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen) are zapped by using lock-break technique Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:32:33 +03:00
Michael Wang	e8747f10ba	x86, cleanups: Remove extra tab in __flush_tlb_one() Remove the extra tab in __flush_tlb_one(). CC: Alex Shi <alex.shi@intel.com> CC: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/51AD8902.60603@linux.vnet.ibm.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-04 11:51:26 -07:00
Jacob Shin	6b3389ac21	x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m Fix section mismatch warnings on microcode_amd_early. Compile error occurs when CONFIG_MICROCODE=m, change so that early loading depends on microcode_core. Reported-by: Yinghai Lu <yinghai@kernel.org> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/20130531150241.GA12006@jshin-Toonie Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-31 13:56:58 -07:00
Jan Beulich	1d10f6ee60	x86: __force_order doesn't need to be an actual variable It being static causes over a dozen instances to be scattered across the kernel image, with non of them ever being referenced in any way. Making the variable extern without ever defining it works as well - all we need is to have the compiler think the variable is being accessed. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A610B802000078000D99A0@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-31 13:09:17 +02:00
Jan Beulich	cbdce7b251	ix86: Don't waste fixmap entries The vsyscall related pvclock entries can only ever be used on x86-64, and hence they shouldn't even get allocated for 32-bit kernels (the more that it is there where address space is relatively precious). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/51A60F1F02000078000D997C@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-31 13:08:18 +02:00
Jacob Shin	757885e94a	x86, microcode, amd: Early microcode patch loading support for AMD Add early microcode patch loading support for AMD. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1369940959-2077-5-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>	2013-05-30 20:19:25 -07:00
Jacob Shin	a76096a657	x86, microcode, amd: Refactor functions to prepare for early loading In preparation work for early loading, refactor some common functions that will be shared, and move some struct defines to a common header file. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1369940959-2077-4-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>	2013-05-30 20:19:25 -07:00
Jacob Shin	f2b3ee820a	x86, microcode: Vendor abstract out save_microcode_in_initrd() Currently save_microcode_in_initrd() is declared in vendor neutural microcode.h file, but defined in vendor specific microcode_intel_early.c file. Vendor abstract it out to microcode_core_early.c with a wrapper function. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1369940959-2077-3-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>	2013-05-30 20:19:25 -07:00
Andy Lutomirski	d0d98eedee	Add arch_phys_wc_{add, del} to manipulate WC MTRRs if needed Several drivers currently use mtrr_add through various #ifdef guards and/or drm wrappers. The vast majority of them want to add WC MTRRs on x86 systems and don't actually need the MTRR if PAT (i.e. ioremap_wc, etc) are working. arch_phys_wc_add and arch_phys_wc_del are new functions, available on all architectures and configurations, that add WC MTRRs on x86 if needed (and handle errors) and do nothing at all otherwise. They're also easier to use than mtrr_add and mtrr_del, so the call sites can be simplified. As an added benefit, this will avoid wasting MTRRs and possibly warning pointlessly on PAT-supporting systems. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-05-31 13:02:52 +10:00
Jan Beulich	2baad6121e	x86, crc32-pclmul: Fix build with older binutils binutils prior to 2.18 (e.g. the ones found on SLE10) don't support assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ in the same file should be used. This requires making the helper macros capable of recognizing 32-bit general purpose register operands. [ hpa: tagging for stable as it is a low risk build fix ] Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com Cc: Alexander Boyko <alexander_boyko@xyratex.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-30 16:36:23 -07:00
John Stultz	ce0b098981	x86: Fix vrtc_get_time/set_mmss to use new timespec interface The patch "x86: Increase precision of x86_platform.get/set_wallclock" changed the x86 platform set_wallclock/get_wallclock interfaces to use nsec granular timespecs instead of a second granular interface. However, that patch missed converting the vrtc code, so this patch converts those functions to use timespecs. Many thanks to the kbuild test robot for finding this! Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2013-05-29 12:57:35 -07:00
David Vrabel	3565184ed0	x86: Increase precision of x86_platform.get/set_wallclock() All the virtualized platforms (KVM, lguest and Xen) have persistent wallclocks that have more than one second of precision. read_persistent_wallclock() and update_persistent_wallclock() allow for nanosecond precision but their implementation on x86 with x86_platform.get/set_wallclock() only allows for one second precision. This means guests may see a wallclock time that is off by up to 1 second. Make set_wallclock() and get_wallclock() take a struct timespec parameter (which allows for nanosecond precision) so KVM and Xen guests may start with a more accurate wallclock time and a Xen dom0 can maintain a more accurate wallclock for guests. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2013-05-28 14:00:59 -07:00
Michael S. Tsirkin	016be2e55d	x86: uaccess s/might_sleep/might_fault/ The only reason uaccess routines might sleep is if they fault. Make this explicit. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1369577426-26721-9-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 09:41:10 +02:00
Jiri Olsa	5e219b3c67	x86/signals: Propagate RF EFLAGS bit through the signal restore call While porting Vince's perf overflow tests I found perf event breakpoint overflow does not work properly. I found the x86 RF EFLAG bit not being set when returning from debug exception after triggering signal handler. Which is exactly what you get when you set perf breakpoint overflow SIGIO handler. This patch and the next two patches fix the underlying bugs. This patch adds the RF EFLAGS bit to be restored on return from signal from the original register context before the signal was entered. This will prevent the RF flag to disappear when returning from exception due to the signal handler being executed. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1367421944-19082-2-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 08:46:50 +02:00
Borislav Petkov	4d067d8e05	x86: Extend #DF debugging aid to 64-bit It is sometimes very helpful to be able to pinpoint the location which causes a double fault before it turns into a triple fault and the machine reboots. We have this for 32-bit already so extend it to 64-bit. On 64-bit we get the register snapshot at #DF time and not from the first exception which actually causes the #DF. It should be close enough, though. [ hpa: and definitely better than nothing, which is what we have now. ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1368093749-31296-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-13 13:42:44 -07:00
Linus Torvalds	ac4e01093f	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull idle update from Len Brown: "Add support for new Haswell-ULT CPU idle power states" * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: intel_idle: initial C8, C9, C10 support tools/power turbostat: display C8, C9, C10 residency	2013-05-11 15:23:17 -07:00
Linus Torvalds	3644bc2ec7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull stray syscall bits from Al Viro: "Several syscall-related commits that were missing from the original" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE unicore32: just use mmap_pgoff()... unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)	2013-05-10 09:21:05 -07:00
Al Viro	91c2e0bcae	unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-09 13:46:38 -04:00
Linus Torvalds	c8de2fa4dc	Merge branch 'rwsem-optimizations' Merge rwsem optimizations from Michel Lespinasse: "These patches extend Alex Shi's work (which added write lock stealing on the rwsem slow path) in order to provide rwsem write lock stealing on the fast path (that is, without taking the rwsem's wait_lock). I have unfortunately been unable to push this through -next before due to Ingo Molnar / David Howells / Peter Zijlstra being busy with other things. However, this has gotten some attention from Rik van Riel and Davidlohr Bueso who both commented that they felt this was ready for v3.10, and Ingo Molnar has said that he was OK with me pushing directly to you. So, here goes :) Davidlohr got the following test results from pgbench running on a quad-core laptop: \| db_size \| clients \| tps-vanilla \| tps-rwsem \| +---------+----------+----------------+--------------+ \| 160 MB \| 1 \| 5803 \| 6906 \| + 19.0% \| 160 MB \| 2 \| 13092 \| 15931 \| \| 160 MB \| 4 \| 29412 \| 33021 \| \| 160 MB \| 8 \| 32448 \| 34626 \| \| 160 MB \| 16 \| 32758 \| 33098 \| \| 160 MB \| 20 \| 26940 \| 31343 \| + 16.3% \| 160 MB \| 30 \| 25147 \| 28961 \| \| 160 MB \| 40 \| 25484 \| 26902 \| \| 160 MB \| 50 \| 24528 \| 25760 \| ------------------------------------------------------ \| 1.6 GB \| 1 \| 5733 \| 7729 \| + 34.8% \| 1.6 GB \| 2 \| 9411 \| 19009 \| + 101.9% \| 1.6 GB \| 4 \| 31818 \| 33185 \| \| 1.6 GB \| 8 \| 33700 \| 34550 \| \| 1.6 GB \| 16 \| 32751 \| 33079 \| \| 1.6 GB \| 20 \| 30919 \| 31494 \| \| 1.6 GB \| 30 \| 28540 \| 28535 \| \| 1.6 GB \| 40 \| 26380 \| 27054 \| \| 1.6 GB \| 50 \| 25241 \| 25591 \| ------------------------------------------------------ \| 7.6 GB \| 1 \| 5779 \| 6224 \| \| 7.6 GB \| 2 \| 10897 \| 13611 \| + 24.9% \| 7.6 GB \| 4 \| 32683 \| 33108 \| \| 7.6 GB \| 8 \| 33968 \| 34712 \| \| 7.6 GB \| 16 \| 32287 \| 32895 \| \| 7.6 GB \| 20 \| 27770 \| 31689 \| + 14.1% \| 7.6 GB \| 30 \| 26739 \| 29003 \| \| 7.6 GB \| 40 \| 24901 \| 26683 \| \| 7.6 GB \| 50 \| 17115 \| 25925 \| + 51.5% ------------------------------------------------------ (Davidlohr also has one additional patch which further improves throughput, though I will ask him to send it directly to you as I have suggested some minor changes)." * emailed patches from Michel Lespinasse <walken@google.com>: rwsem: no need for explicit signed longs x86 rwsem: avoid taking slow path when stealing write lock rwsem: do not block readers at head of queue if other readers are active rwsem: implement support for write lock stealing on the fastpath rwsem: simplify __rwsem_do_wake rwsem: skip initial trylock in rwsem_down_write_failed rwsem: avoid taking wait_lock in rwsem_down_write_failed rwsem: use cmpxchg for trying to steal write lock rwsem: more agressive lock stealing in rwsem_down_write_failed rwsem: simplify rwsem_down_write_failed rwsem: simplify rwsem_down_read_failed rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed rwsem: shorter spinlocked section in rwsem_down_failed_common() rwsem: make the waiter type an enumeration rather than a bitmask	2013-05-07 09:22:03 -07:00
Michel Lespinasse	a31a369b07	x86 rwsem: avoid taking slow path when stealing write lock modify __down_write[_nested] and __down_write_trylock to grab the write lock whenever the active count is 0, even if there are queued waiters (they must be writers pending wakeup, since the active count is 0). Note that this is an optimization only; architectures without this optimization will still work fine: - __down_write() would take the slow path which would take the wait_lock and then try stealing the lock (as in the spinlocked rwsem implementation) - __down_write_trylock() would fail, but callers must be ready to deal with that - since there are some writers pending wakeup, they could have raced with us and obtained the lock before we steal it. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-05-07 07:20:17 -07:00
Linus Torvalds	99737982ca	IOMMU Updates for Linux v3.10 The updates are mostly about the x86 IOMMUs this time. Exceptions are the groundwork for the PAMU IOMMU from Freescale (for a PPC platform) and an extension to the IOMMU group interface. On the x86 side this includes a workaround for VT-d to disable interrupt remapping on broken chipsets. On the AMD-Vi side the most important new feature is a kernel command-line interface to override broken information in IVRS ACPI tables and get interrupt remapping working this way. Besides that there are small fixes all over the place. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRh2vAAAoJECvwRC2XARrjbjkP/jzKzeffybUpQsIJF8rs/IEt hSwqpGLr6WR5FdneEH9fiBIp4pyMDXmuAb/2ZNgB+DgPN3xgqmWVo4WLk7pMo3BS /xIz/lu7hIX3AtKt807pL9+rPdhGYEJ43Vmr4bW9x0l1kuNXy6fmMLcN5FaPKjV4 p4hY4jOstEgtYQw4wi39/9b4FsYoipZizkOUSdtCzWwTv7jOHH7/Wra8iZyzL6Je 1VlF/efp0ytTcwLdHOfGwPCIlZrQRtQCM4SqdAUG9bOL3ARR9Yu/0iW1295nbLzo CQX5CfKePvo/fGxki1jcBi+UCyxYKPosB5kCxmh4MAxCg/VzzMsaME/A73tLJa6W Y29bbjwPoBPMq03HX8S9R5QWY8HpFujUUp+J4TXcKuTgYEV28WfLu1uaeKD716nM LoXUojov7Cj8ZQZnhyu5l+XNaephBZLfw/8bM6bAxhlKXwAjmLiS5Z+srPl1GJee 5GCV+L94JifHLZaREWh3JFsh9O3W7Wno2++c4JU32uCWJHXH7tMgs2P8n5AY9rnT Km1a9y6w2MF3Gg9j4y6u75m0XnFTNzYjeJMUtqVlwVhNHhgaXfuIWY63xOQCLJs1 ThTHOjoh0VqONGobR/ywn+0ouo9X07DnWpluyFd+zY3XK0UE0NOu9XMr4i6TWxOf mlzWoEKxtw36XGHB/FtQ =MVc/ -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "The updates are mostly about the x86 IOMMUs this time. Exceptions are the groundwork for the PAMU IOMMU from Freescale (for a PPC platform) and an extension to the IOMMU group interface. On the x86 side this includes a workaround for VT-d to disable interrupt remapping on broken chipsets. On the AMD-Vi side the most important new feature is a kernel command-line interface to override broken information in IVRS ACPI tables and get interrupt remapping working this way. Besides that there are small fixes all over the place." * tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (24 commits) iommu/tegra: Fix printk formats for dma_addr_t iommu: Add a function to find an iommu group by id iommu/vt-d: Remove warning for HPET scope type iommu: Move swap_pci_ref function to drivers/iommu/pci.h. iommu/vt-d: Disable translation if already enabled iommu/amd: fix error return code in early_amd_iommu_init() iommu/AMD: Per-thread IOMMU Interrupt Handling iommu: Include linux/err.h iommu/amd: Workaround for ERBT1312 iommu/amd: Document ivrs_ioapic and ivrs_hpet parameters iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides iommu/amd: Add ioapic and hpet ivrs override iommu/amd: Add early maps for ioapic and hpet iommu/amd: Extend IVRS special device data structure iommu/amd: Move add_special_device() to __init iommu: Fix compile warnings with forward declarations iommu/amd: Properly initialize irq-table lock iommu/amd: Use AMD specific data structure for irq remapping iommu/amd: Remove map_sg_no_iommu() iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets ...	2013-05-06 14:59:13 -07:00
Linus Torvalds	01227a889e	Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...	2013-05-05 14:47:31 -07:00
Jan Kiszka	03b28f8133	KVM: x86: Account for failing enable_irq_window for NMI window request With VMX, enable_irq_window can now return -EBUSY, in which case an immediate exit shall be requested before entering the guest. Account for this also in enable_nmi_window which uses enable_irq_window in absence of vnmi support, e.g. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-05-02 22:17:38 -03:00
Alexander van Heukelum	5522ddb3fc	x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) Commit `49cb25e929` x86: 'get rid of pt_regs argument in vm86/vm86old' got rid of the pt_regs stub for sys_vm86old and sys_vm86. The functions were, however, not changed to use the calling convention for syscalls. [AV: killed asmlinkage_protect() - it's done automatically now] Reported-and-tested-by: Hans de Bruin <jmdebruin@xmsnet.nl> Cc: stable@vger.kernel.org Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-02 20:36:32 -04:00
Linus Torvalds	797994f81a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: - XTS mode optimisation for twofish/cast6/camellia/aes on x86 - AVX2/x86_64 implementation for blowfish/twofish/serpent/camellia - SSSE3/AVX/AVX2 optimisations for sha256/sha512 - Added driver for SAHARA2 crypto accelerator - Fix for GMAC when used in non-IPsec secnarios - Added generic CMAC implementation (including IPsec glue) - IP update for crypto/atmel - Support for more than one device in hwrng/timeriomem - Added Broadcom BCM2835 RNG driver - Misc fixes * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (59 commits) crypto: caam - fix job ring cleanup code crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher crypto: tcrypt - add async cipher speed tests for blowfish crypto: testmgr - extend camellia test-vectors for camellia-aesni/avx2 crypto: aesni_intel - fix Kconfig problem with CRYPTO_GLUE_HELPER_X86 crypto: aesni_intel - add more optimized XTS mode for x86-64 crypto: x86/camellia-aesni-avx - add more optimized XTS code crypto: cast6-avx: use new optimized XTS code crypto: x86/twofish-avx - use optimized XTS code crypto: x86 - add more optimized XTS-mode for serpent-avx xfrm: add rfc4494 AES-CMAC-96 support crypto: add CMAC support to CryptoAPI crypto: testmgr - add empty test vectors for null ciphers crypto: testmgr - add AES GMAC test vectors crypto: gcm - fix rfc4543 to handle async crypto correctly crypto: gcm - make GMAC work when dst and src are different hwrng: timeriomem - added devicetree hooks ...	2013-05-02 14:53:12 -07:00
Linus Torvalds	a8bdf74512	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Two regression fixes: 1. On 64 bits, we would set NX on non-NX-capable hardware (very rare in 64-bit land, but a nonzero subset.) 2. Fix suspend/resume across kernel versions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, init: Do not set NX bits on non-NX capable hardware x86, gdt, hibernate: Store/load GDT for hibernate path.	2013-05-02 14:43:58 -07:00
Linus Torvalds	736a2dd257	Lots of virtio work which wasn't quite ready for last merge window. Plus I dived into lguest again, reworking the pagetable code so we can move the switcher page: our fixmaps sometimes take more than 2MB now... Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRga7lAAoJENkgDmzRrbjx/yIQAKpqIBtxOJeYH3SY+Uoe7Cfp toNYcpJEldvb0UcWN8M2cSZpHoxl1SUoq9djwcM29tcKa7EZAjHaGtb/Q1qMTDgv +B3WAfiGU2pmXFxLAkbrlLNGnysy24JspqJQ5hcYV84EiBxQdZp+nCYgOphd+GMK ww16vo9ya8jFjzt3GeRp/Heb3vEzV4Cp6BC3i0m8A3WNpEpbRb66pqXNk5o8ggJO SxQOKSXmUM+0m+jKSul5xn3e2Ls2LOrZZ8/DIHA+gW66N4Zab7n2/j1Q9VRxb4lh FqnR7KwgBX8OCh9IsBDqQYS7MohvMYge6eUdLtFrq84jvMleMEhrC8q9v2tucFUb 5t18CLwvyK7Gdg6UCKiZ7YSPcuURAILO16al9bh5IseeBDsuX+43VsvQoBmFn9k6 cLOVTZ6BlOmahK5PyRYFSvLa9Rxzr/05Mr7oYq9UgshD9io78dnqczFYIORF53rW zD7C4HuTZfYJFfNd0wAJ0RfVXnf8QvDlMdo7zPC26DSXNWqj8OexCY0qqSWUB+2F vcfJP6NkV4fZB8aawWIFUVwc64yqtt2uPVLa7ATZWqk16PgKrchGewmw3tiEwOgu 1l7xgffTRRUIJsqaCZoXdgw3yezcKRjuUBcOxL09lDAAhc+NxWNvzZBsKp66DwDk yZQKn0OdXnuf0CeEOfFf =1tYL -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio & lguest updates from Rusty Russell: "Lots of virtio work which wasn't quite ready for last merge window. Plus I dived into lguest again, reworking the pagetable code so we can move the switcher page: our fixmaps sometimes take more than 2MB now..." Ugh. Annoying conflicts with the tcm_vhost -> vhost_scsi rename. Hopefully correctly resolved. * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits) caif_virtio: Remove bouncing email addresses lguest: improve code readability in lg_cpu_start. virtio-net: fill only rx queues which are being used lguest: map Switcher below fixmap. lguest: cache last cpu we ran on. lguest: map Switcher text whenever we allocate a new pagetable. lguest: don't share Switcher PTE pages between guests. lguest: expost switcher_pages array (as lg_switcher_pages). lguest: extract shadow PTE walking / allocating. lguest: make check_gpte et. al return bool. lguest: assume Switcher text is a single page. lguest: rename switcher_page to switcher_pages. lguest: remove RESERVE_MEM constant. lguest: check vaddr not pgd for Switcher protection. lguest: prepare to make SWITCHER_ADDR a variable. virtio: console: replace EMFILE with EBUSY for already-open port virtio-scsi: reset virtqueue affinity when doing cpu hotplug virtio-scsi: introduce multiqueue support virtio-scsi: push vq lock/unlock into virtscsi_vq_done virtio-scsi: pass struct virtio_scsi to virtqueue completion function ...	2013-05-02 14:14:04 -07:00
Konrad Rzeszutek Wilk	cc456c4e7c	x86, gdt, hibernate: Store/load GDT for hibernate path. The git commite7a5cd063c7b4c58417f674821d63f5eb6747e37 ("x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed.") assumes that for the hibernate path the booting kernel and the resuming kernel MUST be the same. That is certainly the case for a 32-bit kernel (see check_image_kernel and CONFIG_ARCH_HIBERNATION_HEADER config option). However for 64-bit kernels it is OK to have a different kernel version (and size of the image) of the booting and resuming kernels. Hence the above mentioned git commit introduces an regression. This patch fixes it by introducing a 'struct desc_ptr gdt_desc' back in the 'struct saved_context'. However instead of having in the 'save_processor_state' and 'restore_processor_state' the store/load_gdt calls, we are only saving the GDT in the save_processor_state. For the restore path the lgdt operation is done in hibernate_asm_[32\|64].S in the 'restore_registers' path. The apt reader of this description will recognize that only 64-bit kernels need this treatment, not 32-bit. This patch adds the logic in the 32-bit path to be more similar to 64-bit so that in the future the unification process can take advantage of this. [ hpa: this also reverts an inadvertent on-disk format change ] Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1367459610-9656-2-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-02 11:27:35 -07:00
Joerg Roedel	0c4513be3d	Merge branches 'iommu/fixes', 'x86/vt-d', 'x86/amd', 'ppc/pamu', 'core' and 'arm/tegra' into next	2013-05-02 12:10:19 +02:00
Linus Torvalds	08d7676083	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull compat cleanup from Al Viro: "Mostly about syscall wrappers this time; there will be another pile with patches in the same general area from various people, but I'd rather push those after both that and vfs.git pile are in." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: syscalls.h: slightly reduce the jungles of macros get rid of union semop in sys_semctl(2) arguments make do_mremap() static sparc: no need to sign-extend in sync_file_range() wrapper ppc compat wrappers for add_key(2) and request_key(2) are pointless x86: trim sys_ia32.h x86: sys32_kill and sys32_mprotect are pointless get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC merge compat sys_ipc instances consolidate compat lookup_dcookie() convert vmsplice to COMPAT_SYSCALL_DEFINE switch getrusage() to COMPAT_SYSCALL_DEFINE switch epoll_pwait to COMPAT_SYSCALL_DEFINE convert sendfile{,64} to COMPAT_SYSCALL_DEFINE switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect make HAVE_SYSCALL_WRAPPERS unconditional consolidate cond_syscall and SYSCALL_ALIAS declarations teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long get rid of duplicate logics in __SC_....[1-6] definitions	2013-05-01 07:21:43 -07:00
Linus Torvalds	5f56886521	Merge branch 'akpm' (incoming from Andrew) Merge third batch of fixes from Andrew Morton: "Most of the rest. I still have two large patchsets against AIO and IPC, but they're a bit stuck behind other trees and I'm about to vanish for six days. - random fixlets - inotify - more of the MM queue - show_stack() cleanups - DMI update - kthread/workqueue things - compat cleanups - epoll udpates - binfmt updates - nilfs2 - hfs - hfsplus - ptrace - kmod - coredump - kexec - rbtree - pids - pidns - pps - semaphore tweaks - some w1 patches - relay updates - core Kconfig changes - sysrq tweaks" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits) Documentation/sysrq: fix inconstistent help message of sysrq key ethernet/emac/sysrq: fix inconstistent help message of sysrq key sparc/sysrq: fix inconstistent help message of sysrq key powerpc/xmon/sysrq: fix inconstistent help message of sysrq key ARM/etm/sysrq: fix inconstistent help message of sysrq key power/sysrq: fix inconstistent help message of sysrq key kgdb/sysrq: fix inconstistent help message of sysrq key lib/decompress.c: fix initconst notifier-error-inject: fix module names in Kconfig kernel/sys.c: make prctl(PR_SET_MM) generally available UAPI: remove empty Kbuild files menuconfig: print more info for symbol without prompts init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display kconfig menu: move Virtualization drivers near other virtualization options Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS relay: use macro PAGE_ALIGN instead of FIX_SIZE kernel/relay.c: move FIX_SIZE macro into relay.c kernel/relay.c: remove unused function argument actor drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave() drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave() ...	2013-04-30 17:37:43 -07:00
Tejun Heo	a43cb95d54	dump_stack: unify debug information printed by show_regs() show_regs() is inherently arch-dependent but it does make sense to print generic debug information and some archs already do albeit in slightly different forms. This patch introduces a generic function to print debug information from show_regs() so that different archs print out the same information and it's much easier to modify what's printed. show_regs_print_info() prints out the same debug info as dump_stack() does plus task and thread_info pointers. * Archs which didn't print debug info now do. alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r, metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc, um, xtensa * Already prints debug info. Replaced with show_regs_print_info(). The printed information is superset of what used to be there. arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86 * s390 is special in that it used to print arch-specific information along with generic debug info. Heiko and Martin think that the arch-specific extra isn't worth keeping s390 specfic implementation. Converted to use the generic version. Note that now all archs print the debug info before actual register dumps. An example BUG() dump follows. kernel BUG at /work/os/work/kernel/workqueue.c:4841! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000 RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6 RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650 0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760 Call Trace: [<ffffffff81000312>] do_one_initcall+0x122/0x170 [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8 [<ffffffff81c47760>] ? rest_init+0x140/0x140 [<ffffffff81c4776e>] kernel_init+0xe/0xf0 [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0 [<ffffffff81c47760>] ? rest_init+0x140/0x140 ... v2: Typo fix in x86-32. v3: CPU number dropped from show_regs_print_info() as dump_stack_print_info() has been updated to print it. s390 specific implementation dropped as requested by s390 maintainers. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile bits] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Linus Torvalds	3ed1c478ef	Power management and ACPI updates for 3.10-rc1 - ARM big.LITTLE cpufreq driver from Viresh Kumar. - exynos5440 cpufreq driver from Amit Daniel Kachhap. - cpufreq core cleanup and code consolidation from Viresh Kumar and Stratos Karafotis. - cpufreq scalability improvement from Nathan Zimmer. - AMD "frequency sensitivity feedback" powersave bias for the ondemand cpufreq governor from Jacob Shin. - cpuidle code consolidation and cleanups from Daniel Lezcano. - ARM OMAP cpuidle fixes from Santosh Shilimkar and Daniel Lezcano. - ACPICA fixes and other improvements from Bob Moore, Jung-uk Kim, Lv Zheng, Yinghai Lu, Tang Chen, Colin Ian King, and Linn Crosetto. - ACPI core updates related to hotplug from Toshi Kani, Paul Bolle, Yasuaki Ishimatsu, and Rafael J. Wysocki. - Intel Lynxpoint LPSS (Low-Power Subsystem) support improvements from Rafael J. Wysocki and Andy Shevchenko. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJRf8M8AAoJEKhOf7ml8uNsud4P/3cabXP5lDipzibRrpOiONse puuvIdhtNdMRMc3t1oSDjNH/w/JA51Gc+ICGFAORiyVmqxBc85mpT6J5ibqV7hNd pCqbKJceoB5PajHZSx22e4wG9O7YN1k3r80p38IfFzA+Ct0KNSuE0ixMEfHKYjiq p5pXswk6TY3gtBReH9agrafHqDtXw4IMTE0asMuJ+BorPW7vQeiNlrkuA+0qmDuu 26O0Pm2TVkx1ryfTjdM9zSZ9X2G4JuM8rm1/VFZWQJTExwlv3bA2Za1nvQNJlJ99 6JZ0JXfAehcEW2Ye0sqsZ8HSEabDVHM29QvvOszJ5RpBXERiOCHOkhvFleCoTpn0 Xq0rtXPrLMH1G28Ej+cxmsAjfzOLV2Byg30CAoI/GCLuQ+xh+VMCpuNYQuld25CG 9rtYd0fWESeYsAebhDcX0E3xyzJtbrHtOb9PyGwNkbAJ8YQfhVSMCOPi2SX2wa+Q qXLXi2VaHvjBSUKcAv5BmM+Ya57Be+88D0LxbgXbUeOnYefUK1ljldKDDshkMjgG P4LPdm4JpoB5ncXSOO1Dz9w9QnNcFexSUySd/TtKLNMha1vEHV8ISzNPYY+9IdXf XN0VZbFnUDzdj+Fwna7zyFb1cGihDYJKAtpXvRd8Y6RGUxKx9uGLAFJZw/xZB/cR KZKuML5O8MgJuef37F38 =H/se -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael J Wysocki: - ARM big.LITTLE cpufreq driver from Viresh Kumar. - exynos5440 cpufreq driver from Amit Daniel Kachhap. - cpufreq core cleanup and code consolidation from Viresh Kumar and Stratos Karafotis. - cpufreq scalability improvement from Nathan Zimmer. - AMD "frequency sensitivity feedback" powersave bias for the ondemand cpufreq governor from Jacob Shin. - cpuidle code consolidation and cleanups from Daniel Lezcano. - ARM OMAP cpuidle fixes from Santosh Shilimkar and Daniel Lezcano. - ACPICA fixes and other improvements from Bob Moore, Jung-uk Kim, Lv Zheng, Yinghai Lu, Tang Chen, Colin Ian King, and Linn Crosetto. - ACPI core updates related to hotplug from Toshi Kani, Paul Bolle, Yasuaki Ishimatsu, and Rafael J Wysocki. - Intel Lynxpoint LPSS (Low-Power Subsystem) support improvements from Rafael J Wysocki and Andy Shevchenko. * tag 'pm+acpi-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (192 commits) cpufreq: Revert incorrect commit `5800043` cpufreq: MAINTAINERS: Add co-maintainer cpuidle: add maintainer entry ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points ARM: s3c64xx: cpuidle: use init/exit common routine cpufreq: pxa2xx: initialize variables ACPI: video: correct acpi_video_bus_add error processing SH: cpuidle: use init/exit common routine ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y ACPI: Fix wrong parameter passed to memblock_reserve cpuidle: fix comment format pnp: use %*phC to dump small buffers isapnp: remove debug leftovers ARM: imx: cpuidle: use init/exit common routine ARM: davinci: cpuidle: use init/exit common routine ARM: kirkwood: cpuidle: use init/exit common routine ARM: calxeda: cpuidle: use init/exit common routine ARM: tegra: cpuidle: use init/exit common routine for tegra3 ARM: tegra: cpuidle: use init/exit common routine for tegra2 ARM: OMAP4: cpuidle: use init/exit common routine ...	2013-04-30 15:21:02 -07:00
Linus Torvalds	5d434fcb25	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small code cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits) mm: Convert print_symbol to %pSR gfs2: Convert print_symbol to %pSR m32r: Convert print_symbol to %pSR iostats.txt: add easy-to-find description for field 6 x86 cmpxchg.h: fix wrong comment treewide: Fix typo in printk and comments doc: devicetree: Fix various typos docbook: fix 8250 naming in device-drivers pata_pdc2027x: Fix compiler warning treewide: Fix typo in printks mei: Fix comments in drivers/misc/mei treewide: Fix typos in kernel messages pm44xx: Fix comment for "CONFIG_CPU_IDLE" doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP" mmzone: correct "pags" to "pages" in comment. kernel-parameters: remove outdated 'noresidual' parameter Remove spurious _H suffixes from ifdef comments sound: Remove stray pluses from Kconfig file radio-shark: Fix printk "CONFIG_LED_CLASS" doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE ...	2013-04-30 09:36:50 -07:00
Linus Torvalds	5a5a1bf099	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - Add an Intel CMCI hotplug fix - Add AMD family 16h EDAC support - Make the AMD MCE banks code more flexible for virtual environments * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: amd64_edac: Add Family 16h support x86/mce: Rework cmci_rediscover() to play well with CPU hotplug x86, MCE, AMD: Use MCG_CAP MSR to find out number of banks on AMD x86, MCE, AMD: Replace shared_bank array with is_shared_bank() helper	2013-04-30 08:42:45 -07:00
Linus Torvalds	1e2f5b598a	Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt update from Ingo Molnar: "Various paravirtualization related changes - the biggest one makes guest support optional via CONFIG_HYPERVISOR_GUEST" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, wakeup, sleep: Use pvops functions for changing GDT entries x86, xen, gdt: Remove the pvops variant of store_gdt. x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed. x86: Make Linux guest support optional x86, Kconfig: Move PARAVIRT_DEBUG into the paravirt menu	2013-04-30 08:41:21 -07:00
Linus Torvalds	f9b3bcfbc4	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "Misc smaller changes all over the map" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/iommu/dmar: Remove warning for HPET scope type x86/mm/gart: Drop unnecessary check x86/mm/hotplug: Put kernel_physical_mapping_remove() declaration in CONFIG_MEMORY_HOTREMOVE x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER x86/mm/numa: Simplify some bit mangling x86/mm: Re-enable DEBUG_TLBFLUSH for X86_32 x86/mm/cpa: Cleanup split_large_page() and its callee x86: Drop always empty .text..page_aligned section	2013-04-30 08:40:35 -07:00
Linus Torvalds	01c7cd0ef5	Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perparatory x86 kasrl changes from Ingo Molnar: "This contains changes from the ongoing KASLR work, by Kees Cook. The main changes are the use of a read-only IDT on x86 (which decouples the userspace visible virtual IDT address from the physical address), and a rework of ELF relocation support, in preparation of random, boot-time kernel image relocation." * 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, relocs: Refactor the relocs tool to merge 32- and 64-bit ELF x86, relocs: Build separate 32/64-bit tools x86, relocs: Add 64-bit ELF support to relocs tool x86, relocs: Consolidate processing logic x86, relocs: Generalize ELF structure names x86: Use a read-only IDT alias on all CPUs	2013-04-30 08:37:24 -07:00
Linus Torvalds	df8edfa9af	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid changes from Ingo Molnar: "The biggest change is x86 CPU bug handling refactoring and cleanups, by Borislav Petkov" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, CPU, AMD: Drop useless label x86, AMD: Correct {rd,wr}msr_amd_safe warnings x86: Fold-in trivial check_config function x86, cpu: Convert AMD Erratum 400 x86, cpu: Convert AMD Erratum 383 x86, cpu: Convert Cyrix coma bug detection x86, cpu: Convert FDIV bug detection x86, cpu: Convert F00F bug detection x86, cpu: Expand cpufeature facility to include cpu bugs	2013-04-30 08:34:38 -07:00
Linus Torvalds	874f6d1be7	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc smaller cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/lib: Fix spelling, put space between a numeral and its units x86/lib: Fix spelling in the comments x86, quirks: Shut-up a long-standing gcc warning x86, msr: Unify variable names x86-64, docs, mm: Add vsyscall range to virtual address space layout x86: Drop KERNEL_IMAGE_START x86_64: Use __BOOT_DS instead_of __KERNEL_DS for safety	2013-04-30 08:34:07 -07:00
Linus Torvalds	ab86e974f0	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core timer updates from Ingo Molnar: "The main changes in this cycle's merge are: - Implement shadow timekeeper to shorten in kernel reader side blocking, by Thomas Gleixner. - Posix timers enhancements by Pavel Emelyanov: - allocate timer ID per process, so that exact timer ID allocations can be re-created be checkpoint/restore code. - debuggability and tooling (/proc/PID/timers, etc.) improvements. - suspend/resume enhancements by Feng Tang: on certain new Intel Atom processors (Penwell and Cloverview), there is a feature that the TSC won't stop in S3 state, so the TSC value won't be reset to 0 after resume. This can be taken advantage of by the generic via the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to recover/approximate sleep time, the main (and precise) clocksource can be used. - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many CPUs the file goes beyond 4MB of size and thus the current simplistic seqfile approach fails. Convert /proc/timer_list to a proper seq_file with its own iterator. - Cleanups and refactorings of the core timekeeping code by John Stultz. - International Atomic Clock time is managed by the NTP code internally currently but not exposed externally. Separate the TAI code out and add CLOCK_TAI support and TAI support to the hrtimer and posix-timer code, by John Stultz. - Add deep idle support enhacement to the broadcast clockevents core timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ clockevents feature (which will be utilized by future clockevents driver updates), which allows the use of IRQ affinities to avoid spurious wakeups of idle CPUs - the right CPU with an expiring timer will be woken. - Add new ARM bcm281xx clocksource driver, by Christian Daudt - ... various other fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) clockevents: Set dummy handler on CPU_DEAD shutdown timekeeping: Update tk->cycle_last in resume posix-timers: Remove unused variable clockevents: Switch into oneshot mode even if broadcast registered late timer_list: Convert timer list to be a proper seq_file timer_list: Split timer_list_show_tickdevices posix-timers: Show sigevent info in proc file posix-timers: Introduce /proc/PID/timers file posix timers: Allocate timer id per process (v2) timekeeping: Make sure to notify hrtimers when TAI offset changes hrtimer: Fix ktime_add_ns() overflow on 32bit architectures hrtimer: Add expiry time overflow check in hrtimer_interrupt timekeeping: Shorten seq_count region timekeeping: Implement a shadow timekeeper timekeeping: Delay update of clock->cycle_last timekeeping: Store cycle_last value in timekeeper struct as well ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state timekeeping: Simplify tai updating from do_adjtimex timekeeping: Hold timekeepering locks in do_adjtimex and hardpps timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex() ...	2013-04-30 08:15:40 -07:00
Linus Torvalds	8700c95adb	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP/hotplug changes from Ingo Molnar: "This is a pretty large, multi-arch series unifying and generalizing the various disjunct pieces of idle routines that architectures have historically copied from each other and have grown in random, wildly inconsistent and sometimes buggy directions: 101 files changed, 455 insertions(+), 1328 deletions(-) this went through a number of review and test iterations before it was committed, it was tested on various architectures, was exposed to linux-next for quite some time - nevertheless it might cause problems on architectures that don't read the mailing lists and don't regularly test linux-next. This cat herding excercise was motivated by the -rt kernel, and was brought to you by Thomas "the Whip" Gleixner." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) idle: Remove GENERIC_IDLE_LOOP config switch um: Use generic idle loop ia64: Make sure interrupts enabled when we "safe_halt()" sparc: Use generic idle loop idle: Remove unused ARCH_HAS_DEFAULT_IDLE bfin: Fix typo in arch_cpu_idle() xtensa: Use generic idle loop x86: Use generic idle loop unicore: Use generic idle loop tile: Use generic idle loop tile: Enter idle with preemption disabled sh: Use generic idle loop score: Use generic idle loop s390: Use generic idle loop powerpc: Use generic idle loop parisc: Use generic idle loop openrisc: Use generic idle loop mn10300: Use generic idle loop mips: Use generic idle loop microblaze: Use generic idle loop ...	2013-04-30 07:50:17 -07:00
Linus Torvalds	16fa94b532	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this development cycle were: - full dynticks preparatory work by Frederic Weisbecker - factor out the cpu time accounting code better, by Li Zefan - multi-CPU load balancer cleanups and improvements by Joonsoo Kim - various smaller fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) sched: Fix init NOHZ_IDLE flag sched: Prevent to re-select dst-cpu in load_balance() sched: Rename load_balance_tmpmask to load_balance_mask sched: Move up affinity check to mitigate useless redoing overhead sched: Don't consider other cpus in our group in case of NEWLY_IDLE sched: Explicitly cpu_idle_type checking in rebalance_domains() sched: Change position of resched_cpu() in load_balance() sched: Fix wrong rq's runnable_avg update with rt tasks sched: Document task_struct::personality field sched/cpuacct/UML: Fix header file dependency bug on the UML build cgroup: Kill subsys.active flag sched/cpuacct: No need to check subsys active state sched/cpuacct: Initialize cpuacct subsystem earlier sched/cpuacct: Initialize root cpuacct earlier sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically sched/cpuacct: Clean up cpuacct.h sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field() sched/cpuacct: Remove redundant NULL checks in cpuacct_charge() sched/cpuacct: Add cpuacct_acount_field() sched/cpuacct: Add cpuacct_init() ...	2013-04-30 07:43:28 -07:00
Linus Torvalds	e0972916e8	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Features: - Add "uretprobes" - an optimization to uprobes, like kretprobes are an optimization to kprobes. "perf probe -x file sym%return" now works like kretprobes. By Oleg Nesterov. - Introduce per core aggregation in 'perf stat', from Stephane Eranian. - Add memory profiling via PEBS, from Stephane Eranian. - Event group view for 'annotate' in --stdio, --tui and --gtk, from Namhyung Kim. - Add support for AMD NB and L2I "uncore" counters, by Jacob Shin. - Add Ivy Bridge-EP uncore support, by Zheng Yan - IBM zEnterprise EC12 oprofile support patchlet from Robert Richter. - Add perf test entries for checking breakpoint overflow signal handler issues, from Jiri Olsa. - Add perf test entry for for checking number of EXIT events, from Namhyung Kim. - Add perf test entries for checking --cpu in record and stat, from Jiri Olsa. - Introduce perf stat --repeat forever, from Frederik Deweerdt. - Add --no-demangle to report/top, from Namhyung Kim. - PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes, by Oleg Nesterov. Various fixes and refactorings: - Fix dependency of the python binding wrt libtraceevent, from Naohiro Aota. - Simplify some perf_evlist methods and to allow 'stat' to share code with 'record' and 'trace', by Arnaldo Carvalho de Melo. - Remove dead code in related to libtraceevent integration, from Namhyung Kim. - Revert "perf sched: Handle PERF_RECORD_EXIT events" to get 'perf sched lat' back working, by Arnaldo Carvalho de Melo - We don't use Newt anymore, just plain libslang, by Arnaldo Carvalho de Melo. - Kill a bunch of die() calls, from Namhyung Kim. - Fix build on non-glibc systems due to libio.h absence, from Cody P Schafer. - Remove some perf_session and tracing dead code, from David Ahern. - Honor parallel jobs, fix from Borislav Petkov - Introduce tools/lib/lk library, initially just removing duplication among tools/perf and tools/vm. from Borislav Petkov ... and many more I missed to list, see the shortlog and git log for more details." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (136 commits) perf/x86/intel/P4: Robistify P4 PMU types perf/x86/amd: Fix AMD NB and L2I "uncore" support perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c perf/x86: Check all MSRs before passing hw check perf/x86/amd: Add support for AMD NB and L2I "uncore" counters perf/x86/intel: Add Ivy Bridge-EP uncore support perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management perf/x86: Avoid kfree() in CPU_{STARTING,DYING} uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit() uprobes/tracing: Change create_trace_uprobe() to support uretprobes uprobes/tracing: Make seq_printf() code uretprobe-friendly uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher() uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers uprobes/tracing: Generalize struct uprobe_trace_entry_head uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls uprobes/tracing: Kill the pointless seq_print_ip_sym() call uprobes/tracing: Kill the pointless task_pt_regs() calls ...	2013-04-30 07:41:01 -07:00
Gerald Schaefer	106c992a5e	mm/hugetlb: add more arch-defined huge_pte functions Commit `abf09bed3c` ("s390/mm: implement software dirty bits") introduced another difference in the pte layout vs. the pmd layout on s390, thoroughly breaking the s390 support for hugetlbfs. This requires replacing some more pte_xxx functions in mm/hugetlbfs.c with a huge_pte_xxx version. This patch introduces those huge_pte_xxx functions and their generic implementation in asm-generic/hugetlb.h, which will now be included on all architectures supporting hugetlbfs apart from s390. This change will be a no-op for those architectures. [akpm@linux-foundation.org: fix warning] Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts] Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:33 -07:00
Chegu Vinod	cbf6435858	KVM: x86: Increase the "hard" max VCPU limit KVM guests today use 8bit APIC ids allowing for 256 ID's. Reserving one ID for Broadcast interrupts should leave 255 ID's. In case of KVM there is no need for reserving another ID for IO-APIC so the hard max limit for VCPUS can be increased from 254 to 255. (This was confirmed by Gleb Natapov http://article.gmane.org/gmane.comp.emulators.kvm.devel/99713 ) Signed-off-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-28 13:17:48 +03:00
Alex Williamson	2a5bab1004	kvm: Allow build-time configuration of KVM device assignment We hope to at some point deprecate KVM legacy device assignment in favor of VFIO-based assignment. Towards that end, allow legacy device assignment to be deconfigured. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-28 12:58:56 +03:00
Gleb Natapov	064d1afaa5	Merge git://github.com/agraf/linux-2.6.git kvm-ppc-next into queue	2013-04-28 12:50:07 +03:00
Jan Kiszka	730dca42c1	KVM: x86: Rework request for immediate exit The VMX implementation of enable_irq_window raised KVM_REQ_IMMEDIATE_EXIT after we checked it in vcpu_enter_guest. This caused infinite loops on vmentry. Fix it by letting enable_irq_window signal the need for an immediate exit via its return value and drop KVM_REQ_IMMEDIATE_EXIT. This issue only affects nested VMX scenarios. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-28 12:44:18 +03:00
Rafael J. Wysocki	885f925eef	Merge branch 'pm-cpufreq' * pm-cpufreq: (57 commits) cpufreq: MAINTAINERS: Add co-maintainer cpufreq: pxa2xx: initialize variables ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y cpufreq: cpu0: Put cpu parent node after using it cpufreq: ARM big LITTLE: Adapt to latest cpufreq updates cpufreq: ARM big LITTLE: put DT nodes after using them cpufreq: Don't call __cpufreq_governor() for drivers without target() cpufreq: exynos5440: Protect OPP search calls with RCU lock cpufreq: dbx500: Round to closest available freq cpufreq: Call __cpufreq_governor() with correct policy->cpus mask cpufreq / intel_pstate: Optimize intel_pstate_set_policy cpufreq: OMAP: instantiate omap-cpufreq as a platform_driver arm: exynos: Enable OPP library support for exynos5440 cpufreq: exynos: Remove error return even if no soc is found cpufreq: exynos: Add cpufreq driver for exynos5440 cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor cpufreq: ondemand: allow custom powersave_bias_target handler to be registered cpufreq: convert cpufreq_driver to using RCU cpufreq: powerpc/platforms/cell: move cpufreq driver to drivers/cpufreq cpufreq: sparc: move cpufreq driver to drivers/cpufreq ... Conflicts: MAINTAINERS (with commit `a8e39c3` from pm-cpuidle) drivers/cpufreq/cpufreq_governor.h (with commit `beb0ff3`)	2013-04-28 02:10:46 +02:00
Alexander Graf	8175e5b79c	KVM: Add KVM_IRQCHIP_NUM_PINS in addition to KVM_IOAPIC_NUM_PINS The concept of routing interrupt lines to an irqchip is nothing that is IOAPIC specific. Every irqchip has a maximum number of pins that can be linked to irq lines. So let's add a new define that allows us to reuse generic code for non-IOAPIC platforms. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2013-04-26 20:27:13 +02:00
Ingo Molnar	5ac2b5c272	perf/x86/intel/P4: Robistify P4 PMU types Linus found, while extending integer type extension checks in the sparse static code checker, various fragile patterns of mixed signed/unsigned 64-bit/32-bit integer use in perf_events_p4.c. The relevant hardware register ABI is 64 bit wide on 32-bit kernels as well, so clean it all up a bit, remove unnecessary casts, and make sure we use 64-bit unsigned integers in these places. [ Unfortunately this patch was not tested on real P4 hardware, those are pretty rare already. If this patch causes any problems on P4 hardware then please holler ... ] Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Miller <davem@davemloft.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130424072630.GB1780@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-26 09:31:41 +02:00
Jussi Kivilinna	f3f935a76a	crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher Patch adds AVX2/AES-NI/x86-64 implementation of Camellia cipher, requiring 32 parallel blocks for input (512 bytes). Compared to AVX implementation, this version is extended to use the 256-bit wide YMM registers. For AES-NI instructions data is split to two 128-bit registers and merged afterwards. Even with this additional handling, performance should be higher compared to the AES-NI/AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:07 +08:00
Jussi Kivilinna	56d76c96a9	crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher Patch adds AVX2/x86-64 implementation of Serpent cipher, requiring 16 parallel blocks for input (256 bytes). Implementation is based on the AVX implementation and extends to use the 256-bit wide YMM registers. Since serpent does not use table look-ups, this implementation should be close to two times faster than the AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:07 +08:00
Jussi Kivilinna	cf1521a1a5	crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher Patch adds AVX2/x86-64 implementation of Twofish cipher, requiring 16 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Implementation also uses 256-bit wide YMM registers, which should give additional speed up compared to the AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:05 +08:00
Jussi Kivilinna	6048801070	crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher Patch adds AVX2/x86-64 implementation of Blowfish cipher, requiring 32 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:04 +08:00
Jussi Kivilinna	a05248ed2d	crypto: x86 - add more optimized XTS-mode for serpent-avx This patch adds AVX optimized XTS-mode helper functions/macros and converts serpent-avx to use the new facilities. Benefits are slightly improved speed and reduced stack usage as use of temporary IV-array is avoided. tcrypt results, with Intel i5-2450M: enc dec 16B 1.00x 1.00x 64B 1.00x 1.00x 256B 1.04x 1.06x 1024B 1.09x 1.09x 8192B 1.10x 1.09x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:51 +08:00
Li Zhong	7f5281ae8a	x86 cmpxchg.h: fix wrong comment Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-04-25 10:39:04 +02:00
Thomas Gleixner	6402c7dc2a	Merge branch 'linus' into timers/core Reason: Get upstream fixes before adding conflicting code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-04-24 20:33:54 +02:00
Jan Kiszka	384bb78327	KVM: nVMX: Validate EFER values for VM_ENTRY/EXIT_LOAD_IA32_EFER As we may emulate the loading of EFER on VM-entry and VM-exit, implement the checks that VMX performs on the guest and host values on vmlaunch/ vmresume. Factor out kvm_valid_efer for this purpose which checks for set reserved bits. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 12:53:42 +03:00
Abel Gordon	89662e566c	KVM: nVMX: Shadow-vmcs control fields/bits Add definitions for all the vmcs control fields/bits required to enable vmcs-shadowing Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:51:09 +03:00
Rusty Russell	6b39271746	lguest: map Switcher below fixmap. Now we've adjusted all the code, we can simply set switcher_addr to wherever it needs to go below the fixmaps, rather than asserting that it should be so. With large NR_CPUS and PAE, people were hitting the "mapping switcher would thwack fixmap" message. Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-04-22 15:45:03 +09:30
Rusty Russell	93a2cdff98	lguest: assume Switcher text is a single page. ie. SHARED_SWITCHER_PAGES == 1. It is well under a page, and it's a minor simplification: it's nice to have one simplification in a patch series! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-04-22 15:31:36 +09:30
Rusty Russell	406a590ba1	lguest: prepare to make SWITCHER_ADDR a variable. We currently use the whole top PGD entry for the switcher, but that's hitting the fixmap in some configurations (mainly, large NR_CPUS). Introduce a variable, currently set to the constant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-04-22 15:31:33 +09:30
Jacob Shin	c43ca5091a	perf/x86/amd: Add support for AMD NB and L2I "uncore" counters Add support for AMD Family 15h [and above] northbridge performance counters. MSRs 0xc0010240 ~ 0xc0010247 are shared across all cores that share a common northbridge. Add support for AMD Family 16h L2 performance counters. MSRs 0xc0010230 ~ 0xc0010237 are shared across all cores that share a common L2 cache. We do not enable counter overflow interrupts. Sampling mode and per-thread events are not supported. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Stephane Eranian <eranian@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130419213428.GA8229@jshin-Toonie Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-21 11:01:24 +02:00
Ingo Molnar	73e21ce28d	Merge branch 'perf/urgent' into perf/core Conflicts: arch/x86/kernel/cpu/perf_event_intel.c Merge in the latest fixes before applying new patches, resolve the conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-21 10:57:33 +02:00
Linus Torvalds	db93f8b420	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Three groups of fixes: 1. Make sure we don't execute the early microcode patching if family < 6, since it would touch MSRs which don't exist on those families, causing crashes. 2. The Xen partial emulation of HyperV can be dealt with more gracefully than just disabling the driver. 3. More EFI variable space magic. In particular, variables hidden from runtime code need to be taken into account too." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Verify the family before dispatching microcode patching x86, hyperv: Handle Xen emulation of Hyper-V more gracefully x86,efi: Implement efi_no_storage_paranoia parameter efi: Export efi_query_variable_store() for efivars.ko x86/Kconfig: Make EFI select UCS2_STRING efi: Distinguish between "remaining space" and actually used space efi: Pass boot services variable info to runtime code Move utf16 functions to kernel core and rename x86,efi: Check max_size only if it is non-zero. x86, efivars: firmware bug workarounds should be in platform code	2013-04-20 18:38:48 -07:00
H. Peter Anvin	c0a9f451e4	Merge remote-tracking branch 'efi/urgent' into x86/urgent Matt Fleming (1): x86, efivars: firmware bug workarounds should be in platform code Matthew Garrett (3): Move utf16 functions to kernel core and rename efi: Pass boot services variable info to runtime code efi: Distinguish between "remaining space" and actually used space Richard Weinberger (2): x86,efi: Check max_size only if it is non-zero. x86,efi: Implement efi_no_storage_paranoia parameter Sergey Vlasov (2): x86/Kconfig: Make EFI select UCS2_STRING efi: Export efi_query_variable_store() for efivars.ko Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-19 17:09:03 -07:00
Joerg Roedel	35d3d814cb	iommu: Fix compile warnings with forward declarations The irq_remapping.h file for x86 does not include all necessary forward declarations for the data structures used. This causes compile warnings, so fix it. Signed-off-by: Joerg Roedel <joro@8bytes.org>	2013-04-19 20:34:55 +02:00
Ingo Molnar	5379f8c0d7	Add required support for AMD F16h to amd64_edac. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRcSI0AAoJEBLB8Bhh3lVKWjIP/3BdMQljadBzIm5xSU5Vt4lW 1R22XrfNAwqdbjuwHGV7TPMJrjnV8zGQBsghALdqCT7xtRnl71ce7XVGbd1MblGc llTcOBlCp1v5xHbNaL1xmunvI7JuMushX69YRD/KOFoEaXo2iQDCjSpA+73K3d/g OHjFQGrZkuuNkdTEWiOo6cplDMwXKNzLlS1ccN7XzzKPQ3aA0Q/XwlUftjHdBfGP WCxahtsz60xCxJt1x4hy5/ocZT2WfLp/shSS/udsGcTl5g75Yjhz8RkTuOmNySzo KDwsSQaT7HBz+b3SirG1IHEzgsF2DKZqDziDlBt8qx2xcHnkp7DfF/rZajCOzoLy k6tdFI2FEj3JZqyCDVxFAP/xmct39S/arqFlLLAUIAQHbsVkrmIH6ZC9NGiSPq9M kEdOOl/G0CbfnPFGd1hYCQUdRzJc7m5hKT05WFRRexRdsLVoAluO8rPWHtbqlmRS lvYkrvVglnOTlN0jVN6xw9LaTkX/q3+kUFzu4ZPiDQYqVzw+XmKYYh+L9OTgnDqA rKZqljdDgwb5GoiexyKtvuJUt7dJ2KlDZoEZ7INLQvA4Im9EQnYW/RH48G+8qouJ XfSjxUET0fmapYzulQI8Ikhs/dbPt7FItUn/ipX7I9fCxMDJOyoXTqZb2S0FbyYa kdZIbxzNDja0cCCcZfOs =5BNU -----END PGP SIGNATURE----- Merge tag 'edac_amd_f16h' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull AMD F16h support for amd64_edac from Borislav Petkov. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-19 13:03:08 +02:00
Neil Horman	03bbcb2e7e	iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. For details of those that reported the problem, please see: https://bugzilla.redhat.com/show_bug.cgi?id=887006 [ Joerg: Removed CONFIG_IRQ_REMAP ifdef from early-quirks.c ] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Prarit Bhargava <prarit@redhat.com> CC: Don Zickus <dzickus@redhat.com> CC: Don Dutile <ddutile@redhat.com> CC: Bjorn Helgaas <bhelgaas@google.com> CC: Asit Mallick <asit.k.mallick@intel.com> CC: David Woodhouse <dwmw2@infradead.org> CC: linux-pci@vger.kernel.org CC: Joerg Roedel <joro@8bytes.org> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Joerg Roedel <joro@8bytes.org>	2013-04-18 17:00:47 +02:00
Kristen Carlson Accardi	ca58710f3a	tools/power turbostat: display C8, C9, C10 residency Display residency in the new C-states, C8, C9, C10. C8, C9, C10 are present on some: "Fourth Generation Intel(R) Core(TM) Processors", which are based on Intel(R) microarchitecture code name Haswell. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2013-04-17 19:23:26 -04:00
Yang Zhang	a20ed54d6e	KVM: VMX: Add the deliver posted interrupt algorithm Only deliver the posted interrupt when target vcpu is running and there is no previous interrupt pending in pir. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-04-16 16:32:40 -03:00
Yang Zhang	01e439be77	KVM: VMX: Check the posted interrupt capability Detect the posted interrupt feature. If it exists, then set it in vmcs_config. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-04-16 16:32:40 -03:00
Yang Zhang	d78f266483	KVM: VMX: Register a new IPI for posted interrupt Posted Interrupt feature requires a special IPI to deliver posted interrupt to guest. And it should has a high priority so the interrupt will not be blocked by others. Normally, the posted interrupt will be consumed by vcpu if target vcpu is running and transparent to OS. But in some cases, the interrupt will arrive when target vcpu is scheduled out. And host will see it. So we need to register a dump handler to handle it. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-04-16 16:32:39 -03:00
Yang Zhang	a547c6db4d	KVM: VMX: Enable acknowledge interupt on vmexit The "acknowledge interrupt on exit" feature controls processor behavior for external interrupt acknowledgement. When this control is set, the processor acknowledges the interrupt controller to acquire the interrupt vector on VM exit. After enabling this feature, an interrupt which arrived when target cpu is running in vmx non-root mode will be handled by vmx handler instead of handler in idt. Currently, vmx handler only fakes an interrupt stack and jump to idt table to let real handler to handle it. Further, we will recognize the interrupt and only delivery the interrupt which not belong to current vcpu through idt table. The interrupt which belonged to current vcpu will be handled inside vmx handler. This will reduce the interrupt handle cost of KVM. Also, interrupt enable logic is changed if this feature is turnning on: Before this patch, hypervior call local_irq_enable() to enable it directly. Now IF bit is set on interrupt stack frame, and will be enabled on a return from interrupt handler if exterrupt interrupt exists. If no external interrupt, still call local_irq_enable() to enable it. Refer to Intel SDM volum 3, chapter 33.2. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-04-16 16:32:39 -03:00
Ingo Molnar	b5210b2a34	Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core Pull uprobes updates from Oleg Nesterov: - "uretprobes" - an optimization to uprobes, like kretprobes are an optimization to kprobes. "perf probe -x file sym%return" now works like kretprobes. - PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-16 11:04:10 +02:00
Matthew Garrett	cc5a080c5d	efi: Pass boot services variable info to runtime code EFI variables can be flagged as being accessible only within boot services. This makes it awkward for us to figure out how much space they use at runtime. In theory we could figure this out by simply comparing the results from QueryVariableInfo() to the space used by all of our variables, but that fails if the platform doesn't garbage collect on every boot. Thankfully, calling QueryVariableInfo() while still inside boot services gives a more reliable answer. This patch passes that information from the EFI boot stub up to the efi platform code. Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-04-15 21:31:09 +01:00
Linus Torvalds	6c4c4d4bda	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set x86/mm/cpa/selftest: Fix false positive in CPA self test x86/mm/cpa: Convert noop to functional fix x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates	2013-04-14 11:13:24 -07:00
Gleb Natapov	991eebf9f8	KVM: VMX: do not try to reexecute failed instruction while emulating invalid guest state During invalid guest state emulation vcpu cannot enter guest mode to try to reexecute instruction that emulator failed to emulate, so emulation will happen again and again. Prevent that by telling the emulator that instruction reexecution should not be attempted. Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-14 09:44:17 +03:00
Anton Arapov	791eca1010	uretprobes/x86: Hijack return address Hijack the return address and replace it with a trampoline address. Signed-off-by: Anton Arapov <anton@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2013-04-13 15:31:55 +02:00
Dave Hansen	1de14c3c5c	x86-32: Fix possible incomplete TLB invalidate with PAE pagetables This patch attempts to fix: https://bugzilla.kernel.org/show_bug.cgi?id=56461 The symptom is a crash and messages like this: chrome: Corrupted page table at address 34a03000 pdpt = 0000000000000000 pde = 0000000000000000 Bad pagetable: 000f [#1] PREEMPT SMP Ingo guesses this got introduced by commit `611ae8e3f5` ("x86/tlb: enable tlb flush range support for x86") since that code started to free unused pagetables. On x86-32 PAE kernels, that new code has the potential to free an entire PMD page and will clear one of the four page-directory-pointer-table (aka pgd_t entries). The hardware aggressively "caches" these top-level entries and invlpg does not actually affect the CPU's copy. If we clear one we HAVE to do a full TLB flush, otherwise we might continue using a freed pmd page. (note, we do this properly on the population side in pud_populate()). This patch tracks whenever we clear one of these entries in the 'struct mmu_gather', and ensures that we follow up with a full tlb flush. BTW, I disassembled and checked that: if (tlb->fullmm == 0) and if (!tlb->fullmm && !tlb->need_flush_all) generate essentially the same code, so there should be zero impact there to the !PAE case. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Artem S Tashkinov <t.artem@mailcity.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-12 16:56:47 -07:00
Paul Bolle	a7e6567585	x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER The last users of FIX_CYCLONE_TIMER were removed in v2.6.18. We can remove this unneeded constant. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Link: http://lkml.kernel.org/r/1365698982.1427.3.camel@x61.thuisdomein Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-12 07:21:18 +02:00
Konrad Rzeszutek Wilk	357d122670	x86, xen, gdt: Remove the pvops variant of store_gdt. The two use-cases where we needed to store the GDT were during ACPI S3 suspend and resume. As the patches: x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed. have demonstrated - there are other mechanism by which the GDT is saved and reloaded during early resume path. Hence we do not need to worry about the pvops call-chain for saving the GDT and can and can eliminate it. The other areas where the store_gdt is used are never going to be hit when running under the pvops platforms. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-11 15:40:38 -07:00
Konrad Rzeszutek Wilk	84e70971e6	x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed During the ACPI S3 suspend, we store the GDT in the wakup_header (see wakeup_asm.s) field called 'pmode_gdt'. Which is then used during the resume path and has the same exact value as what the store/load_gdt do with the saved_context (which is saved/restored via save/restore_processor_state()). The flow during resume from ACPI S3 is simpler than the 64-bit counterpart. We only use the early bootstrap once (wakeup_gdt) and do various checks in real mode. After the checks are completed, we load the saved GDT ('pmode_gdt') and continue on with the resume (by heading to startup_32 in trampoline_32.S) - which quickly jumps to what was saved in 'pmode_entry' aka 'wakeup_pmode_return'. The 'wakeup_pmode_return' restores the GDT (saved_gdt) again (which was saved in do_suspend_lowlevel initially). After that it ends up calling the 'ret_point' which calls 'restore_processor_state()'. We have two opportunities to remove code where we restore the same GDT twice. Here is the call chain: wakeup_start \|- lgdtl wakeup_gdt [the work-around broken BIOSes] \| \| - lgdtl pmode_gdt [the real one] \| \-- startup_32 (in trampoline_32.S) \-- wakeup_pmode_return (in wakeup_32.S) \|- lgdtl saved_gdt [the real one] \-- ret_point \|.. \|- call restore_processor_state The hibernate path is much simpler. During the saving of the hibernation image we call save_processor_state() and save the contents of that along with the rest of the kernel in the hibernation image destination. We save the EIP of 'restore_registers' (restore_jump_address) and cr3 (restore_cr3). During hibernate resume, the 'restore_registers' (via the 'restore_jump_address) in hibernate_asm_32.S is invoked which restores the contents of most registers. Naturally the resume path benefits from already being in 32-bit mode, so it does not have to reload the GDT. It only reloads the cr3 (from restore_cr3) and continues on. Note that the restoration of the restore image page-tables is done prior to this. After the 'restore_registers' it returns and we end up called restore_processor_state() - where we reload the GDT. The reload of the GDT is not needed as bootup kernel has already loaded the GDT which is at the same physical location as the the restored kernel. Note that the hibernation path assumes the GDT is correct during its 'restore_registers'. The assumption in the code is that the restored image is the same as saved - meaning we are not trying to restore an different kernel in the virtual address space of a new kernel. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1365194544-14648-3-git-send-email-konrad.wilk@oracle.com Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-11 15:40:17 -07:00
Konrad Rzeszutek Wilk	e7a5cd063c	x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed. During the ACPI S3 resume path the trampoline code handles it already. During the ACPI S3 suspend phase (acpi_suspend_lowlevel) we set: early_gdt_descr.address = (..)get_cpu_gdt_table(smp_processor_id()); which is then used during the resume path and has the same exact value as what the store/load_gdt do with the saved_context (which is saved/restored via save/restore_processor_state()). The flow during resume is complex and for 64-bit kernels we use three GDTs - one early bootstrap GDT (wakeup_igdt) that we load to workaround broken BIOSes, an early Protected Mode to Long Mode transition one (tr_gdt), and the final one - early_gdt_descr (which points to the real GDT). The early ('wakeup_gdt') is loaded in 'trampoline_start' for working around broken BIOSes, and then when we end up in Protected Mode in the startup_32 (in trampoline_64.s, not head_32.s) we use the 'tr_gdt' (still in trampoline_64.s). This 'tr_gdt' has a a 32-bit code segment, 64-bit code segment with L=1, and a 32-bit data segment. Once we have transitioned from Protected Mode to Long Mode we then set the GDT to 'early_gdt_desc' and then via an iretq emerge in wakeup_long64 (set via 'initial_code' variable in acpi_suspend_lowlevel). In the wakeup_long64 we end up restoring the %rip (which is set to 'resume_point') and jump there. In 'resume_point' we call 'restore_processor_state' which does the load_gdt on the saved context. This load_gdt is redundant as the GDT loaded via early_gdt_desc is the same. Here is the call-chain: wakeup_start \|- lgdtl wakeup_gdt [the work-around broken BIOSes] \| \-- trampoline_start (trampoline_64.S) \|- lgdtl tr_gdt \| \-- startup_32 (trampoline_64.S) \| \-- startup_64 (trampoline_64.S) \| \-- secondary_startup_64 \|- lgdtl early_gdt_desc \| ... \|- movq initial_code(%rip), %eax \|-.. lretq \-- wakeup_64 \|-- other registers are reloaded \|-- call restore_processor_state The hibernate path is much simpler. During the saving of the hibernation image we call save_processor_state() and save the contents of that along with the rest of the kernel in the hibernation image destination. We save the EIP of 'restore_registers' (restore_jump_address) and cr3 (restore_cr3). During hibernate resume, the 'restore_registers' (via the 'restore_jump_address) in hibernate_asm_64.S is invoked which restores the contents of most registers. Naturally the resume path benefits from already being in 64-bit mode, so it does not have to load the GDT. It only reloads the cr3 (from restore_cr3) and continues on. Note that the restoration of the restore image page-tables is done prior to this. After the 'restore_registers' it returns and we end up called restore_processor_state() - where we reload the GDT. The reload of the GDT is not needed as bootup kernel has already loaded the GDT which is at the same physical location as the the restored kernel. Note that the hibernation path assumes the GDT is correct during its 'restore_registers'. The assumption in the code is that the restored image is the same as saved - meaning we are not trying to restore an different kernel in the virtual address space of a new kernel. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1365194544-14648-2-git-send-email-konrad.wilk@oracle.com Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-11 15:39:38 -07:00
Kees Cook	4eefbe792b	x86: Use a read-only IDT alias on all CPUs Make a copy of the IDT (as seen via the "sidt" instruction) read-only. This primarily removes the IDT from being a target for arbitrary memory write attacks, and has the added benefit of also not leaking the kernel base offset, if it has been relocated. We already did this on vendor == Intel and family == 5 because of the F0 0F bug -- regardless of if a particular CPU had the F0 0F bug or not. Since the workaround was so cheap, there simply was no reason to be very specific. This patch extends the readonly alias to all CPUs, but does not activate the #PF to #UD conversion code needed to deliver the proper exception in the F0 0F case except on Intel family 5 processors. Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20130410192422.GA17344@www.outflux.net Cc: Eric Northup <digitaleric@google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-11 13:53:19 -07:00
Boris Ostrovsky	511ba86e1d	x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Invoking arch_flush_lazy_mmu_mode() results in calls to preempt_enable()/disable() which may have performance impact. Since lazy MMU is not used on bare metal we can patch away arch_flush_lazy_mmu_mode() so that it is never called in such environment. [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU updates" may cause a minor performance regression on bare metal. This patch resolves that performance regression. It is somewhat unclear to me if this is a good -stable candidate. ] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com Tested-by: Josh Boyer <jwboyer@redhat.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> SEE NOTE ABOVE	2013-04-10 11:25:10 -07:00
Borislav Petkov	5952886bfe	x86/mm/cpa: Cleanup split_large_page() and its callee So basically we're generating the pte_t * from a struct page and we're handing it down to the __split_large_page() internal version which then goes and gets back struct page * from it because it needs it. Change the caller to hand down struct page * directly and the callee can compute the pte_t itself. Net save is one virt_to_page() call and simpler code. While at it, make __split_large_page() static. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1363886217-24703-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-10 14:39:08 +02:00
Jacob Shin	9c5320c8ea	cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor Future AMD processors, starting with Family 16h, can provide software with feedback on how the workload may respond to frequency change -- memory-bound workloads will not benefit from higher frequency, where as compute-bound workloads will. This patch enables this "frequency sensitivity feedback" to aid the ondemand governor to make better frequency change decisions by hooking into the powersave bias. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Thomas Renninger <trenn@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-10 13:19:26 +02:00
Thomas Gleixner	ee761f629d	arch: Consolidate tsk_is_polling() Move it to a common place. Preparatory patch for implementing set/clear for the idle need_resched poll implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130321215233.446034505@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-04-08 17:39:22 +02:00
Geoff Levand	8b415dcd76	KVM: Move kvm_rebooting declaration out of x86 The variable kvm_rebooting is a common kvm variable, so move its declaration from arch/x86/include/asm/kvm_host.h to include/asm/kvm_host.h. Fixes this sparse warning when building on arm64: virt/kvm/kvm_main.c⚠️ symbol 'kvm_rebooting' was not declared. Should it be static? Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-08 13:02:09 +03:00
Geoff Levand	fc1b74925f	KVM: Move vm_list kvm_lock declarations out of x86 The variables vm_list and kvm_lock are common to all architectures, so move the declarations from arch/x86/include/asm/kvm_host.h to include/linux/kvm_host.h. Fixes sparse warnings like these when building for arm64: virt/kvm/kvm_main.c: warning: symbol 'kvm_lock' was not declared. Should it be static? virt/kvm/kvm_main.c: warning: symbol 'vm_list' was not declared. Should it be static? Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-08 13:02:00 +03:00
Ingo Molnar	529801898b	Merge branch 'for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core Pull IBM zEnterprise EC12 support patchlet from Robert Richter. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-08 11:43:30 +02:00
Borislav Petkov	1423bed239	x86, msr: Unify variable names Make sure all MSR-accessing primitives which split MSR values in two 32-bit parts have their variables called 'low' and 'high' for consistence with the rest of the code and for ease of staring. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1362428180-8865-5-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-02 16:03:32 -07:00
Borislav Petkov	8e3c2a8cf6	x86: Drop KERNEL_IMAGE_START We have KERNEL_IMAGE_START and __START_KERNEL_map which both contain the start of the kernel text mapping's virtual address. Remove the prior one which has been replicated a lot less times around the tree. No functionality change. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1362428180-8865-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-02 16:03:29 -07:00
Paul Moore	8b4b9f27e5	x86: remove the x32 syscall bitmask from syscall_get_nr() Commit `fca460f95e` simplified the x32 implementation by creating a syscall bitmask, equal to 0x40000000, that could be applied to x32 syscalls such that the masked syscall number would be the same as a x86_64 syscall. While that patch was a nice way to simplify the code, it went a bit too far by adding the mask to syscall_get_nr(); returning the masked syscall numbers can cause confusion with callers that expect syscall numbers matching the x32 ABI, e.g. unmasked syscall numbers. This patch fixes this by simply removing the mask from syscall_get_nr() while preserving the other changes from the original commit. While there are several syscall_get_nr() callers in the kernel, most simply check that the syscall number is greater than zero, in this case this patch will have no effect. Of those remaining callers, they appear to be few, seccomp and ftrace, and from my testing of seccomp without this patch the original commit definitely breaks things; the seccomp filter does not correctly filter the syscalls due to the difference in syscall numbers in the BPF filter and the value from syscall_get_nr(). Applying this patch restores the seccomp BPF filter functionality on x32. I've tested this patch with the seccomp BPF filters as well as ftrace and everything looks reasonable to me; needless to say general usage seemed fine as well. Signed-off-by: Paul Moore <pmoore@redhat.com> Link: http://lkml.kernel.org/r/20130215172143.12549.10292.stgit@localhost Cc: <stable@vger.kernel.org> Cc: Will Drewry <wad@chromium.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-02 14:38:09 -07:00
Srivatsa S. Bhat	7a0c819d28	x86/mce: Rework cmci_rediscover() to play well with CPU hotplug Dave Jones reports that offlining a CPU leads to this trace: numa_remove_cpu cpu 1 node 0: mask now 0,2-3 smpboot: CPU 1 is now offline BUG: using smp_processor_id() in preemptible [00000000] code: cpu-offline.sh/10591 caller is cmci_rediscover+0x6a/0xe0 Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2 Call Trace: [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100 [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0 [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150 [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10 [<ffffffff8104c2e3>] cpu_notify+0x23/0x50 [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20 [<ffffffff815ef082>] _cpu_down+0x302/0x350 [<ffffffff815ef106>] cpu_down+0x36/0x50 [<ffffffff815f1c9d>] store_online+0x8d/0xd0 [<ffffffff813edc48>] dev_attr_store+0x18/0x30 [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150 [<ffffffff811adfb2>] vfs_write+0xa2/0x170 [<ffffffff811ae16c>] sys_write+0x4c/0xa0 [<ffffffff81613019>] system_call_fastpath+0x16/0x1b However, a look at cmci_rediscover shows that it can be simplified quite a bit, apart from solving the above issue. It invokes functions that take spin locks with interrupts disabled, and hence it can run in atomic context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU is already dead and out of the cpu_online_mask. So take these points into account and simplify the code, and thereby also fix the above issue. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-04-02 14:04:01 -07:00
Borislav Petkov	7d7dc116e5	x86, cpu: Convert AMD Erratum 400 Convert AMD erratum 400 to the bug infrastructure. Then, retract all exports for modules since they're not needed now and make the AMD erratum checking machinery local to amd.c. Use forward declarations to avoid shuffling too much code around needlessly. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1363788448-31325-7-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-04-02 10:12:55 -07:00
Borislav Petkov	e6ee94d58d	x86, cpu: Convert AMD Erratum 383 Convert the AMD erratum 383 testing code to the bug infrastructure. This allows keeping the AMD-specific erratum testing machinery private to amd.c and not export symbols to modules needlessly. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1363788448-31325-6-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-04-02 10:12:54 -07:00
Borislav Petkov	c5b41a6750	x86, cpu: Convert Cyrix coma bug detection ... to the new facility. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1363788448-31325-5-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-04-02 10:12:54 -07:00
Borislav Petkov	93a829e8e2	x86, cpu: Convert FDIV bug detection ... to the new facility. Add a reference to the wikipedia article explaining the FDIV test we're doing here. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1363788448-31325-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-04-02 10:12:53 -07:00
Borislav Petkov	e2604b49e8	x86, cpu: Convert F00F bug detection ... to using the new facility and drop the cpuinfo_x86 member. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1363788448-31325-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-04-02 10:12:52 -07:00
Borislav Petkov	65fc985b37	x86, cpu: Expand cpufeature facility to include cpu bugs We add another 32-bit vector at the end of the ->x86_capability bitvector which collects bugs present in CPUs. After all, a CPU bug is a kind of a capability, albeit a strange one. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1363788448-31325-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-04-02 10:12:52 -07:00
Paolo Bonzini	afd80d85ae	pmu: prepare for migration support In order to migrate the PMU state correctly, we need to restore the values of MSR_CORE_PERF_GLOBAL_STATUS (a read-only register) and MSR_CORE_PERF_GLOBAL_OVF_CTRL (which has side effects when written). We also need to write the full 40-bit value of the performance counter, which would only be possible with a v3 architectural PMU's full-width counter MSRs. To distinguish host-initiated writes from the guest's, pass the full struct msr_data to kvm_pmu_set_msr. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-02 17:42:44 +03:00
Stephane Eranian	f20093eef5	perf/x86: Add memory profiling via PEBS Load Latency This patch adds support for memory profiling using the PEBS Load Latency facility. Load accesses are sampled by HW and the instruction address, data address, load latency, data source, tlb, locked information can be saved in the sampling buffer if using the PERF_SAMPLE_COST (for latency), PERF_SAMPLE_ADDR, PERF_SAMPLE_DATA_SRC types. To enable PEBS Load Latency, users have to use the model specific event: - on NHM/WSM: MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD - on SNB/IVB: MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD To make things easier, this patch also exports a generic alias via sysfs: mem-loads. It export the right event encoding based on the host CPU and can be used directly by the perf tool. Loosely based on Intel's Lin Ming patch posted on LKML in July 2011. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: namhyung.kim@lge.com Link: http://lkml.kernel.org/r/1359040242-8269-9-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2013-04-01 12:16:31 -03:00
Linus Torvalds	dfca53fb16	ACPI and power management fixes for 3.9-rc5 - Fix for a recent cpufreq regression related to acpi-cpufreq and suspend/resume from Viresh Kumar. - cpufreq stats reference counting fix from Viresh Kumar. - intel_pstate driver fixes from Dirk Brandewie and Konrad Rzeszutek Wilk. - New ACPI suspend blacklist entry for Sony Vaio VGN-FW21M from Fabio Valentini. - ACPI Platform Error Interface (APEI) fix from Chen Gong. - PCI root bridge hotplug locking fix from Yinghai Lu. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJRVETOAAoJEKhOf7ml8uNs30kP/3GsKWacHsaIPdhIiHQC3f91 HMLabrW7NE7ldrOoXzj1lTHsIc1TQHm722vyI+aF061HErfkF8Jkdi5rkIai8VMq IJXe4CtwuuCi0SeKQsV9ymiQanTrgsP/AlGV5x/KM/As8dvAVW/1+Ln/gXAnH0IJ /Onqf3eA4NBw/1Hjg7AGHGeCmOlDHvcetHF7eX4MaiYZHEwuy/a7jswH4aNOjwgx GZtbrnwUO6OtDKv6ie//1EbP753VrkHDtK3jzIy2lUA5YyLmr0XOTvy4uQh2n/r7 tVTqsVoNZNA4En0YUspfsWwBruUic3ra9qVTrJqn7Fzymyr+TgyCQQzSUGrOGy2a wY0vwMAwm1dMwAsZWPhnui6aqvu0bbg0u7sxCZQs8WapdtjxPdD7iIhRk2YU4wOZ omtejW0thUIwEmHWgBPo9rFvfZmxy9hb044UfhkLI9xBmuTVrDb/HqeVPA767ZoO k7IVg1DG4Ye6xboCIILfluoUAsc3DvkHpCIvWVujK3pF5j/M9ptt3d8eXDFIzmWD J6tm9ARkQoUPRAs6751cG1N0nP++ZlErYseU/h6eXoC0rkeC/WbGyxIumii4xJhg Gs6GGeM8OgQ/7Fat68kA2Z7jriY+MTteLbq1Sl3PBlfdURaceOXkTIVrxXo33Itq jQiEKa1CbJDi6OBKog8K =0bjZ -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael J Wysocki: - Fix for a recent cpufreq regression related to acpi-cpufreq and suspend/resume from Viresh Kumar. - cpufreq stats reference counting fix from Viresh Kumar. - intel_pstate driver fixes from Dirk Brandewie and Konrad Rzeszutek Wilk. - New ACPI suspend blacklist entry for Sony Vaio VGN-FW21M from Fabio Valentini. - ACPI Platform Error Interface (APEI) fix from Chen Gong. - PCI root bridge hotplug locking fix from Yinghai Lu. * tag 'pm+acpi-3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PCI / ACPI: hold acpi_scan_lock during root bus hotplug ACPI / APEI: fix error status check condition for CPER ACPI / PM: fix suspend and resume on Sony Vaio VGN-FW21M cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init() cpufreq: stats: do cpufreq_cpu_put() corresponding to cpufreq_cpu_get() intel-pstate: Use #defines instead of hard-coded values. cpufreq / intel_pstate: Fix calculation of current frequency cpufreq / intel_pstate: Add function to check that all MSRs are valid	2013-03-28 13:47:31 -07:00
Linus Torvalds	33b65f1e9c	Bug-fixes: - Regression fixes for C-and-P states not being parsed properly. - Fix possible security issue with guests triggering DoS via non-assigned MSI-Xs. - Fix regression (introduced in v3.7) with raising an event (v2). - Fix hastily introduced band-aid during c0 for the CR3 blowup. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRUxlVAAoJEFjIrFwIi8fJiUsH/2a3A8EVqS7OYDNgT0ZFb1VI rMLNiA50sRJNDsq0NbGl1Y+Lubus1czc0c7HXFQ557OakN6WqcmPPjCKp4JT6NnV Jz/IZ0iimdoHiPru1Qe4ah3fSgzUtht2LB48Z/a0Is4k3LsRP2W3/niVC3ypnyuJ 52HjjuxeFAfXIkNeqsrO2a6cUXZeXzUyR4g9GNxDozi4jHpoPQ4j9okZbo218xH+ /pRnFeMD7t7dFkgNeyeGXUiJn2AkNPHi3Hx+RH5nN9KXQ1eem9R4p7Qpez1dUEWF YEc/bs7MyOYezzTVHPYk77Yt8baOHJt7UbHjM6jfi1aGYYINTRr3m5mORd3rCmc= =61IX -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bug-fixes from Konrad Rzeszutek Wilk: "This is mostly just the last stragglers of the regression bugs that this merge window had. There are also two bug-fixes: one that adds an extra layer of security, and a regression fix for a change that was added in v3.7 (the v1 was faulty, the v2 works). - Regression fixes for C-and-P states not being parsed properly. - Fix possible security issue with guests triggering DoS via non-assigned MSI-Xs. - Fix regression (introduced in v3.7) with raising an event (v2). - Fix hastily introduced band-aid during c0 for the CR3 blowup." * tag 'stable/for-linus-3.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/events: avoid race with raising an event in unmask_evtchn() xen/mmu: Move the setting of pvops.write_cr3 to later phase in bootup. xen/acpi-stub: Disable it b/c the acpi_processor_add is no longer called. xen-pciback: notify hypervisor about devices intended to be assigned to guests xen/acpi-processor: Don't dereference struct acpi_processor on all CPUs.	2013-03-27 12:56:25 -07:00
Konrad Rzeszutek Wilk	05e99c8cf9	intel-pstate: Use #defines instead of hard-coded values. They are defined in coreboot (MSR_PLATFORM) and the other one is already defined in msr-index.h. Let's use those. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-03-25 15:13:07 +01:00
Jan Beulich	909b3fdb0d	xen-pciback: notify hypervisor about devices intended to be assigned to guests For MSI-X capable devices the hypervisor wants to write protect the MSI-X table and PBA, yet it can't assume that resources have been assigned to their final values at device enumeration time. Thus have pciback do that notification, as having the device controlled by it is a prerequisite to assigning the device to guests anyway. This is the kernel part of hypervisor side commit 4245d33 ("x86/MSI: add mechanism to fully protect MSI-X table from PV guest accesses") on the master branch of git://xenbits.xen.org/xen.git. CC: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-03-22 10:20:55 -04:00
Linus Torvalds	cd82346934	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A fair chunk of the linecount comes from a fix for a tracing bug that corrupts latency tracing buffers when the overwrite mode is changed on the fly - the rest is mostly assorted fewliner fixlets." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event kprobes/x86: Check Interrupt Flag modifier when registering probe kprobes: Make hash_64() as always inlined perf: Generate EXIT event only once per task context perf: Reset hwc->last_period on sw clock events tracing: Prevent buffer overwrite disabled for latency tracers tracing: Keep overwrite in sync between regular and snapshot buffers tracing: Protect tracer flags with trace_types_lock perf tools: Fix LIBNUMA build with glibc 2.12 and older. tracing: Fix free of probe entry by calling call_rcu_sched() perf/POWER7: Create a sysfs format entry for Power7 events perf probe: Fix segfault libtraceevent: Remove hard coded include to /usr/local/include in Makefile perf record: Fix -C option perf tools: check if -DFORTIFY_SOURCE=2 is allowed perf report: Fix build with NO_NEWT=1 perf annotate: Fix build with NO_NEWT=1 tracing: Fix race in snapshot swapping	2013-03-21 08:29:11 -07:00
Marcelo Tosatti	2ae33b3896	Merge remote-tracking branch 'upstream/master' into queue Merge reason: From: Alexander Graf <agraf@suse.de> "Just recently this really important patch got pulled into Linus' tree for 3.9: commit `1674400aae` Author: Anton Blanchard <anton <at> samba.org> Date: Tue Mar 12 01:51:51 2013 +0000 Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue. Could you please merge kvm/next against linus/master, so that I can base my trees against that?" * upstream/master: (653 commits) PCI: Use ROM images from firmware only if no other ROM source available sparc: remove unused "config BITS" sparc: delete "if !ULTRA_HAS_POPULATION_COUNT" KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798) KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED inet: limit length of fragment queue hash table bucket lists qeth: Fix scatter-gather regression qeth: Fix invalid router settings handling qeth: delay feature trace sgy-cts1000: Remove __dev* attributes KVM: x86: fix deadlock in clock-in-progress request handling KVM: allow host header to be included even for !CONFIG_KVM hwmon: (lm75) Fix tcn75 prefix hwmon: (lm75.h) Update header inclusion MAINTAINERS: Remove Mark M. Hoffman xfs: ensure we capture IO errors correctly xfs: fix xfs_iomap_eof_prealloc_initial_size type ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-03-21 11:11:52 -03:00
Linus Torvalds	ea4a0ce111	Merge git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Marcelo Tosatti. * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798) KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) KVM: x86: fix deadlock in clock-in-progress request handling KVM: allow host header to be included even for !CONFIG_KVM	2013-03-19 18:24:12 -07:00
Andy Honig	0b79459b48	KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) There is a potential use after free issue with the handling of MSR_KVM_SYSTEM_TIME. If the guest specifies a GPA in a movable or removable memory such as frame buffers then KVM might continue to write to that address even after it's removed via KVM_SET_USER_MEMORY_REGION. KVM pins the page in memory so it's unlikely to cause an issue, but if the user space component re-purposes the memory previously used for the guest, then the guest will be able to corrupt that memory. Tested: Tested against kvmclock unit test Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-03-19 14:17:35 -03:00
Masami Hiramatsu	9a556ab998	kprobes/x86: Check Interrupt Flag modifier when registering probe Currently kprobes check whether the copied instruction modifies IF (interrupt flag) on each probe hit. This results not only in introducing overhead but also involving inat_get_opcode_attribute into the kprobes hot path, and it can cause an infinite recursive call (and kernel panic in the end). Actually, since the copied instruction itself can never be modified on the buffer, it is needless to analyze the instruction on every probe hit. To fix this issue, we check it only once when registering probe and store the result on ainsn->if_modifier. Reported-by: Timo Juhani Lindfors <timo.lindfors@iki.fi> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20130314115242.19690.33573.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-03-18 10:21:23 +01:00
Feng Tang	c54fdbb282	x86: Add cpu capability flag X86_FEATURE_NONSTOP_TSC_S3 On some new Intel Atom processors (Penwell and Cloverview), there is a feature that the TSC won't stop in S3 state, say the TSC value won't be reset to 0 after resume. This feature makes TSC a more reliable clocksource and could benefit the timekeeping code during system suspend/resume cycle, so add a flag for it. Signed-off-by: Feng Tang <feng.tang@intel.com> [jstultz: Fix checkpatch warning] Signed-off-by: John Stultz <john.stultz@linaro.org>	2013-03-15 16:50:26 -07:00
Takuya Yoshikawa	982b3394dd	KVM: x86: Optimize mmio spte zapping when creating/moving memslot When we create or move a memory slot, we need to zap mmio sptes. Currently, zap_all() is used for this and this is causing two problems: - extra page faults after zapping mmu pages - long mmu_lock hold time during zapping mmu pages For the latter, Marcelo reported a disastrous mmu_lock hold time during hot-plug, which made the guest unresponsive for a long time. This patch takes a simple way to fix these problems: do not zap mmu pages unless they are marked mmio cached. On our test box, this took only 50us for the 4GB guest and we did not see ms of mmu_lock hold time any more. Note that we still need to do zap_all() for other cases. So another work is also needed: Xiao's work may be the one. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-03-14 10:21:21 +02:00
Takuya Yoshikawa	95b0430d1a	KVM: MMU: Mark sp mmio cached when creating mmio spte This will be used not to zap unrelated mmu pages when creating/moving a memory slot later. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-03-14 10:21:10 +02:00
Jan Kiszka	0238ea913c	KVM: nVMX: Add preemption timer support Provided the host has this feature, it's straightforward to offer it to the guest as well. We just need to load to timer value on L2 entry if the feature was enabled by L1 and watch out for the corresponding exit reason. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-03-14 10:01:21 +02:00
Jan Kiszka	c18911a23c	KVM: nVMX: Provide EFER.LMA saving support We will need EFER.LMA saving to provide unrestricted guest mode. All what is missing for this is picking up EFER.LMA from VM_ENTRY_CONTROLS on L2->L1 switches. If the host does not support EFER.LMA saving, no change is performed, otherwise we properly emulate for L1 what the hardware does for L0. Advertise the support, depending on the host feature. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-03-14 10:00:55 +02:00
Jan Kiszka	eabeaaccfc	KVM: nVMX: Clean up and fix pin-based execution controls Only interrupt and NMI exiting are mandatory for KVM to work, thus can be exposed to the guest unconditionally, virtual NMI exiting is optional. So we must not advertise it unless the host supports it. Introduce the symbolic constant PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR at this chance. Reviewed-by:: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-03-13 16:14:40 +02:00
Jan Kiszka	66450a21f9	KVM: x86: Rework INIT and SIPI handling A VCPU sending INIT or SIPI to some other VCPU races for setting the remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED was overwritten by kvm_emulate_halt and, thus, got lost. This introduces APIC events for those two signals, keeping them in kvm_apic until kvm_apic_accept_events is run over the target vcpu context. kvm_apic_has_events reports to kvm_arch_vcpu_runnable if there are pending events, thus if vcpu blocking should end. The patch comes with the side effect of effectively obsoleting KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI. The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED state. That also means we no longer exit to user space after receiving a SIPI event. Furthermore, we already reset the VCPU on INIT, only fixing up the code segment later on when SIPI arrives. Moreover, we fix INIT handling for the BSP: it never enter wait-for-SIPI but directly starts over on INIT. Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-03-13 16:08:10 +02:00
Jan Kiszka	57f252f229	KVM: x86: Drop unused return code from VCPU reset callback Neither vmx nor svm nor the common part may generate an error on kvm_vcpu_reset. So drop the return code. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-03-12 13:25:56 +02:00
Jan Kiszka	33fb20c39e	KVM: nVMX: Fix content of MSR_IA32_VMX_ENTRY/EXIT_CTLS Properly set those bits to 1 that the spec demands in case bit 55 of VMX_BASIC is 0 - like in our case. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-03-07 15:47:11 -03:00
Frederic Weisbecker	56dd9470d7	context_tracking: Move exception handling to generic code Exceptions handling on context tracking should share common treatment: on entry we exit user mode if the exception triggered in that context. Then on exception exit we return to that previous context. Generalize this to avoid duplication across archs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-03-07 17:09:25 +01:00
Peter Jones	3c4aff6b9a	x86, doc: Be explicit about what the x86 struct boot_params requires If the sentinel triggers, we do not want the boot loader authors to just poke it and make the error go away, we want them to actually fix the problem. This should help avoid making the incorrect change in non-compliant bootloaders. [ hpa: dropped the Documentation/x86/boot.txt hunk pending clarifications ] Signed-off-by: Peter Jones <pjones@redhat.com> Link: http://lkml.kernel.org/r/1362592823-28967-1-git-send-email-pjones@redhat.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-03-06 20:34:43 -08:00
Josh Boyer	2e604c0f19	x86: Don't clear efi_info even if the sentinel hits When boot_params->sentinel is set, all we really know is that some undefined set of fields in struct boot_params contain garbage. In the particular case of efi_info, however, there is a private magic for that substructure, so it is generally safe to leave it even if the bootloader is broken. kexec (for which we did the initial analysis) did not initialize this field, but of course all the EFI bootloaders do, and most EFI bootloaders are broken in this respect (and should be fixed.) Reported-by: Robin Holt <holt@sgi.com> Link: http://lkml.kernel.org/r/CA%2B5PVA51-FT14p4CRYKbicykugVb=PiaEycdQ57CK2km_OQuRQ@mail.gmail.com Tested-by: Josh Boyer <jwboyer@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-03-06 20:23:30 -08:00
Borislav Petkov	6276a074c6	x86: Make Linux guest support optional Put all config options needed to run Linux as a guest behind a CONFIG_HYPERVISOR_GUEST menu so that they don't get built-in by default but be selectable by the user. Also, make all units which depend on x86_hyper, depend on this new symbol so that compilation doesn't fail when CONFIG_HYPERVISOR_GUEST is disabled but those units assume its presence. Sort options in the new HYPERVISOR_GUEST menu, adapt config text and drop redundant select. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1362428421-9244-3-git-send-email-bp@alien8.de Cc: Dmitry Torokhov <dtor@vmware.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-03-04 13:14:25 -08:00
Al Viro	4cce1a207c	x86: trim sys_ia32.h remove the externs for functions that don't exist anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 23:00:34 -05:00
Al Viro	07b053457b	x86: sys32_kill and sys32_mprotect are pointless their argument types are identical to those of sys_kill and sys_mprotect resp., so we are not doing any kind of argument validation, etc. in those - they turn into unconditional branches to corresponding syscalls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 23:00:33 -05:00
Al Viro	56e41d3c5a	merge compat sys_ipc instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 23:00:27 -05:00
Al Viro	d5dc77bfee	consolidate compat lookup_dcookie() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 23:00:23 -05:00
Al Viro	19f4fc3aee	convert sendfile{,64} to COMPAT_SYSCALL_DEFINE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 22:58:46 -05:00
Al Viro	2cf0966683	make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect ... and switch i386 to HAVE_SYSCALL_WRAPPERS, killing open-coded uses of asmlinkage_protect() in a bunch of syscalls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 22:58:33 -05:00
Al Viro	e1b5bb6d12	consolidate cond_syscall and SYSCALL_ALIAS declarations take them to asm/linkage.h, with default in linux/linkage.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 22:55:19 -05:00
Linus Torvalds	14cc0b55b7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal/compat fixes from Al Viro: "Fixes for several regressions introduced in the last signal.git pile, along with fixing bugs in truncate and ftruncate compat (on just about anything biarch at least one of those two had been done wrong)." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: compat: restore timerfd settime and gettime compat syscalls [regression] braino in "sparc: convert to ksignal" fix compat truncate/ftruncate switch lseek to COMPAT_SYSCALL_DEFINE lseek() and truncate() on sparc really need sign extension	2013-03-02 08:34:06 -08:00
Linus Torvalds	e3c4877de8	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/EFI changes from Peter Anvin: - Improve the initrd handling in the EFI boot stub by allowing forward slashes in the pathname - from Chun-Yi Lee. - Cleanup code duplication in the EFI mixed kernel/firmware code - from Satoru Takeuchi. - efivarfs bug fixes for more strict filename validation, with lots of input from Al Viro. * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: remove duplicate code in setup_arch() by using, efi_is_native() efivarfs: guid part of filenames are case-insensitive efivarfs: Validate filenames much more aggressively efivarfs: Use sizeof() instead of magic number x86, efi: Allow slash in file path of initrd	2013-02-27 16:17:42 -08:00
Linus Torvalds	f8ef15d6b9	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add Intel IvyBridge event scheduling constraints ftrace: Call ftrace cleanup module notifier after all other notifiers tracing/syscalls: Allow archs to ignore tracing compat syscalls	2013-02-26 19:40:37 -08:00
Linus Torvalds	556f12f602	PCI changes for the v3.9 merge window: Host bridge hotplug - Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu) - Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu) - Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu) - Stop caching _PRT and make independent of bus numbers (Yinghai Lu) PCI device hotplug - Clean up cpqphp dead code (Sasha Levin) - Disable ARI unless device and upstream bridge support it (Yijing Wang) - Initialize all hot-added devices (not functions 0-7) (Yijing Wang) Power management - Don't touch ASPM if disabled (Joe Lawrence) - Fix ASPM link state management (Myron Stowe) Miscellaneous - Fix PCI_EXP_FLAGS accessor (Alex Williamson) - Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov) - Document hotplug resource and MPS parameters (Yijing Wang) - Add accessor for PCIe capabilities (Myron Stowe) - Drop pciehp suspend/resume messages (Paul Bolle) - Make pci_slot built-in only (not a module) (Jiang Liu) - Remove unused PCI/ACPI bind ops (Jiang Liu) - Removed used pci_root_bus (Bjorn Helgaas) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRKS3hAAoJEFmIoMA60/r8xxoP/j1CS4oCZAnBIVT9fKBkis+/ CENcfHIUKj6J9iMfJEVvqBELvqaLqtpeNwAGMcGPxV7VuT3K1QumChfaTpRDP0HC VDRmrjcmfenEK+YPOG7acsrTyqk2wjpLOyu9MKRxtC5u7tF6376KQpkEFpO4haL4 eUHTxfE76OkrPBSvx3+PUSf6jqrvrNbjX8K6HdDVVlm3sVAQKmYJU/Wphv2NPOqa CAMyCzEGybFjr8hDRwvWgr+06c718GMwQUbnrPdHXAe7lMNMrN/XVBmU9ABN3Aas icd3lrDs+yPObgcO/gT8+sAZErCtdJ9zuHGYHdYpRbIQj/5JT4TMk7tw/Bj7vKY9 Mqmho9GR5YmYTRN9f1r+2n5AQ/KYWXJDrRNOnt5/ys5BOM3vwJ7WJ902zpSwtFQp nLX+oD/hLfzpnoIQGDuBAoAXp2Kam3XWRgVvG78buRNrPj+kUzimk14a8qQeY+CB El6UKuwi5Uv/qgs1gAqqjmZmsAkon2DnsRZa6Fl8NTkDlis7LY4gp9OU38ySFpB+ PhCmRyCZmDDqTVtwj6XzR3nPQ5LBSbvsTfgMxYMIUSXHa06tyb2q5p4mEIas0OmU RKaP5xQqZuTgD8fbdYrx0xgSrn7JHt/j/X//Qs6unlLCWhlpm3LjJZKxyw2FwBGr o4Lci+PiBh3MowCrju9D =ER3b -----END PGP SIGNATURE----- Merge tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug - Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu) - Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu) - Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu) - Stop caching _PRT and make independent of bus numbers (Yinghai Lu) PCI device hotplug - Clean up cpqphp dead code (Sasha Levin) - Disable ARI unless device and upstream bridge support it (Yijing Wang) - Initialize all hot-added devices (not functions 0-7) (Yijing Wang) Power management - Don't touch ASPM if disabled (Joe Lawrence) - Fix ASPM link state management (Myron Stowe) Miscellaneous - Fix PCI_EXP_FLAGS accessor (Alex Williamson) - Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov) - Document hotplug resource and MPS parameters (Yijing Wang) - Add accessor for PCIe capabilities (Myron Stowe) - Drop pciehp suspend/resume messages (Paul Bolle) - Make pci_slot built-in only (not a module) (Jiang Liu) - Remove unused PCI/ACPI bind ops (Jiang Liu) - Removed used pci_root_bus (Bjorn Helgaas)" * tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (51 commits) PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers PCI: Fix PCI Express Capability accessors for PCI_EXP_FLAGS ACPI / PCI: Make pci_slot built-in only, not a module PCI/PM: Clear state_saved during suspend PCI: Use atomic_inc_return() rather than atomic_add_return() PCI: Catch attempts to disable already-disabled devices PCI: Disable Bus Master unconditionally in pci_device_shutdown() PCI: acpiphp: Remove dead code for PCI host bridge hotplug PCI: acpiphp: Create companion ACPI devices before creating PCI devices PCI: Remove unused "rc" in virtfn_add_bus() PCI: pciehp: Drop suspend/resume ENTRY messages PCI/ASPM: Don't touch ASPM if forcibly disabled PCI/ASPM: Deallocate upstream link state even if device is not PCIe PCI: Document MPS parameters pci=pcie_bus_safe, pci=pcie_bus_perf, etc PCI: Document hpiosize= and hpmemsize= resource reservation parameters PCI: Use PCI Express Capability accessor PCI: Introduce accessor to retrieve PCIe Capabilities Register PCI: Put pci_dev in device tree as early as possible PCI: Skip attaching driver in device_add() PCI: acpiphp: Keep driver loaded even if no slots found ...	2013-02-25 21:18:18 -08:00
Linus Torvalds	77be36de8b	Features: - Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor to be aware of new CPU and new DIMMs - Cleanups Bug-fixes: - Fixes a long-standing bug in the PV spinlock wherein we did not kick VCPUs that were in a tight loop. - Fixes in the error paths for the event channel machinery. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRJS1kAAoJEFjIrFwIi8fJj2YIAMO3/LVUZyojX/d8U9pqrCly lFfEF2UVjcxHJSj0ZFNXt1o3fnYP1SLRlT9u7ZLDjXf6Lmxmw6/C3Haw2wp3DfGq yUR0G/X9CPTBEgMYDdX7bjeTjyURvZcUaFwr+qodaaeL3uXx2pW6621Sc6jRKuia yAFVZMAKeaRrvUUIXjKHtlpRp9LKFdSztShMtYqmFvxEwrJPq2b37caKruoUCa6o X/YO0fvE9QtYD/pG0jsghFmLh/mcr+n9IFMCUXo1Yc9FdQBExtKzABDS5jdpuFND 4aMDE3dqUmHmpbaQhRE7SdblvpyrGdQXL6FSTjvwBgISfLo847CrnRKRgPp0YeA= =LQeU -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "This has two new ACPI drivers for Xen - a physical CPU offline/online and a memory hotplug. The way this works is that ACPI kicks the drivers and they make the appropiate hypercall to the hypervisor to tell it that there is a new CPU or memory. There also some changes to the Xen ARM ABIs and couple of fixes. One particularly nasty bug in the Xen PV spinlock code was fixed by Stefan Bader - and has been there since the 2.6.32! Features: - Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor to be aware of new CPU and new DIMMs - Cleanups Bug-fixes: - Fixes a long-standing bug in the PV spinlock wherein we did not kick VCPUs that were in a tight loop. - Fixes in the error paths for the event channel machinery" Fix up a few semantic conflicts with the ACPI interface changes in drivers/xen/xen-acpi-{cpu,mem}hotplug.c. * tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: event channel arrays are xen_ulong_t and not unsigned long xen: Send spinlock IPI to all waiters xen: introduce xen_remap, use it instead of ioremap xen: close evtchn port if binding to irq fails xen-evtchn: correct comment and error output xen/tmem: Add missing %s in the printk statement. xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0 xen/acpi: ACPI cpu hotplug xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.h xen/stub: driver for CPU hotplug xen/acpi: ACPI memory hotplug xen/stub: driver for memory hotplug xen: implement updated XENMEM_add_to_physmap_range ABI xen/smp: Move the common CPU init code a bit to prep for PVH patch.	2013-02-24 16:18:31 -08:00
Linus Torvalds	89f883372f	Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Marcelo Tosatti: "KVM updates for the 3.9 merge window, including x86 real mode emulation fixes, stronger memory slot interface restrictions, mmu_lock spinlock hold time reduction, improved handling of large page faults on shadow, initial APICv HW acceleration support, s390 channel IO based virtio, amongst others" * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) Revert "KVM: MMU: lazily drop large spte" x86: pvclock kvm: align allocation size to page size KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr x86 emulator: fix parity calculation for AAD instruction KVM: PPC: BookE: Handle alignment interrupts booke: Added DBCR4 SPR number KVM: PPC: booke: Allow multiple exception types KVM: PPC: booke: use vcpu reference from thread_struct KVM: Remove user_alloc from struct kvm_memory_slot KVM: VMX: disable apicv by default KVM: s390: Fix handling of iscs. KVM: MMU: cleanup __direct_map KVM: MMU: remove pt_access in mmu_set_spte KVM: MMU: cleanup mapping-level KVM: MMU: lazily drop large spte KVM: VMX: cleanup vmx_set_cr0(). KVM: VMX: add missing exit names to VMX_EXIT_REASONS array KVM: VMX: disable SMEP feature when guest is in non-paging mode KVM: Remove duplicate text in api.txt Revert "KVM: MMU: split kvm_mmu_free_page" ...	2013-02-24 13:07:18 -08:00
Al Viro	561c673197	switch lseek to COMPAT_SYSCALL_DEFINE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-24 10:52:26 -05:00
Linus Torvalds	9e2d59ad58	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "This is the first pile; another one will come a bit later and will contain SYSCALL_DEFINE-related patches. - a bunch of signal-related syscalls (both native and compat) unified. - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE (fixing several potential problems with missing argument validation, while we are at it) - a lot of now-pointless wrappers killed - a couple of architectures (cris and hexagon) forgot to save altstack settings into sigframe, even though they used the (uninitialized) values in sigreturn; fixed. - microblaze fixes for delivery of multiple signals arriving at once - saner set of helpers for signal delivery introduced, several architectures switched to using those." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits) x86: convert to ksignal sparc: convert to ksignal arm: switch to struct ksignal * passing alpha: pass k_sigaction and siginfo_t using ksignal pointer burying unused conditionals make do_sigaltstack() static arm64: switch to generic old sigaction() (compat-only) arm64: switch to generic compat rt_sigaction() arm64: switch compat to generic old sigsuspend arm64: switch to generic compat rt_sigqueueinfo() arm64: switch to generic compat rt_sigpending() arm64: switch to generic compat rt_sigprocmask() arm64: switch to generic sigaltstack sparc: switch to generic old sigsuspend sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE sparc: kill sign-extending wrappers for native syscalls kill sparc32_open() sparc: switch to use of generic old sigaction sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE mips: switch to generic sys_fork() and sys_clone() ...	2013-02-23 18:50:11 -08:00
Linus Torvalds	5ce1a70e2f	Merge branch 'akpm' (more incoming from Andrew) Merge second patch-bomb from Andrew Morton: - A little DM fix - the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits) ksm: allocate roots when needed mm: cleanup "swapcache" in do_swap_page mm,ksm: swapoff might need to copy mm,ksm: FOLL_MIGRATION do migration_entry_wait ksm: shrink 32-bit rmap_item back to 32 bytes ksm: treat unstable nid like in stable tree ksm: add some comments tmpfs: fix mempolicy object leaks tmpfs: fix use-after-free of mempolicy object mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages mm: export mmu notifier invalidates mm: accelerate mm_populate() treatment of THP pages mm: use long type for page counts in mm_populate() and get_user_pages() mm: accurately document nr_free_*_pages functions with code comments HWPOISON: change order of error_states[]'s elements HWPOISON: fix misjudgement of page_action() for errors on mlocked pages memcg: stop warning on memcg_propagate_kmem net: change type of virtio_chan->p9_max_pages vmscan: change type of vm_total_pages to unsigned long fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used ...	2013-02-23 17:50:35 -08:00
Wen Congyang	e13fe8695c	cpu-hotplug,memory-hotplug: clear cpu_to_node() when offlining the node When the node is offlined, there is no memory/cpu on the node. If a sleep task runs on a cpu of this node, it will be migrated to the cpu on the other node. So we can clear cpu-to-node mapping. [akpm@linux-foundation.org: numa_clear_node() and numa_set_node() can no longer be __cpuinit] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:13 -08:00
Wen Congyang	ae9aae9eda	memory-hotplug: common APIs to support page tables hot-remove When memory is removed, the corresponding pagetables should alse be removed. This patch introduces some common APIs to support vmemmap pagetable and x86_64 architecture direct mapping pagetable removing. All pages of virtual mapping in removed memory cannot be freed if some pages used as PGD/PUD include not only removed memory but also other memory. So this patch uses the following way to check whether a page can be freed or not. 1) When removing memory, the page structs of the removed memory are filled with 0FD. 2) All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed. For direct mapping pages, update direct_pages_count[level] when we freed their pagetables. And do not free the pages again because they were freed when offlining. For vmemmap pages, free the pages and their pagetables. For larger pages, do not split them into smaller ones because there is no way to know if the larger page has been split. As a result, there is no way to decide when to split. We deal the larger pages in the following way: 1) For direct mapped pages, all the pages were freed when they were offlined. And since menmory offline is done section by section, all the memory ranges being removed are aligned to PAGE_SIZE. So only need to deal with unaligned pages when freeing vmemmap pages. 2) For vmemmap pages being used to store page_struct, if part of the larger page is still in use, just fill the unused part with 0xFD. And when the whole page is fulfilled with 0xFD, then free the larger page. [akpm@linux-foundation.org: fix typo in comment] [tangchen@cn.fujitsu.com: do not calculate direct mapping pages when freeing vmemmap pagetables] [tangchen@cn.fujitsu.com: do not free direct mapping pages twice] [tangchen@cn.fujitsu.com: do not free page split from hugepage one by one] [tangchen@cn.fujitsu.com: do not split pages when freeing pagetable pages] [akpm@linux-foundation.org: use pmd_page_vaddr()] [akpm@linux-foundation.org: fix used-uninitialised bug] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Linus Torvalds	c47f39e3b7	Merge branch 'x86/microcode' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading update from Peter Anvin: "This patchset lets us update the CPU microcode very, very early in initialization if the BIOS fails to do so (never happens, right?) This is handy for dealing with things like the Atom erratum where we have to run without PSE because microcode loading happens too late. As I mentioned in the x86/mm push request it depends on that infrastructure but it is otherwise a standalone feature." * 'x86/microcode' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Kconfig: Make early microcode loading a configuration feature x86/mm/init.c: Copy ucode from initrd image to kernel memory x86/head64.c: Early update ucode in 64-bit x86/head_32.S: Early update ucode in 32-bit x86/microcode_intel_early.c: Early update ucode on Intel's CPU x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled() x86/microcode_intel_lib.c: Early update ucode on Intel's CPU x86/microcode_core_early.c: Define interfaces for early loading ucode x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP x86/common.c: Make have_cpuid_p() a global function x86/microcode_intel.h: Define functions and macros for early loading ucode x86, doc: Documentation for early microcode loading	2013-02-22 19:22:52 -08:00
Linus Torvalds	ac630dd98a	x86-64: don't set the early IDT to point directly to 'early_idt_handler' The code requires the use of the proper per-exception-vector stub functions (set up as the early_idt_handlers[] array - note the 's') that make sure to set up the error vector number. This is true regardless of whether CONFIG_EARLY_PRINTK is set or not. Why? The stack offset for the comparison of __KERNEL_CS won't be right otherwise, nor will the new check (from commit `8170e6bed4`: "x86, 64bit: Use a #PF handler to materialize early mappings on demand") for the page fault exception vector. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-22 13:09:51 -08:00
Linus Torvalds	2ef14f465b	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The really big items are two humonguous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...	2013-02-21 18:06:55 -08:00
Linus Torvalds	cb715a8366	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Peter Anvin: "This is a corrected attempt at the x86/cpu branch, this time with the fixes in that makes it not break on KVM (current or past), or any other virtualizer which traps on this configuration. Again, the biggest change here is enabling the WC+ memory type on AMD processors, if the BIOS doesn't." * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs x86, cpu, amd: Fix WC+ workaround for older virtual hosts x86, AMD: Enable WC+ memory type on family 10 processors x86, AMD: Clean up init_amd() x86/process: Change %8s to %s for pr_warn() in release_thread() x86/cpu/hotplug: Remove CONFIG_EXPERIMENTAL dependency	2013-02-21 18:03:39 -08:00
Linus Torvalds	8793422fd9	ACPI and power management updates for 3.9-rc1 - Rework of the ACPI namespace scanning code from Rafael J. Wysocki with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg, Toshi Kani, and Yinghai Lu. - ACPI power resources handling and ACPI device PM update from Rafael J. Wysocki. - ACPICA update to version 20130117 from Bob Moore and Lv Zheng with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner. - Support for Intel Lynxpoint LPSS from Mika Westerberg. - cpuidle update from Len Brown including Intel Haswell support, C1 state for intel_idle, removal of global pm_idle. - cpuidle fixes and cleanups from Daniel Lezcano. - cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with contributions from Stratos Karafotis and Rickard Andersson. - Intel P-states driver for Sandy Bridge processors from Dirk Brandewie. - cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn. - cpufreq fixes related to ordering issues between acpi-cpufreq and powernow-k8 from Borislav Petkov and Matthew Garrett. - cpufreq support for Calxeda Highbank processors from Mark Langsdorf and Rob Herring. - cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update from Shawn Guo. - cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat, and Inderpal Singh. - Support for "lightweight suspend" from Zhang Rui. - Removal of the deprecated power trace API from Paul Gortmaker. - Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki Ishimatsu. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJRIsArAAoJEKhOf7ml8uNsD6MP/j7C4NA+GTq6RdwoJt+Yki0K 9Ep8I4pEuRFoN/oskv24EyQhpGJIk6UxWcJ/DWFBc+1VhmKORta7k2Idv/wlJA77 s7AcDveA9xcDh+TVfbh87TeuiMSXiSdDZbiaQO+wMizWJAF3F84AnjiAqqqyQcSK bA5/Siz/vWlt9PyYDaQtHTVE4lpvPuVcQdYewsdaH2PsmUjvIg/TUzg28CTrdyvv eHOdBK9R0/OLQLhzRbL0VOGJ//wEl+HJRO0QEhTKPgdQ1e/VH/4Zu5WSzF8P/x4C s2f8U4IKQqulDuDHXtpMpelFm7hRWgsOqZLkcyXLs+0dvSM9CTPO6P0ZaImxUctk 5daHWEsXUnCErDQawt1mcZP8l6qnxofMQIfLXyPVzvlSnHyToTmrtXa1v2u4AuL/ hOo4MYWsFNUmRdtGFFGlExGgEDZ4G5NwiYjRBl/6XJ3v4nhnnMbuzxP8scpoe5m1 8tjroJHZFUUs/mFU/H+oRbHzSzXPmp1sddNaTg4OpVmTn3DDh6ljnFhiItd1Ndw0 5ldVbSe6ETq5RoK0TbzvQOeVpa9F3JfqbrXLQPqfd2iz/No41LQYG1uShRYuXKuA wfEcc+c9VMd3FILu05pGwBnU8VS9VbxTYMz7xDxg6b29Ywnb7u+Q1ycCk2gFYtkS E2oZDuyewTJxaskzYsNr =wijn -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: - Rework of the ACPI namespace scanning code from Rafael J. Wysocki with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg, Toshi Kani, and Yinghai Lu. - ACPI power resources handling and ACPI device PM update from Rafael J Wysocki. - ACPICA update to version 20130117 from Bob Moore and Lv Zheng with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner. - Support for Intel Lynxpoint LPSS from Mika Westerberg. - cpuidle update from Len Brown including Intel Haswell support, C1 state for intel_idle, removal of global pm_idle. - cpuidle fixes and cleanups from Daniel Lezcano. - cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with contributions from Stratos Karafotis and Rickard Andersson. - Intel P-states driver for Sandy Bridge processors from Dirk Brandewie. - cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn. - cpufreq fixes related to ordering issues between acpi-cpufreq and powernow-k8 from Borislav Petkov and Matthew Garrett. - cpufreq support for Calxeda Highbank processors from Mark Langsdorf and Rob Herring. - cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update from Shawn Guo. - cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat, and Inderpal Singh. - Support for "lightweight suspend" from Zhang Rui. - Removal of the deprecated power trace API from Paul Gortmaker. - Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki Ishimatsu. * tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits) PM idle: remove global declaration of pm_idle unicore32 idle: delete stray pm_idle comment openrisc idle: delete pm_idle mn10300 idle: delete pm_idle microblaze idle: delete pm_idle m32r idle: delete pm_idle, and other dead idle code ia64 idle: delete pm_idle cris idle: delete idle and pm_idle ARM64 idle: delete pm_idle ARM idle: delete pm_idle blackfin idle: delete pm_idle sparc idle: rename pm_idle to sparc_idle sh idle: rename global pm_idle to static sh_idle x86 idle: rename global pm_idle to static x86_idle APM idle: register apm_cpu_idle via cpuidle cpufreq / intel_pstate: Add kernel command line option disable intel_pstate. cpufreq / intel_pstate: Change to disallow module build tools/power turbostat: display SMI count by default intel_idle: export both C1 and C1E ACPI / hotplug: Fix concurrency issues and memory leaks ...	2013-02-20 11:26:56 -08:00
Ian Campbell	c81611c4e9	xen: event channel arrays are xen_ulong_t and not unsigned long On ARM we want these to be the same size on 32- and 64-bit. This is an ABI change on ARM. X86 does not change. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Keir (Xen.org) <keir@xen.org> Cc: Tim Deegan <tim@xen.org> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: linux-arm-kernel@lists.infradead.org Cc: xen-devel@lists.xen.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-20 08:45:07 -05:00
Ingo Molnar	ff1fb5f6b4	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent Pull two fixes from Steven Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-20 11:26:21 +01:00
Linus Torvalds	1a13c0b181	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 UV3 support update from Ingo Molnar: "Support for the SGI Ultraviolet System 3 (UV3) platform - the upcoming third major iteration and upscaling of the SGI UV supercomputing platform." * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, uv, uv3: Trim MMR register definitions after code changes for SGI UV3 x86, uv, uv3: Check current gru hub support for SGI UV3 x86, uv, uv3: Update Time Support for SGI UV3 x86, uv, uv3: Update x2apic Support for SGI UV3 x86, uv, uv3: Update Hub Info for SGI UV3 x86, uv, uv3: Update ACPI Check to include SGI UV3 x86, uv, uv3: Update MMR register definitions for SGI Ultraviolet System 3 (UV3)	2013-02-19 20:12:57 -08:00
Linus Torvalds	f98982ce80	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: - Support for the Technologic Systems TS-5500 platform, by Vivien Didelot - Improved NUMA support on AMD systems: Add support for federated systems where multiple memory controllers can exist and see each other over multiple PCI domains. This basically means that AMD node ids can be more than 8 now and the code handling this is taught to incorporate PCI domain into those IDs. - Support for the Goldfish virtual Android emulator, by Jun Nakajima, Intel, Google, et al. - Misc fixlets. * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add TS-5500 platform support x86/srat: Simplify memory affinity init error handling x86/apb/timer: Remove unnecessary "if" goldfish: platform device for x86 amd64_edac: Fix type usage in NB IDs and memory ranges amd64_edac: Fix PCI function lookup x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id x86, AMD, NB: Add multi-domain support	2013-02-19 20:11:07 -08:00
Linus Torvalds	29d5052329	Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/hyperv changes from Ingo Molnar: "The biggest change is support for Windows 8's improved hypervisor interrupt model on the Linux Hyper-V guest subsystem code side. Smallish fixes otherwise." * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, hyperv: HYPERV depends on X86_LOCAL_APIC X86: Handle Hyper-V vmbus interrupts as special hypervisor interrupts X86: Add a check to catch Xen emulation of Hyper-V x86: Hyper-V: register clocksource only if its advertised	2013-02-19 20:10:21 -08:00
Linus Torvalds	5abcd76f5d	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 bootup changes from Ingo Molnar: "Deal with bootloaders which fail to initialize unknown fields in boot_params to zero, by sanitizing boot params passed in. This unbreaks versions of kexec-utils. Other bootloaders do not appear to show sensitivity to this change, but it's a possibility for breakage nevertheless." * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, boot: Sanitize boot_params if not zeroed on creation	2013-02-19 19:11:10 -08:00
Linus Torvalds	a57ed93600	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar: "The biggest change (by line count) is the unification of the XOR code and then the introduction of an additional SSE based XOR assembly method. The other bigger change is the head_32.S rework/cleanup by Borislav Petkov. Last but not least there's the usual laundry list of small but dangerous (and hopefully perfectly tested) changes to subtle low level x86 code, plus cleanups." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, head_32: Give the 6 label a real name x86, head_32: Remove second CPUID detection from default_entry x86: Detect CPUID support early at boot x86, head_32: Remove i386 pieces x86: Require MOVBE feature in cpuid when we use it x86: Enable ARCH_USE_BUILTIN_BSWAP x86/xor: Add alternative SSE implementation only prefetching once per 64-byte line x86/xor: Unify SSE-base xor-block routines x86: Fix a typo x86/mm: Fix the argument passed to sync_global_pgds() x86/mm: Convert update_mmu_cache() and update_mmu_cache_pmd() to functions ix86: Tighten asmlinkage_protect() constraints	2013-02-19 19:09:42 -08:00
Linus Torvalds	5800700f66	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic changes from Ingo Molnar: "Main changes: - Multiple MSI support added to the APIC, PCI and AHCI code - acked by all relevant maintainers, by Alexander Gordeev. The advantage is that multiple AHCI ports can have multiple MSI irqs assigned, and can thus spread to multiple CPUs. [ Drivers can make use of this new facility via the pci_enable_msi_block_auto() method ] - x86 IOAPIC code from interrupt remapping cleanups from Joerg Roedel: These patches move all interrupt remapping specific checks out of the x86 core code and replaces the respective call-sites with function pointers. As a result the interrupt remapping code is better abstraced from x86 core interrupt handling code. - Various smaller improvements, fixes and cleanups." * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess x86, kvm: Fix intialization warnings in kvm.c x86, irq: Move irq_remapped out of x86 core code x86, io_apic: Introduce eoi_ioapic_pin call-back x86, msi: Introduce x86_msi.compose_msi_msg call-back x86, irq: Introduce setup_remapped_irq() x86, irq: Move irq_remapped() check into free_remapped_irq x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq() x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 core x86, irq: Add data structure to keep AMD specific irq remapping information x86, irq: Move irq_remapping_enabled declaration to iommu code x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pin x86, io_apic: Move irq_remapping_enabled checks out of check_timer() x86, io_apic: Convert setup_ioapic_entry to function pointer x86, io_apic: Introduce set_affinity function pointer x86, msi: Use IRQ remapping specific setup_msi_irqs routine x86, hpet: Introduce x86_msi_ops.setup_hpet_msi x86, io_apic: Introduce x86_io_apic_ops.print_entries for debugging x86, io_apic: Introduce x86_io_apic_ops.disable() x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resume ...	2013-02-19 19:07:27 -08:00
Stefano Stabellini	3216dceb31	xen: introduce xen_remap, use it instead of ioremap ioremap can't be used to map ring pages on ARM because it uses device memory caching attributes (MT_DEVICE*). Introduce a Xen specific abstraction to map ring pages, called xen_remap, that is defined as ioremap on x86 (no behavioral changes). On ARM it explicitly calls __arm_ioremap with the right caching attributes: MT_MEMORY. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-19 22:02:34 -05:00
Linus Torvalds	8f55cea410	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "There are lots of improvements, the biggest changes are: Main kernel side changes: - Improve uprobes performance by adding 'pre-filtering' support, by Oleg Nesterov. - Make some POWER7 events available in sysfs, equivalent to what was done on x86, from Sukadev Bhattiprolu. - tracing updates by Steve Rostedt - mostly misc fixes and smaller improvements. - Use perf/event tracing to report PCI Express advanced errors, by Tony Luck. - Enable northbridge performance counters on AMD family 15h, by Jacob Shin. - This tracing commit: tracing: Remove the extra 4 bytes of padding in events changes the ABI. All involved parties (PowerTop in particular) seem to agree that it's safe to do now with the introduction of libtraceevent, but the devil is in the details ... Main tooling side changes: - Add 'event group view', from Namyung Kim: To use it, 'perf record' should group events when recording. And then perf report parses the saved group relation from file header and prints them together if --group option is provided. You can use the 'perf evlist' command to see event group information: $ perf record -e '{ref-cycles,cycles}' noploop 1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ] $ perf evlist --group {ref-cycles,cycles} With this example, default perf report will show you each event separately. You can use --group option to enable event group view: $ perf report --group ... # group: {ref-cycles,cycles} # ======== # Samples: 7K of event 'anon group { ref-cycles, cycles }' # Event count (approx.): 6876107743 # # Overhead Command Shared Object Symbol # ................ ....... ................. .......................... 99.84% 99.76% noploop noploop [.] main 0.07% 0.00% noploop ld-2.15.so [.] strcmp 0.03% 0.00% noploop [kernel.kallsyms] [k] timerqueue_del 0.03% 0.03% noploop [kernel.kallsyms] [k] sched_clock_cpu 0.02% 0.00% noploop [kernel.kallsyms] [k] account_user_time 0.01% 0.00% noploop [kernel.kallsyms] [k] __alloc_pages_nodemask 0.00% 0.00% noploop [kernel.kallsyms] [k] native_write_msr_safe 0.00% 0.11% noploop [kernel.kallsyms] [k] _raw_spin_lock 0.00% 0.06% noploop [kernel.kallsyms] [k] find_get_page 0.00% 0.02% noploop [kernel.kallsyms] [k] rcu_check_callbacks 0.00% 0.02% noploop [kernel.kallsyms] [k] __current_kernel_time As you can see the Overhead column now contains both of ref-cycles and cycles and header line shows group information also - 'anon group { ref-cycles, cycles }'. The output is sorted by period of group leader first. - Initial GTK+ annotate browser, from Namhyung Kim. - Add option for runtime switching perf data file in perf report, just press 's' and a menu with the valid files found in the current directory will be presented, from Feng Tang. - Add support to display whole group data for raw columns, from Jiri Olsa. - Add per processor socket count aggregation in perf stat, from Stephane Eranian. - Add interval printing in 'perf stat', from Stephane Eranian. - 'perf test' improvements - Add support for wildcards in tracepoint system name, from Jiri Olsa. - Add anonymous huge page recognition, from Joshua Zhu. - perf build-id cache now can show DSOs present in a perf.data file that are not in the cache, to integrate with build-id servers being put in place by organizations such as Fedora. - perf top now shares more of the evsel config/creation routines with 'record', paving the way for further integration like 'top' snapshots, etc. - perf top now supports DWARF callchains. - Fix mmap limitations on 32-bit, fix from David Miller. - 'perf bench numa mem' NUMA performance measurement suite - ... and lots of fixes, performance improvements, cleanups and other improvements I failed to list - see the shortlog and git log for details." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits) perf/x86/amd: Enable northbridge performance counters on AMD family 15h perf/hwbp: Fix cleanup in case of kzalloc failure perf tools: Fix build with bison 2.3 and older. perf tools: Limit unwind support to x86 archs perf annotate: Make it to be able to skip unannotatable symbols perf gtk/annotate: Fail early if it can't annotate perf gtk/annotate: Show source lines with gray color perf gtk/annotate: Support multiple event annotation perf ui/gtk: Implement basic GTK2 annotation browser perf annotate: Fix warning message on a missing vmlinux perf buildid-cache: Add --update option uprobes/perf: Avoid uprobe_apply() whenever possible uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE uprobes/perf: Teach trace_uprobe/perf code to pre-filter uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's uprobes: Introduce uprobe_apply() perf: Introduce hw_perf_event->tp_target and ->tp_list uprobes/perf: Always increment trace_uprobe->nhit uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe uprobes/tracing: Introduce is_trace_uprobe_enabled() ...	2013-02-19 17:49:41 -08:00
Rafael J. Wysocki	10baf04e95	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (35 commits) PM idle: remove global declaration of pm_idle unicore32 idle: delete stray pm_idle comment openrisc idle: delete pm_idle mn10300 idle: delete pm_idle microblaze idle: delete pm_idle m32r idle: delete pm_idle, and other dead idle code ia64 idle: delete pm_idle cris idle: delete idle and pm_idle ARM64 idle: delete pm_idle ARM idle: delete pm_idle blackfin idle: delete pm_idle sparc idle: rename pm_idle to sparc_idle sh idle: rename global pm_idle to static sh_idle x86 idle: rename global pm_idle to static x86_idle APM idle: register apm_cpu_idle via cpuidle tools/power turbostat: display SMI count by default intel_idle: export both C1 and C1E cpuidle: remove vestage definition of cpuidle_state_usage.driver_data x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag x86 idle: remove mwait_idle() and "idle=mwait" cmdline param ... Conflicts: arch/x86/kernel/process.c (with PM / tracing commit `43720bd`) drivers/acpi/processor_idle.c (with ACPICA commit `4f84291`)	2013-02-18 22:34:11 +01:00
Len Brown	ca62cf59ce	Merge branch 'misc' into release Conflicts: arch/x86/kernel/process.c Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-18 00:25:53 -05:00
Jacob Shin	e259514eef	perf/x86/amd: Enable northbridge performance counters on AMD family 15h On AMD family 15h processors, there are 4 new performance counters (in addition to 6 core performance counters) that can be used for counting northbridge events (i.e. DRAM accesses). Their bit fields are almost identical to the core performance counters. However, unlike the core performance counters, these MSRs are shared between multiple cores (that share the same northbridge). We will reuse the same code path as existing family 10h northbridge event constraints handler logic to enforce this sharing. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jacob Shin <jacob.shin@amd.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1360171589-6381-7-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-16 09:37:27 +01:00
H. Peter Anvin	0da3e7f526	Merge branch 'x86/mm2' into x86/mm x86/mm2 is testing out fine, but has developed conflicts with x86/mm due to patches in adjacent code. Merge them so we can drop x86/mm2 and have a unified branch. Resolved Conflicts: arch/x86/kernel/setup.c	2013-02-15 09:25:08 -08:00
Al Viro	235b80226b	x86: convert to ksignal Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-14 09:21:17 -05:00
Al Viro	d64008a8f3	burying unused conditionals __ARCH_WANT_SYS_RT_SIGACTION, __ARCH_WANT_SYS_RT_SIGSUSPEND, __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND, __ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} - can be assumed always set.	2013-02-14 09:21:15 -05:00
Satoru Takeuchi	6b59e366e0	x86, efi: remove duplicate code in setup_arch() by using, efi_is_native() The check, "IS_ENABLED(CONFIG_X86_64) != efi_enabled(EFI_64BIT)", in setup_arch() can be replaced by efi_is_enabled(). This change remove duplicate code and improve readability. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Olof Johansson <olof@lixom.net> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-02-14 10:36:18 +00:00
Len Brown	1ed51011af	tools/power turbostat: display SMI count by default The SMI counter is popular -- so display it by default rather than requiring an option. What the heck, we've blown the 80 column budget on many systems already... Note that the value displayed is the delta during the measurement interval. The absolute value of the counter can still be seen with the generic 32-bit MSR option, ie. -m 0x34 Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-13 18:22:12 -05:00
Mel Gorman	0ee364eb31	x86/mm: Check if PUD is large when validating a kernel address A user reported the following oops when a backup process reads /proc/kcore: BUG: unable to handle kernel paging request at ffffbb00ff33b000 IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110 [...] Call Trace: [<ffffffff811b8aaa>] read_kcore+0x17a/0x370 [<ffffffff811ad847>] proc_reg_read+0x77/0xc0 [<ffffffff81151687>] vfs_read+0xc7/0x130 [<ffffffff811517f3>] sys_read+0x53/0xa0 [<ffffffff81449692>] system_call_fastpath+0x16/0x1b Investigation determined that the bug triggered when reading system RAM at the 4G mark. On this system, that was the first address using 1G pages for the virt->phys direct mapping so the PUD is pointing to a physical address, not a PMD page. The problem is that the page table walker in kern_addr_valid() is not checking pud_large() and treats the physical address as if it was a PMD. If it happens to look like pmd_none then it'll silently fail, probably returning zeros instead of real data. If the data happens to look like a present PMD though, it will be walked resulting in the oops above. This patch adds the necessary pud_large() check. Unfortunately the problem was not readily reproducible and now they are running the backup program without accessing /proc/kcore so the patch has not been validated but I think it makes sense. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.coM> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-13 10:02:55 +01:00
K. Y. Srinivasan	bc2b0331e0	X86: Handle Hyper-V vmbus interrupts as special hypervisor interrupts Starting with win8, vmbus interrupts can be delivered on any VCPU in the guest and furthermore can be concurrently active on multiple VCPUs. Support this interrupt delivery model by setting up a separate IDT entry for Hyper-V vmbus. interrupts. I would like to thank Jan Beulich <JBeulich@suse.com> and Thomas Gleixner <tglx@linutronix.de>, for their help. In this version of the patch, based on the feedback, I have merged the IDT vector for Xen and Hyper-V and made the necessary adjustments. Furhermore, based on Jan's feedback I have added the necessary compilation switches. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1359940959-32168-3-git-send-email-kys@microsoft.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 16:27:15 -08:00
H. Peter Anvin	8ecba5af94	Linux 3.8-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5 E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+ Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M= =jjBc -----END PGP SIGNATURE----- Merge tag 'v3.8-rc7' into x86/asm Merge in the updates to head_32.S from the previous urgent branch, as upcoming patches will make further changes. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 15:47:45 -08:00
H. Peter Anvin	ff52c3b02b	x86, doc: Clarify the use of asm("%edx") in uaccess.h Put in a comment that explains that the use of asm("%edx") in uaccess.h doesn't actually necessarily mean %edx alone. Cc: Jamie Lokier <jamie@shareable.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/r/511ACDFB.1050707@zytor.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 15:37:02 -08:00
Steven Rostedt	f431b634f2	tracing/syscalls: Allow archs to ignore tracing compat syscalls The tracing of ia32 compat system calls has been a bit of a pain as they use different system call numbers than the 64bit equivalents. I wrote a simple 'lls' program that lists files. I compiled it as a i686 ELF binary and ran it under a x86_64 box. This is the result: echo 0 > /debug/tracing/tracing_on echo 1 > /debug/tracing/events/syscalls/enable echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on grep lls /debug/tracing/trace [.. skipping calls before TS_COMPAT is set ...] lls-1127 [005] d... 936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420) lls-1127 [005] d... 936.409190: sys_recvfrom -> 0x8a77000 lls-1127 [005] d... 936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22) lls-1127 [005] d... 936.409215: sys_lgetxattr -> 0xf76ff000 lls-1127 [005] d... 936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4) lls-1127 [005] d... 936.409228: sys_dup2 -> 0xfffffffffffffffe lls-1127 [005] d... 936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000) lls-1127 [005] d... 936.409242: sys_newfstat -> 0x3 lls-1127 [005] d... 936.409243: sys_removexattr(pathname: 3, name: ffcd0060) lls-1127 [005] d... 936.409244: sys_removexattr -> 0x0 lls-1127 [005] d... 936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2) lls-1127 [005] d... 936.409248: sys_lgetxattr -> 0xf76e5000 lls-1127 [005] d... 936.409248: sys_newlstat(filename: 3, statbuf: 19614) lls-1127 [005] d... 936.409249: sys_newlstat -> 0x0 lls-1127 [005] d... 936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000) lls-1127 [005] d... 936.409279: sys_newfstat -> 0x3 lls-1127 [005] d... 936.409279: sys_close(fd: 3) lls-1127 [005] d... 936.421550: sys_close -> 0x200 lls-1127 [005] d... 936.421558: sys_removexattr(pathname: 3, name: ffcd00d0) lls-1127 [005] d... 936.421560: sys_removexattr -> 0x0 lls-1127 [005] d... 936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802) lls-1127 [005] d... 936.421574: sys_lgetxattr -> 0x4d564000 lls-1127 [005] d... 936.421575: sys_capget(header: 4d70f000, dataptr: 1000) lls-1127 [005] d... 936.421580: sys_capget -> 0x0 lls-1127 [005] d... 936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812) lls-1127 [005] d... 936.421589: sys_lgetxattr -> 0x4d710000 lls-1127 [005] d... 936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32) lls-1127 [005] d... 936.426141: sys_lgetxattr -> 0x4d713000 lls-1127 [005] d... 936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0) lls-1127 [005] d... 936.426146: sys_newlstat -> 0x0 lls-1127 [005] d... 936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22) Obviously I'm not calling newfstat with a fd of 4d55b085. The calls are obviously incorrect, and confusing. Other efforts have been made to fix this: https://lkml.org/lkml/2012/3/26/367 But the real solution is to rewrite the syscall internals and come up with a fixed solution. One that doesn't require all the kluge that the current solution has. Thus for now, instead of outputting incorrect data, simply ignore them. With this patch the changes now have: #> grep lls /debug/tracing/trace #> Compat system calls simply are not traced. If users need compat syscalls, then they should just use the raw syscall tracepoints. For an architecture to make their compat syscalls ignored, it must define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also define an arch_trace_is_compat_syscall() function that will return true if the current task should ignore tracing the syscall. I want to stress that this change does not affect actual syscalls in any way, shape or form. It is only used within the tracing system and doesn't interfere with the syscall logic at all. The changes are consolidated nicely into trace_syscalls.c and asm/ftrace.h. I had to make one small modification to asm/thread_info.h and that was to remove the include of asm/ftrace.h. As asm/ftrace.h required the current_thread_info() it was causing include hell. That include was added back in 2008 when the function graph tracer was added: commit `caf4b323` "tracing, x86: add low level support for ftrace return tracing" It does not need to be included there. Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.home Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-02-12 17:46:28 -05:00
H. Peter Anvin	3578baaed4	x86, mm: Redesign get_user with a __builtin_choose_expr hack Instead of using a bitfield, use an odd little trick using typeof, __builtin_choose_expr, and sizeof. __builtin_choose_expr is explicitly defined to not convert its type (its argument is required to be a constant expression) so this should be well-defined. The code is still not 100% preturbation-free versus the baseline before 64-bit get_user(), but the differences seem to be very small, mostly related to padding and to gcc deciding when to spill registers. Cc: Jamie Lokier <jamie@shareable.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/r/511A8922.6050908@zytor.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 12:46:40 -08:00
H. Peter Anvin	b390784dc1	x86, mm: Use a bitfield to mask nuisance get_user() warnings Even though it is never executed, gcc wants to warn for casting from a large integer to a pointer. Furthermore, using a variable with __typeof__() doesn't work because __typeof__ retains storage specifiers (const, restrict, volatile). However, we can declare a bitfield using sizeof(), which is legal because sizeof() is a constant expression. This quiets the warning, although the code generated isn't 100% identical from the baseline before `96477b4` x86-32: Add support for 64bit get_user(): [x86-mb is baseline, x86-mm is this commit] text data bss filename 113716147 15858380 35037184 tip.x86-mb/o.i386-allconfig/vmlinux 113716145 15858380 35037184 tip.x86-mm/o.i386-allconfig/vmlinux 12989837 `3597944` 12255232 tip.x86-mb/o.i386-modconfig/vmlinux 12989831 `3597944` 12255232 tip.x86-mm/o.i386-modconfig/vmlinux 1462784 237608 1401988 tip.x86-mb/o.i386-noconfig/vmlinux 1462837 237608 1401964 tip.x86-mm/o.i386-noconfig/vmlinux 7938994 553688 7639040 tip.x86-mb/o.i386-pae/vmlinux 7943136 557784 7639040 tip.x86-mm/o.i386-pae/vmlinux 7186126 510572 6574080 tip.x86-mb/o.i386/vmlinux 7186124 510572 6574080 tip.x86-mm/o.i386/vmlinux 103747269 33578856 65888256 tip.x86-mb/o.x86_64-allconfig/vmlinux 103746949 33578856 65888256 tip.x86-mm/o.x86_64-allconfig/vmlinux 12116695 11035832 20160512 tip.x86-mb/o.x86_64-modconfig/vmlinux 12116567 11035832 20160512 tip.x86-mm/o.x86_64-modconfig/vmlinux 1700790 380524 511808 tip.x86-mb/o.x86_64-noconfig/vmlinux 1700790 380524 511808 tip.x86-mm/o.x86_64-noconfig/vmlinux 12413612 1133376 1101824 tip.x86-mb/o.x86_64/vmlinux 12413484 1133376 1101824 tip.x86-mm/o.x86_64/vmlinux Cc: Jamie Lokier <jamie@shareable.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20130209110031.GA17833@n2100.arm.linux.org.uk Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-02-11 17:26:51 -08:00
Mike Travis	d924f947a4	x86, uv, uv3: Trim MMR register definitions after code changes for SGI UV3 This patch trims the MMR register definitions after the updates for the SGI UV3 system have been applied. Note that because these definitions are automatically generated from the RTL we cannot control the length of the names. Therefore there are lines that exceed 80 characters. Signed-off-by: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130211194509.173026880@gulag1.americas.sgi.com Acked-by: Russ Anderson <rja@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-11 17:18:25 -08:00
Mike Travis	6edbd4714e	x86, uv, uv3: Update Hub Info for SGI UV3 This patch updates the UV HUB info for UV3. The "is_uv3_hub" and "is_uvx_hub" (UV2 or UV3) functions are added as well as the addresses and sizes of the MMR regions for UV3. Signed-off-by: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130211194508.610723192@gulag1.americas.sgi.com Acked-by: Russ Anderson <rja@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-11 17:17:50 -08:00
Mike Travis	60fe7be34d	x86, uv, uv3: Update MMR register definitions for SGI Ultraviolet System 3 (UV3) This patch updates the MMR register definitions for the SGI UV3 system. Note that because these definitions are automatically generated from the RTL we cannot control the length of the names. Therefore there are lines that exceed 80 characters. All the new MMR definitions are added in this patch. The patches that follow then update the references. The last patch is a "trim" patch which reduces the size of the MMR definitions file by about a third. This keeps "bi-sectability" in place as the intermediate patches would not compile correctly if the trimmed MMR defines were done first. Signed-off-by: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130211194508.326204556@gulag1.americas.sgi.com Acked-by: Russ Anderson <rja@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-11 17:17:33 -08:00
Rafael J. Wysocki	48694bdb38	Merge branch 'acpica' * acpica: (56 commits) ACPICA: Update version to 20130117 ACPICA: Update predefined info table for _MLS method ACPICA: Remove some extraneous newlines in ACPI_ERROR type calls ACPICA: iASL/Disassembler: Add option to ignore NOOP opcodes/operators ACPICA: AcpiGetSleepTypeData: Allow \_Sx to return either 1 or 2 integers ACPICA: Update ACPICA copyrights to 2013 ACPICA: Update predefined info table ACPICA: Cleanup table handler naming conflicts. ACPICA: Source restructuring: split large files into 8 new files. ACPICA: Cleanup PM_TIMER_FREQUENCY definition. ACPICA: Cleanup ACPI_DEBUG_PRINT macros to fix potential build breakages. ACPICA: Update version to `20121220`. ACPICA: Interpreter: Fix Store() when implicit conversion is not possible. ACPICA: Resources: Split interrupt share/wake bits into two fields. ACPICA: Resources: Support for ACPI 5 wake bit in ExtendedInterrupt descriptor. ACPICA: Interpreter: Add warning if 64-bit constant appears in 32-bit table. ACPICA: Update ACPICA initialization messages. ACPICA: Namespace: Eliminate dot...dot output during initialization. ACPICA: Resource manager: Add support for ACPI 5 wake bit in IRQ descriptor. ACPICA: Fix possible memory leak in dispatcher error path. ...	2013-02-11 13:20:33 +01:00
Len Brown	27be457000	x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag Remove 32-bit x86 a cmdline param "no-hlt", and the cpuinfo_x86.hlt_works_ok that it sets. If a user wants to avoid HLT, then "idle=poll" is much more useful, as it avoids invocation of HLT in idle, while "no-hlt" failed to do so. Indeed, hlt_works_ok was consulted in only 3 places. First, in /proc/cpuinfo where "hlt_bug yes" would be printed if and only if the user booted the system with "no-hlt" -- as there was no other code to set that flag. Second, check_hlt() would not invoke halt() if "no-hlt" were on the cmdline. Third, it was consulted in stop_this_cpu(), which is invoked by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() -- all cases where the machine is being shutdown/reset. The flag was not consulted in the more frequently invoked play_dead()/hlt_play_dead() used in processor offline and suspend. Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations indicating that it would be removed in 2012. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2013-02-10 03:32:22 -05:00
Len Brown	69fb3676df	x86 idle: remove mwait_idle() and "idle=mwait" cmdline param mwait_idle() is a C1-only idle loop intended to be more efficient than HLT, starting on Pentium-4 HT-enabled processors. But mwait_idle() has been replaced by the more general mwait_idle_with_hints(), which handles both C1 and deeper C-states. ACPI processor_idle and intel_idle use only mwait_idle_with_hints(), and no longer use mwait_idle(). Here we simplify the x86 native idle code by removing mwait_idle(), and the "idle=mwait" bootparam used to invoke it. Since Linux 3.0 there has been a boot-time warning when "idle=mwait" was invoked saying it would be removed in 2012. This removal was also noted in the (now removed:-) feature-removal-schedule.txt. After this change, kernels configured with (CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware that supports MWAIT will simply use HLT. If MWAIT is desired on those systems, cpuidle and the cpuidle drivers above can be enabled. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2013-02-10 03:03:41 -05:00
Len Brown	6a377ddc4e	xen idle: make xen-specific macro xen-specific This macro is only invoked by Xen, so make its definition specific to Xen. > set_pm_idle_to_default() < xen_set_default_idle() Signed-off-by: Len Brown <len.brown@intel.com> Cc: xen-devel@lists.xensource.com	2013-02-10 01:06:34 -05:00
Len Brown	e022e7eb90	intel_idle: remove assumption of one C-state per MWAIT flag Remove the assumption that cstate_tables are indexed by MWAIT flag values. Each entry identifies itself via its own flags value. This change is needed to support multiple states that share the same MWAIT flags. Note that this can have an effect on what state is described by 'N' on cmdline intel_idle.max_cstate=N on some systems. intel_idle.max_cstate=0 still disables the driver intel_idle.max_cstate=1 still results in just C1(E) However, "place holders" in the sparse C-state name-space (eg. Atom) have been removed. Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-08 19:29:16 -05:00
Len Brown	137ecc779c	intel_idle: remove use and definition of MWAIT_MAX_NUM_CSTATES Cosmetic only. Replace use of MWAIT_MAX_NUM_CSTATES with CPUIDLE_STATE_MAX. They are both 8, so this patch has no functional change. The reason to change is that intel_idle will soon be able to export more than the 8 "major" states supported by MWAIT. When we hit that limit, it is important to know where the limit comes from. Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-08 19:28:10 -05:00
Len Brown	6792041834	tools/power turbostat: decode MSR_IA32_POWER_CTL When verbose is enabled, print the C1E-Enable bit in MSR_IA32_POWER_CTL. also delete some redundant tests on the verbose variable. Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-08 19:26:16 -05:00
Ville Syrjälä	96477b4cd7	x86-32: Add support for 64bit get_user() Implement __get_user_8() for x86-32. It will return the 64-bit result in edx:eax register pair, and ecx is used to pass in the address and return the error value. For consistency, change the register assignment for all other __get_user_x() variants, so that address is passed in ecx/rcx, the error value is returned in ecx/rcx, and eax/rax contains the actual value. [ hpa: I modified the patch so that it does NOT change the calling conventions for the existing callsites, this also means that the code is completely unchanged for 64 bits. Instead, continue to use eax for address input/error output and use the ecx:edx register pair for the output. ] This is a partial refresh of a patch [1] by Jamie Lokier from 2004. Only the minimal changes to implement 64bit get_user() were picked from the original patch. [1] http://article.gmane.org/gmane.linux.kernel/198823 Originally-by: Jamie Lokier <jamie@shareable.org> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://lkml.kernel.org/r/1355312043-11467-1-git-send-email-ville.syrjala@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-07 15:07:28 -08:00
H. Peter Anvin	bb9b1a834f	Retract MCE-specific UAPI exports which are unused and shouldn't be used. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQ7XZeAAoJEBLB8Bhh3lVKzAAQAKJPmNM4jK3o7QixkySFOKAN GKDmciwBCSLDvhDU5bXSKEzhxaWq+/Wp8aTy0u28TUtIq+Dnya8+9xAMZqe56lNv slBNk1N2IIJBqZoul1FCY7MyBnss19nh2h/jN9PChcY9dGpgkRnKc3rkjjy4UJ10 YibUm/wLuzNGCskt6nGO4uVSDXyv5W6PCnpOb6IW5KFywHE0ojsVwZQM5Vnakjrd vsyicHHaIXSnRtyDWGZSX4JxY7HDK7+v03OHNvO3FtFzNjTJ8r1qU7LVbC1yBPP6 uFN+piBUYn6q6Gsy67H1FcufGTz0IzFwyx/akW1MkgdN4FggrLiFf0a10Fh0Dx3s jd2tuCv72rfOQUZAYg9kobHk1NecCZt0Wt3NWul3WjZHZC35nFgznnUiznAPhoCI /sBKE2VQt0CejigNTIEpaToc/kHuQy3RzmMphknROEehXdm79TqdirNNHKrDkYug U9SV4nsOaRFcSZDPXK52xFIEb9Uc90zSWqNblhBRXt7sBC34EqZk95jYDgOrZxHQ v8iNieYgOezXzIyyUAHC7dlIUbkHlBsPG+8eEoWFcvOBI3UbGvOv3abMO18yBZqr 28An+v8pMsT9Q5sUwHFwvblD/vDaj/NIO0OP+y40Z96KnXz8Cm4mpkzBvBdVVxUs cmaDId47QRqdvIW2mBZZ =2Zr4 -----END PGP SIGNATURE----- Merge tag 'ras_for_3.8' into x86/urgent Retract MCE-specific UAPI exports which are unused and shouldn't be used. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-06 14:18:53 -08:00
Jacob Shin	9f19010af8	perf/x86/amd: Use proper naming scheme for AMD bit field definitions Update these AMD bit field names to be consistent with naming convention followed by the rest of the file. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1360171589-6381-4-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-06 19:45:23 +01:00
Gleb Natapov	b0da5bec30	KVM: VMX: add missing exit names to VMX_EXIT_REASONS array Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-05 23:29:46 -02:00
Al Viro	5b3eb3ade4	x86: switch to generic old sigaction Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:27 -05:00
Al Viro	29fd448084	x86: switch to generic compat rt_sigaction() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:26 -05:00
Al Viro	d7c43e4afb	x86: switch to generic compat sched_rr_get_interval() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:26 -05:00
Al Viro	15ce1f7154	x86,um: switch to generic old sigsuspend() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:26 -05:00
Al Viro	7b83d1a297	x86: switch to generic compat rt_sigqueueinfo() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:25 -05:00
Al Viro	f45adb0499	x86: switch to generic compat rt_sigpending() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:25 -05:00
Al Viro	49cb25e929	x86: get rid of pt_regs argument in vm86/vm86old Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:24 -05:00
Al Viro	3fe26fa34d	x86: get rid of pt_regs argument in sigreturn variants Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:24 -05:00
Al Viro	b3af11afe0	x86: get rid of pt_regs argument of iopl(2) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:24 -05:00
Al Viro	574c4866e3	consolidate kernel-side struct sigaction declarations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 15:09:22 -05:00
Al Viro	92a3ce4a1e	consolidate declarations of k_sigaction Only alpha and sparc are unusual - they have ka_restorer in it. And nobody needs that exposed to userland. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 15:09:22 -05:00
H. Peter Anvin	68d00bbebb	Merge remote-tracking branch 'origin/x86/mm' into x86/mm2 Explicitly merging these two branches due to nontrivial conflicts and to allow further work. Resolved Conflicts: arch/x86/kernel/head32.c arch/x86/kernel/head64.c arch/x86/mm/init_64.c arch/x86/realmode/init.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-01 02:28:36 -08:00
H. Peter Anvin	bb112aec5e	x86-32, mm: Remove reference to resume_map_numa_kva() Remove reference to removed function resume_map_numa_kva(). Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com	2013-01-31 14:12:30 -08:00
Boris Ostrovsky	f0322bd341	x86, AMD: Enable WC+ memory type on family 10 processors In some cases BIOS may not enable WC+ memory type on family 10 processors, instead converting what would be WC+ memory to CD type. On guests using nested pages this could result in performance degradation. This patch enables WC+. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Link: http://lkml.kernel.org/r/1359495169-23278-1-git-send-email-ostr@amd64.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:35:38 -08:00
Fenghua Yu	086fc8f803	x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled() This function is called in __native_flush_tlb_global() and after apply_microcode_early(). Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-8-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:19:16 -08:00
Fenghua Yu	a8ebf6d1d6	x86/microcode_core_early.c: Define interfaces for early loading ucode Define interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on BSP and AP in early boot time. These are generic interfaces. Internally they call vendor specific implementations. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-6-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:19:12 -08:00
Fenghua Yu	d288e1cf8e	x86/common.c: Make have_cpuid_p() a global function Remove static declaration in have_cpuid_p() to make it a global function. The function will be called in early loading microcode. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-4-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:18:58 -08:00
Fenghua Yu	9cd4d78e21	x86/microcode_intel.h: Define functions and macros for early loading ucode Define some functions and macros that will be used in early loading ucode. Some of them are moved from microcode_intel.c driver in order to be called in early boot phase before module can be called. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-3-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:18:50 -08:00
Linus Torvalds	04c2eee5b9	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI fixes from Peter Anvin: "This is a collection of fixes for the EFI support. The controversial bit here is a set of patches which bumps the boot protocol version as part of fixing some serious problems with the EFI handover protocol, used when booting under EFI using a bootloader as opposed to directly from EFI. These changes should also make it a lot saner to support cross-mode 32/64-bit EFI booting in the future. Getting these changes into 3.8 means we avoid presenting an inconsistent ABI to bootloaders. Other changes are display detection and fixing efivarfs." * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: remove attribute check from setup_efi_pci x86, build: Dynamically find entry points in compressed startup code x86, efi: Fix PCI ROM handing in EFI boot stub, in 32-bit mode x86, efi: Fix 32-bit EFI handover protocol entry point x86, efi: Fix display detection in EFI boot stub x86, boot: Define the 2.12 bzImage boot protocol x86/boot: Fix minor fd leakage in tools/relocs.c x86, efi: Set runtime_version to the EFI spec revision x86, efi: fix 32-bit warnings in setup_efi_pci() efivarfs: Delete dentry from dcache in efivarfs_file_write() efivarfs: Never return ENOENT from firmware efi, x86: Pass a proper identity mapping in efi_call_phys_prelog efivarfs: Drop link count of the right inode	2013-01-31 17:10:36 +11:00
Matt Fleming	83e6818974	efi: Make 'efi_enabled' a function to query EFI facilities Originally 'efi_enabled' indicated whether a kernel was booted from EFI firmware. Over time its semantics have changed, and it now indicates whether or not we are booted on an EFI machine with bit-native firmware, e.g. 64-bit kernel with 64-bit firmware. The immediate motivation for this patch is the bug report at, https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 which details how running a platform driver on an EFI machine that is designed to run under BIOS can cause the machine to become bricked. Also, the following report, https://bugzilla.kernel.org/show_bug.cgi?id=47121 details how running said driver can also cause Machine Check Exceptions. Drivers need a new means of detecting whether they're running on an EFI machine, as sadly the expression, if (!efi_enabled) hasn't been a sufficient condition for quite some time. Users actually want to query 'efi_enabled' for different reasons - what they really want access to is the list of available EFI facilities. For instance, the x86 reboot code needs to know whether it can invoke the ResetSystem() function provided by the EFI runtime services, while the ACPI OSL code wants to know whether the EFI config tables were mapped successfully. There are also checks in some of the platform driver code to simply see if they're running on an EFI machine (which would make it a bad idea to do BIOS-y things). This patch is a prereq for the samsung-laptop fix patch. Cc: David Airlie <airlied@linux.ie> Cc: Corentin Chary <corentincj@iksaif.net> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Olof Johansson <olof@lixom.net> Cc: Peter Jones <pjones@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Steve Langasek <steve.langasek@canonical.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-30 11:51:59 -08:00
Yinghai Lu	0e691cf824	x86, kexec, 64bit: Only set ident mapping for ram. We should set mappings only for usable memory ranges under max_pfn Otherwise causes same problem that is fixed by x86, mm: Only direct map addresses that are marked as E820_RAM This patch exposes pfn_mapped array, and only sets ident mapping for ranges in that array. This patch relies on new kernel_ident_mapping_init that could handle existing pgd/pud between different calls. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-25-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:35 -08:00
Yinghai Lu	577af55d80	x86, kexec: Remove 1024G limitation for kexec buffer on 64bit Now 64bit kernel supports more than 1T ram and kexec tools could find buffer above 1T, remove that obsolete limitation. and use MAXMEM instead. Tested on system with more than 1024G ram. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-22-git-send-email-yinghai@kernel.org Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:23 -08:00
H. Peter Anvin	8170e6bed4	x86, 64bit: Use a #PF handler to materialize early mappings on demand Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all 64-bit code has to use page tables. This makes it awkward before we have first set up properly all-covering page tables to access objects that are outside the static kernel range. So far we have dealt with that simply by mapping a fixed amount of low memory, but that fails in at least two upcoming use cases: 1. We will support load and run kernel, struct boot_params, ramdisk, command line, etc. above the 4 GiB mark. 2. need to access ramdisk early to get microcode to update that as early possible. We could use early_iomap to access them too, but it will make code to messy and hard to be unified with 32 bit. Hence, set up a #PF table and use a fixed number of buffers to set up page tables on demand. If the buffers fill up then we simply flush them and start over. These buffers are all in __initdata, so it does not increase RAM usage at runtime. Thus, with the help of the #PF handler, we can set the final kernel mapping from blank, and switch to init_level4_pgt later. During the switchover in head_64.S, before #PF handler is available, we use three pages to handle kernel crossing 1G, 512G boundaries with sharing page by playing games with page aliasing: the same page is mapped twice in the higher-level tables with appropriate wraparound. The kernel region itself will be properly mapped; other mappings may be spurious. early_make_pgtable is using kernel high mapping address to access pages to set page table. -v4: Add phys_base offset to make kexec happy, and add init_mapping_kernel() - Yinghai -v5: fix compiling with xen, and add back ident level3 and level2 for xen also move back init_level4_pgt from BSS to DATA again. because we have to clear it anyway. - Yinghai -v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai -v7: remove not needed clear_page for init_level4_page it is with fill 512,8,0 already in head_64.S - Yinghai -v8: we need to keep that handler alive until init_mem_mapping and don't let early_trap_init to trash that early #PF handler. So split early_trap_pf_init out and move it down. - Yinghai -v9: switchover only cover kernel space instead of 1G so could avoid touch possible mem holes. - Yinghai -v11: change far jmp back to far return to initial_code, that is needed to fix failure that is reported by Konrad on AMD systems. - Yinghai Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:20:06 -08:00
Yinghai Lu	4f7b92263a	x86, realmode: Separate real_mode reserve and setup After we switch to use #PF handler help to set page table, init_level4_pgt will only have entries set after init_mem_mapping(). We need to move copying init_level4_pgt to trampoline_pgd after that. So split reserve and setup, and move the setup after init_mem_mapping() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-11-git-send-email-yinghai@kernel.org Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:13:24 -08:00
Yinghai Lu	aece27851d	x86, 64bit, mm: Add generic kernel/ident mapping helper It is simple version for kernel_physical_mapping_init. it will work to build one page table that will be used later. Use mapping_info to control 1. alloc_pg_page method 2. if PMD is EXEC, 3. if pgd is with kernel low mapping or ident mapping. Will use to replace some local versions in kexec, hibernation and etc. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-8-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:12:25 -08:00
H. Peter Anvin	de65d816aa	Merge remote-tracking branch 'origin/x86/boot' into x86/mm2 Coming patches to x86/mm2 require the changes and advanced baseline in x86/boot. Resolved Conflicts: arch/x86/kernel/setup.c mm/nobootmem.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:10:15 -08:00
H. Peter Anvin	5dcd14ecd4	x86, boot: Sanitize boot_params if not zeroed on creation Use the new sentinel field to detect bootloaders which fail to follow protocol and don't initialize fields in struct boot_params that they do not explicitly initialize to zero. Based on an original patch and research by Yinghai Lu. Changed by hpa to be invoked both in the decompression path and in the kernel proper; the latter for the case where a bootloader takes over decompression. Originally-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 01:22:17 -08:00
Yang Zhang	c7c9c56ca2	x86, apicv: add virtual interrupt delivery support Virtual interrupt delivery avoids KVM to inject vAPIC interrupts manually, which is fully taken care of by the hardware. This needs some special awareness into existing interrupr injection path: - for pending interrupt, instead of direct injection, we may need update architecture specific indicators before resuming to guest. - A pending interrupt, which is masked by ISR, should be also considered in above update action, since hardware will decide when to inject it at right time. Current has_interrupt and get_interrupt only returns a valid vector from injection p.o.v. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-01-29 10:48:19 +02:00
Yang Zhang	8d14695f95	x86, apicv: add virtual x2apic support basically to benefit from apicv, we need to enable virtualized x2apic mode. Currently, we only enable it when guest is really using x2apic. Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic: 0x800 - 0x8ff: no read intercept for apicv register virtualization, except APIC ID and TMCCT which need software's assistance to get right value. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-01-29 10:48:06 +02:00
Yang Zhang	83d4c28693	x86, apicv: add APICv register virtualization support - APIC read doesn't cause VM-Exit - APIC write becomes trap-like Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-01-29 10:47:54 +02:00
Ingo Molnar	e9b6025bf8	Cleanup X86 IOAPIC code from interrupt remapping details These patches move all interrupt remapping specific checks out of the x86 core code and replaces the respective call-sites with function pointers. As a result the interrupt remapping code is better abstraced from x86 core interrupt handling code. The code was rebased to v3.8-rc4 and tested on systems with AMD-Vi and Intel VT-d (both capable of interrupt remapping). The systems were tested with IOMMU enabled and with IOMMU disabled. No issues were found. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRBpGoAAoJECvwRC2XARrjxn4P/2sNpncRptR37HBP4Dl+Rd+u lnp9agUzzHQ67j+1o4bYGknhgfxGqDHJKNW41AinciE1EiyX1QAdFUK1mNEmtRrL dgsoZZdZjKIlNU8rsTon18UJvUT8wR9s8I/+DAVzO5WkRbCQWQ0kC9Ft4QQSKIfs M0dg8eAauIGp7/UsmttZIY9PT1KOhMWPDAaCsEYSLgfXO3VzXNtlZIjQI8bQJhSs DLCoge7bHfUIk13p5WM73qMFDXEXNkHgTDqMjXYB9uJiHSWyGg1vs2dV5rc0fbHQ KBin1NwFwT2HveMlGVy1DVgbdDEy5e7uvPnyHIwI6JedAA2MxYqOsL/VbLnRCgvt OH3bHoGrBJDBEwv/tleyf8SyJ6TO4px6TlOzUlIHHu+33Bc0uKigKXLuXwvTus0q 3cP++X6jW2yn0f65Lpr7LTu/02FRaKMvHWpU1T7yc+rXwJREn1/0bEdYC/fb1u3N GT2aI2orlGMwLtJ1mcxbveKWBnPN5RBrOe0Zha6ycVw7aJol2fAIczSnUq/6SHDQ eFXS7nMqb2J5p8IgQbB5wHDovsqWEBbGC9bEYkjxQEjqD8YgLnIHDR4KgRzhFI1f LX19+wsBkxzwQX8sImEaKzWMoSAl0Sui01s0tc7nJkwfhnbMhIes7pfqkUx0d4ZM AK0It3VSCw4hCnCUC+b2 =2d89 -----END PGP SIGNATURE----- Merge tag 'ioapic-cleanups-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu into x86/apic Pull "x86 IOAPIC code from interrupt remapping details cleanups" from Joerg Roedel: "These patches move all interrupt remapping specific checks out of the x86 core code and replaces the respective call-sites with function pointers. As a result the interrupt remapping code is better abstraced from x86 core interrupt handling code. The code was rebased to v3.8-rc4 and tested on systems with AMD-Vi and Intel VT-d (both capable of interrupt remapping). The systems were tested with IOMMU enabled and with IOMMU disabled. No issues were found." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-01-29 09:14:11 +01:00
Alok N Kataria	3b4a505821	x86, kvm: Fix intialization warnings in kvm.c With commit: `4cca6ea04d` ("x86/apic: Allow x2apic without IR on VMware platform") we started seeing "incompatible initialization" warning messages, since x2apic_available() expects a bool return type while kvm_para_available() returns an int. Reported by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-01-29 09:13:49 +01:00
David Woodhouse	2b9b6d8c71	x86: Require MOVBE feature in cpuid when we use it Add MOVBE to asm/required-features.h so we check for it during startup and don't bother checking for it later. CONFIG_MATOM is used because it corresponds to -march=atom in the Makefiles. If the rules get more complicated it may be necessary to make this an explicit Kconfig option which uses -mmovbe/-mno-movbe to control the use of this instruction explicitly. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Link: http://lkml.kernel.org/r/1359395390.3529.65.camel@shinybook.infradead.org [ hpa: added a patch description ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-28 16:59:55 -08:00
Joerg Roedel	a1bb20c232	x86, irq: Move irq_remapped out of x86 core code The irq_remapped function is only used in IOMMU code after the last patch. So move its definition there too. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:51:52 +01:00
Joerg Roedel	da165322df	x86, io_apic: Introduce eoi_ioapic_pin call-back This callback replaces the old __eoi_ioapic_pin function which needs a special path for interrupt remapping. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:51:52 +01:00
Joerg Roedel	7601384f91	x86, msi: Introduce x86_msi.compose_msi_msg call-back This call-back points to the right function for initializing the msi_msg structure. The old code for msi_msg generation was split up into the irq-remapped and the default case. The irq-remapped case just calls into the specific Intel or AMD implementation when the device is behind an IOMMU. Otherwise the default function is called. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:42:48 +01:00
Joerg Roedel	2976fd8417	x86, irq: Introduce setup_remapped_irq() This function does irq-remapping specific interrupt setup like modifying the chip defaults. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:17:28 +01:00
Joerg Roedel	9b1b0e42f5	x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 core Move all the code to either to the header file asm/irq_remapping.h or to drivers/iommu/. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:17:27 +01:00
Joerg Roedel	819508d302	x86, irq: Add data structure to keep AMD specific irq remapping information Add a data structure to store information the IOMMU driver can use to get from a 'struct irq_cfg' to the remapping entry. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:17:27 +01:00
Joerg Roedel	078e1ee26a	x86, irq: Move irq_remapping_enabled declaration to iommu code Remove the last left-over from this flag from x86 code. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:17:26 +01:00
Joerg Roedel	6a9f5de272	x86, io_apic: Move irq_remapping_enabled checks out of check_timer() Move these checks to IRQ remapping code by introducing the panic_on_irq_remap() function. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:17:26 +01:00
Joerg Roedel	a6a25dd327	x86, io_apic: Convert setup_ioapic_entry to function pointer This pointer is changed to a different function when IRQ remapping is enabled. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:17:26 +01:00
Joerg Roedel	373dd7a27f	x86, io_apic: Introduce set_affinity function pointer With interrupt remapping a special function is used to change the affinity of an IO-APIC interrupt. Abstract this with a function pointer. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:17:26 +01:00
Joerg Roedel	5afba62cc8	x86, msi: Use IRQ remapping specific setup_msi_irqs routine Use seperate routines to setup MSI IRQs for both irq_remapping_enabled cases. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 12:17:25 +01:00
Joerg Roedel	71054d8841	x86, hpet: Introduce x86_msi_ops.setup_hpet_msi This function pointer can be overwritten by the IRQ remapping code. The irq_remapping_enabled check can be removed from default_setup_hpet_msi. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 10:48:30 +01:00
Joerg Roedel	afcc8a40a0	x86, io_apic: Introduce x86_io_apic_ops.print_entries for debugging This call-back is used to dump IO-APIC entries for debugging purposes into the kernel log. VT-d needs a special routine for this and will overwrite the default. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 10:48:30 +01:00
Joerg Roedel	1c4248ca4e	x86, io_apic: Introduce x86_io_apic_ops.disable() This function pointer is used to call a system-specific function for disabling the IO-APIC. Currently this is used for IRQ remapping which has its own disable routine. Also introduce the necessary infrastructure in the interrupt remapping code to overwrite this and other function pointers as necessary by interrupt remapping. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-01-28 10:48:30 +01:00
H. Peter Anvin	09c205afde	x86, boot: Define the 2.12 bzImage boot protocol Define the 2.12 bzImage boot protocol: add xloadflags and additional fields to allow the command line, initramfs and struct boot_params to live above the 4 GiB mark. The xloadflags now communicates if this is a 64-bit kernel with the legacy 64-bit entry point and which of the EFI handover entry points are supported. Avoid adding new read flags to loadflags because of claimed bootloaders testing the whole byte for == 1 to determine bzImageness at least until the issue can be researched further. This is based on patches by Yinghai Lu and David Woodhouse. Originally-by: Yinghai Lu <yinghai@kernel.org> Originally-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org Cc: Rob Landley <rob@landley.net> Cc: Gokul Caushik <caushik1@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joe Millenbach <jmillenbach@gmail.com>	2013-01-27 15:56:37 -08:00
Dave Hansen	d765653445	x86, mm: Create slow_virt_to_phys() This is necessary because __pa() does not work on some kinds of memory, like vmalloc() or the alloc_remap() areas on 32-bit NUMA systems. We have some functions to do conversions _like_ this in the vmalloc() code (like vmalloc_to_page()), but they do not work on sizes other than 4k pages. We would potentially need to be able to handle all the page sizes that we use for the kernel linear mapping (4k, 2M, 1G). In practice, on 32-bit NUMA systems, the percpu areas get stuck in the alloc_remap() area. Any __pa() call on them will break and basically return garbage. This patch introduces a new function slow_virt_to_phys(), which walks the kernel page tables on x86 and should do precisely the same logical thing as __pa(), but actually work on a wider range of memory. It should work on the normal linear mapping, vmalloc(), kmap(), etc... Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130122212433.4D1FCA62@kernel.stglabs.ibm.com Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-25 16:33:23 -08:00
Dave Hansen	4cbeb51b86	x86, mm: Pagetable level size/shift/mask helpers I plan to use lookup_address() to walk the kernel pagetables in a later patch. It returns a "pte" and the level in the pagetables where the "pte" was found. The level is just an enum and needs to be converted to a useful value in order to do address calculations with it. These helpers will be used in at least two places. This also gives the anonymous enum a real name so that no one gets confused about what they should be passing in to these helpers. "PTE_SHIFT" was chosen for naming consistency with the other pagetable levels (PGD/PUD/PMD_SHIFT). Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130122212431.405D3A8C@kernel.stglabs.ibm.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-25 16:33:22 -08:00
H. Peter Anvin	7b5c4a65cc	Linux 3.8-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJRAuO3AAoJEHm+PkMAQRiGbfAH/1C3QQKB11aBpYLAw7qijAze yOui26UCnwRryxsO8zBCQjGoByy5DvY/Q0zyUCWUE6nf/JFSoKGUHzfJ1ATyzGll 3vENP6Fnmq0Hgc4t8/gXtXrZ1k/c43cYA2XEhDnEsJlFNmNj2wCQQj9njTNn2cl1 k6XhZ9U1V2hGYpLL5bmsZiLVI6dIpkCVw8d4GZ8BKxSLUacVKMS7ml2kZqxBTMgt AF6T2SPagBBxxNq8q87x4b7vyHYchZmk+9tAV8UMs1ecimasLK8vrRAJvkXXaH1t xgtR0sfIp5raEjoFYswCK+cf5NEusLZDKOEvoABFfEgL4/RKFZ8w7MMsmG8m0rk= =m68Y -----END PGP SIGNATURE----- Merge tag 'v3.8-rc5' into x86/mm The __pa() fixup series that follows touches KVM code that is not present in the existing branch based on v3.7-rc5, so merge in the current upstream from Linus. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-25 16:31:21 -08:00
Jan Beulich	f317820cb6	x86/xor: Add alternative SSE implementation only prefetching once per 64-byte line On CPUs with 64-byte last level cache lines, this yields roughly 10% better performance, independent of CPU vendor or specific model (as far as I was able to test). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/5093E4B802000078000A615E@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-01-25 09:23:50 +01:00
Jan Beulich	e8f6e3f8a1	x86/xor: Unify SSE-base xor-block routines Besides folding duplicate code, this has the advantage of fixing x86-64's failure to use proper (para-virtualizable) accessors for dealing with CR0.TS. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/5093E47602000078000A615B@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-01-25 09:23:50 +01:00
Kirill A. Shutemov	602e018607	x86/mm: Convert update_mmu_cache() and update_mmu_cache_pmd() to functions Converting macros to functions unhide type problems before changes will be integrated and trigger problems on other architectures. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-01-24 16:12:13 +01:00
Alex Shi	57c4f43043	arch/x86/platform/uv: Fix incorrect tlb flush all issue The flush tlb optimization code has logical issue on UV platform. It doesn't flush the full range at all, since it simply ignores its 'end' parameter (and hence also the "all" indicator) in uv_flush_tlb_others() function. Cliff's notes: \| I tested the patch on a UV. It has the effect of either \| clearing 1 or all TLBs in a cpu. I added some debugging to \| test for the cases when clearing all TLBs is overkill, and in \| practice it happens very seldom. Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Cliff Wickman <cpw@sgi.com> Tested-by: Cliff Wickman <cpw@sgi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-01-24 15:58:54 +01:00
Alok N Kataria	4cca6ea04d	x86/apic: Allow x2apic without IR on VMware platform This patch updates x2apic initializaition code to allow x2apic on VMware platform even without interrupt remapping support. The hypervisor_x2apic_available hook was added in x2apic initialization code and used by KVM and XEN, before this. I have also cleaned up that code to export this hook through the hypervisor_x86 structure. Compile tested for KVM and XEN configs, this patch doesn't have any functional effect on those two platforms. On VMware platform, verified that x2apic is used in physical mode on products that support this. Signed-off-by: Alok N Kataria <akataria@vmware.com> Reviewed-by: Doug Covelli <dcovelli@vmware.com> Reviewed-by: Dan Hecht <dhecht@vmware.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Avi Kivity <avi@redhat.com> Link: http://lkml.kernel.org/r/1358466282.423.60.camel@akataria-dtop.eng.vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-01-24 13:11:18 +01:00
Jan Beulich	d59fe3f13d	ix86: Tighten asmlinkage_protect() constraints While the description of the commit that originally introduced asmlinkage_protect() validly says that this doesn't guarantee clobbering of the function arguments, using "m" constraints rather than "g" ones reduces the risk (by making it less attractive to the compiler to move those variables into registers) and generally results in better code (because we know the arguments are in memory anyway, and are frequently - if not always - used just once, with the second [compiler visible] use in asmlinkage_protect() itself being a fake one). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <roland@hack.frob.com> Cc: <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/50FE84EC02000078000B83B7@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-01-24 11:25:59 +01:00
Xiao Guangrong	93c05d3ef2	KVM: x86: improve reexecute_instruction The current reexecute_instruction can not well detect the failed instruction emulation. It allows guest to retry all the instructions except it accesses on error pfn For example, some cases are nested-write-protect - if the page we want to write is used as PDE but it chains to itself. Under this case, we should stop the emulation and report the case to userspace Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-21 22:58:33 -02:00
Masami Hiramatsu	06aeaaeabf	ftrace: Move ARCH_SUPPORTS_FTRACE_SAVE_REGS in Kconfig Move SAVE_REGS support flag into Kconfig and rename it to CONFIG_DYNAMIC_FTRACE_WITH_REGS. This also introduces CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which indicates the architecture depending part of ftrace has a code that saves full registers. On the other hand, CONFIG_DYNAMIC_FTRACE_WITH_REGS indicates the code is enabled. Link: http://lkml.kernel.org/r/20120928081516.3560.72534.stgit@ltc138.sdl.hitachi.co.jp Cc: Ingo Molnar <mingo@elte.hu> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-01-21 13:22:35 -05:00
Bjorn Helgaas	708b59bfe1	Merge branch 'pci/rafael-set-root-bridge-handle' into next * pci/rafael-set-root-bridge-handle: ACPI / PCI: Set root bridge ACPI handle in advance	2013-01-17 16:00:36 -07:00
Takuya Yoshikawa	e12091ce7b	KVM: Remove unused slot_bitmap from kvm_mmu_page Not needed any more. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-01-14 11:13:58 +02:00
Rafael J. Wysocki	6c0cc950ae	ACPI / PCI: Set root bridge ACPI handle in advance The ACPI handles of PCI root bridges need to be known to acpi_bind_one(), so that it can create the appropriate "firmware_node" and "physical_node" files for them, but currently the way it gets to know those handles is not exactly straightforward (to put it lightly). This is how it works, roughly: 1. acpi_bus_scan() finds the handle of a PCI root bridge, creates a struct acpi_device object for it and passes that object to acpi_pci_root_add(). 2. acpi_pci_root_add() creates a struct acpi_pci_root object, populates its "device" field with its argument's address (device->handle is the ACPI handle found in step 1). 3. The struct acpi_pci_root object created in step 2 is passed to pci_acpi_scan_root() and used to get resources that are passed to pci_create_root_bus(). 4. pci_create_root_bus() creates a struct pci_host_bridge object and passes its "dev" member to device_register(). 5. platform_notify(), which for systems with ACPI is set to acpi_platform_notify(), is called. So far, so good. Now it starts to be "interesting". 6. acpi_find_bridge_device() is used to find the ACPI handle of the given device (which is the PCI root bridge) and executes acpi_pci_find_root_bridge(), among other things, for the given device object. 7. acpi_pci_find_root_bridge() uses the name (sic!) of the given device object to extract the segment and bus numbers of the PCI root bridge and passes them to acpi_get_pci_rootbridge_handle(). 8. acpi_get_pci_rootbridge_handle() browses the list of ACPI PCI root bridges and finds the one that matches the given segment and bus numbers. Its handle is then used to initialize the ACPI handle of the PCI root bridge's device object by acpi_bind_one(). However, this is exactly the ACPI handle we started with in step 1. Needless to say, this is quite embarassing, but it may be avoided thanks to commit `f3fd0c8` (ACPI: Allow ACPI handles of devices to be initialized in advance), which makes it possible to initialize the ACPI handle of a device before passing it to device_register(). Accordingly, add a new __weak routine, pcibios_root_bridge_prepare(), defaulting to an empty implementation that can be replaced by the interested architecutres (x86 and ia64 at the moment) with functions that will set the root bridge's ACPI handle before its dev member is passed to device_register(). Make both x86 and ia64 provide such implementations of pcibios_root_bridge_prepare() and remove acpi_pci_find_root_bridge() and acpi_get_pci_rootbridge_handle() that aren't necessary any more. Included is a fix for breakage on systems with non-ACPI PCI host bridges from Bjorn Helgaas. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2013-01-13 17:14:28 -07:00
Daniel J Blueman	8b84c8df38	x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id Change amd_get_nb_id to return u16 to support >255 memory controllers, and related consistency fixes. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:17:58 +01:00
Daniel J Blueman	772c3ff385	x86, AMD, NB: Add multi-domain support Fix get_node_id to match northbridge IDs from the array of detected ones, allowing multi-server support such as with Numascale's NumaConnect, renaming to 'amd_get_node_id' for consistency. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com [Boris: shorten lines to fit 80 cols] Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:17:58 +01:00
Lv Zheng	0947c6dee3	ACPICA: Update compilation environment settings. This patch does not affect the generation of the Linux binary. This patch decreases 300 lines of 20121018 divergence.diff. This patch updates architecture specific environment settings for compiling ACPICA as such enhancement already has been done in ACPICA. Note that the appended compiler default settings in the <acpi/platform/acenv.h> will deprecate some of the macros defined in the architecture specific <asm/acpi.h>. Thus two of the <asm/acpi.h> headers have been cleaned up in this patch accordingly. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-01-10 12:36:17 +01:00
Borislav Petkov	f51bde6f0d	x86, MCE: Retract most UAPI exports Retract back most macro definitions which went into the user-visible mce.h header. Even though those bits are mostly hardware-defined/-architectural, their naming is not. If we export them to userspace, any kernel unification/renaming/cleanup cannot be done anymore since those are effectively cast in stone. Besides, if userspace wants those definitions, they can write their own defines and go crazy. Signed-off-by: Borislav Petkov <bp@suse.de>	2013-01-09 14:49:02 +01:00
Greg Kroah-Hartman	a18e3690a5	X86: drivers: remove __dev* attributes. CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitconst, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Drake <dsd@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-03 15:57:04 -08:00
Bjorn Helgaas	b7869ba17c	x86/PCI: Remove unused pci_root_bus pci_root_bus is unused, so remove all references to it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2013-01-03 15:28:34 -07:00
Jesse Larrew	11393a077d	x86: kvm_para: fix typo in hypercall comments Correct a typo in the comment explaining hypercalls. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-02 16:02:25 -02:00
Linus Torvalds	54d46ea993	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "sigaltstack infrastructure + conversion for x86, alpha and um, COMPAT_SYSCALL_DEFINE infrastructure. Note that there are several conflicts between "unify SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline; resolution is trivial - just remove definitions of SS_ONSTACK and SS_DISABLED from arch//uapi/asm/signal.h; they are all identical and include/uapi/linux/signal.h contains the unified variant." Fixed up conflicts as per Al. 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to generic sigaltstack new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those generic compat_sys_sigaltstack() introduce generic sys_sigaltstack(), switch x86 and um to it new helper: compat_user_stack_pointer() new helper: restore_altstack() unify SS_ONSTACK/SS_DISABLE definitions new helper: current_user_stack_pointer() missing user_stack_pointer() instances Bury the conditionals from kernel_thread/kernel_execve series COMPAT_SYSCALL_DEFINE: infrastructure	2012-12-20 18:05:28 -08:00
Linus Torvalds	787314c35f	IOMMU Updates for Linux v3.8 A few new features this merge-window. The most important one is probably, that dma-debug now warns if a dma-handle is not checked with dma_mapping_error by the device driver. This requires minor changes to some architectures which make use of dma-debug. Most of these changes have the respective Acks by the Arch-Maintainers. Besides that there are updates to the AMD IOMMU driver for refactor the IOMMU-Groups support and to make sure it does not trigger a hardware erratum. The OMAP changes (for which I pulled in a branch from Tony Lindgren's tree) have a conflict in linux-next with the arm-soc tree. The conflict is in the file arch/arm/mach-omap2/clock44xx_data.c which is deleted in the arm-soc tree. It is safe to delete the file too so solve the conflict. Similar changes are done in the arm-soc tree in the common clock framework migration. A missing hunk from the patch in the IOMMU tree will be submitted as a seperate patch when the merge-window is closed. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQzbQQAAoJECvwRC2XARrjXCIP/2RxBzbVOiaPOorl+ZWbsZ41 lzWiXsCHJkh4BK4/qGsVeKhiNd9LcbQUlhywnBbhWxym3spzmjGtvU2Hcg8QiO/M R83r9S4e8Z6DnF9Gcats1Ns9BufgpyhLXg3XoXPxtyHOgRS59fvYi6xXOxyX30Dy uhbj+WL6UD0zvOMNztEnM1p6UhX+XlpvzKDTR5+G5xKdVPkcgeiaKSwqz739caTn QE2NpqIh+8Mwuu1nIapk8h07xhUYU5eGMXa38u1LvDwSHsrsCMLC+lXIjtInn7Gw Bv+XcCHgtOaoPQwwk/xd2HVwJQxO9HNb5YX51EIjwP0C5S/3yW9Ji1RgqFb6Ewqq jIkF6ckwUheLWsBGkw5UknI/f7RX3MDiTWkziYLIniYKKewm+ymGfgIqPt2TzLIO tMZZiIssKvy7wOXQ5JjpYJg5Xmrau6opNwdEguC8pWkJT7qsn+3SeLjMt0Lh9IoY +37DOgOLb3O3/vnZJ3i0KMRZBfVeaRj5HaGmlxFCYUZCNQymIPTih9Jtqm+WuVcu YaGQCTtynsQ0JVh8YEekLzSfgd3OODP68fyCg1CQNixEgvUi2hd/toX2/Z1wkkSA JC9bZarcoPkSWqaTAA2HvmaaxvRR+0UbhFPopFTQarVV0MVLZWBxoyuKy/nMrmMd UgTzrDYy74UKdrSTwIXg =pPHZ -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "A few new features this merge-window. The most important one is probably, that dma-debug now warns if a dma-handle is not checked with dma_mapping_error by the device driver. This requires minor changes to some architectures which make use of dma-debug. Most of these changes have the respective Acks by the Arch-Maintainers. Besides that there are updates to the AMD IOMMU driver for refactor the IOMMU-Groups support and to make sure it does not trigger a hardware erratum. The OMAP changes (for which I pulled in a branch from Tony Lindgren's tree) have a conflict in linux-next with the arm-soc tree. The conflict is in the file arch/arm/mach-omap2/clock44xx_data.c which is deleted in the arm-soc tree. It is safe to delete the file too so solve the conflict. Similar changes are done in the arm-soc tree in the common clock framework migration. A missing hunk from the patch in the IOMMU tree will be submitted as a seperate patch when the merge-window is closed." * tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (29 commits) ARM: dma-mapping: support debug_dma_mapping_error ARM: OMAP4: hwmod data: ipu and dsp to use parent clocks instead of leaf clocks iommu/omap: Adapt to runtime pm iommu/omap: Migrate to hwmod framework iommu/omap: Keep mmu enabled when requested iommu/omap: Remove redundant clock handling on ISR iommu/amd: Remove obsolete comment iommu/amd: Don't use 512GB pages iommu/tegra: smmu: Move bus_set_iommu after probe for multi arch iommu/tegra: gart: Move bus_set_iommu after probe for multi arch iommu/tegra: smmu: Remove unnecessary PTC/TLB flush all tile: dma_debug: add debug_dma_mapping_error support sh: dma_debug: add debug_dma_mapping_error support powerpc: dma_debug: add debug_dma_mapping_error support mips: dma_debug: add debug_dma_mapping_error support microblaze: dma-mapping: support debug_dma_mapping_error ia64: dma_debug: add debug_dma_mapping_error support c6x: dma_debug: add debug_dma_mapping_error support ARM64: dma_debug: add debug_dma_mapping_error support intel-iommu: Prevent devices with RMRRs from being placed into SI Domain ...	2012-12-20 10:07:25 -08:00
Al Viro	9026843952	generic compat_sys_sigaltstack() Again, conditional on CONFIG_GENERIC_SIGALTSTACK Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-19 18:07:41 -05:00
Al Viro	6bf9adfc90	introduce generic sys_sigaltstack(), switch x86 and um to it Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not select it are completely unaffected Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-19 18:07:40 -05:00
Al Viro	9b064fc3f9	new helper: compat_user_stack_pointer() Compat counterpart of current_user_stack_pointer(); for most of the biarch architectures those two are identical, but e.g. arm64 and arm use different registers for stack pointer... Note that amd64 variants of current_user_stack_pointer/compat_user_stack_pointer do not rely on pt_regs having been through FIXUP_TOP_OF_STACK. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-19 18:07:40 -05:00
Al Viro	031b656698	unify SS_ONSTACK/SS_DISABLE definitions Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-19 18:07:39 -05:00
Al Viro	ae903caae2	Bury the conditionals from kernel_thread/kernel_execve series All architectures have CONFIG_GENERIC_KERNEL_THREAD CONFIG_GENERIC_KERNEL_EXECVE __ARCH_WANT_SYS_EXECVE None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers of kernel_execve() (which is a trivial wrapper for do_execve() now) left. Kill the conditionals and make both callers use do_execve(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-19 18:07:38 -05:00
Linus Torvalds	6842d98de7	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull powertool update from Len Brown: "This updates the tree w/ the latest version of turbostat, which reports temperature and - on SNB and later - Watts." Fix up semantic merge conflict as per Len. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools: Allow tools to be installed in a user specified location tools/power: turbostat: make Makefile a bit more capable tools/power x86_energy_perf_policy: close /proc/stat in for_every_cpu() tools/power turbostat: v3.0: monitor Watts and Temperature tools/power turbostat: fix output buffering issue tools/power turbostat: prevent infinite loop on migration error path x86 power: define RAPL MSRs tools/power/x86/turbostat: share kernel MSR #defines	2012-12-18 12:34:29 -08:00
David Rientjes	c36e0501ee	x86, paravirt: fix build error when thp is disabled With CONFIG_PARAVIRT=y and CONFIG_TRANSPARENT_HUGEPAGE=n, the build breaks because set_pmd_at() is undeclared: mm/memory.c: In function 'do_pmd_numa_page': mm/memory.c:3520: error: implicit declaration of function 'set_pmd_at' mm/mprotect.c: In function 'change_pmd_protnuma': mm/mprotect.c:120: error: implicit declaration of function 'set_pmd_at' This is because paravirt defines set_pmd_at() only when CONFIG_TRANSPARENT_HUGEPAGE=y and such a restriction is unneeded. The fix is to define it for all CONFIG_PARAVIRT configurations. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 09:49:03 -08:00
Andrew Morton	d7124073ad	create non-empty arch/x86/include/uapi/asm/ files patch(1) doesn't create zero-length files, so my kernel didn't compile. Put something in these files so patch(1) actually creates them. Cc: David Howells <dhowells@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:11 -08:00
Linus Torvalds	3d59eebc5e	Automatic NUMA Balancing V11 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQIcBAABAgAGBQJQx0kQAAoJEHzG/DNEskfi4fQP/R5PRovayroZALBMLnVJDaLD Ttr9p40VNXbiJ+MfRgatJjSSJZ4Jl+fC3NEqBhcwVZhckZZb9R2s0WtrSQo5+ZbB vdRfiuKoCaKM4cSZ08C12uTvsF6xjhjd27CTUlMkyOcDoKxMEFKelv0hocSxe4Wo xqlv3eF+VsY7kE1BNbgBP06SX4tDpIHRxXfqJPMHaSKQmre+cU0xG2GcEu3QGbHT DEDTI788YSaWLmBfMC+kWoaQl1+bV/FYvavIAS8/o4K9IKvgR42VzrXmaFaqrbgb 72ksa6xfAi57yTmZHqyGmts06qYeBbPpKI+yIhCMInxA9CY3lPbvHppRf0RQOyzj YOi4hovGEMJKE+BCILukhJcZ9jCTtS3zut6v1rdvR88f4y7uhR9RfmRfsxuW7PNj 3Rmh191+n0lVWDmhOs2psXuCLJr3LEiA0dFffN1z8REUTtTAZMsj8Rz+SvBNAZDR hsJhERVeXB6X5uQ5rkLDzbn1Zic60LjVw7LIp6SF2OYf/YKaF8vhyWOA8dyCEu8W CGo7AoG0BO8tIIr8+LvFe8CweypysZImx4AjCfIs4u9pu/v11zmBvO9NO5yfuObF BreEERYgTes/UITxn1qdIW4/q+Nr0iKO3CTqsmu6L1GfCz3/XzPGs3U26fUhllqi Ka0JKgnWvsa6ez6FSzKI =ivQa -----END PGP SIGNATURE----- Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma Pull Automatic NUMA Balancing bare-bones from Mel Gorman: "There are three implementations for NUMA balancing, this tree (balancenuma), numacore which has been developed in tip/master and autonuma which is in aa.git. In almost all respects balancenuma is the dumbest of the three because its main impact is on the VM side with no attempt to be smart about scheduling. In the interest of getting the ball rolling, it would be desirable to see this much merged for 3.8 with the view to building scheduler smarts on top and adapting the VM where required for 3.9. The most recent set of comparisons available from different people are mel: https://lkml.org/lkml/2012/12/9/108 mingo: https://lkml.org/lkml/2012/12/7/331 tglx: https://lkml.org/lkml/2012/12/10/437 srikar: https://lkml.org/lkml/2012/12/10/397 The results are a mixed bag. In my own tests, balancenuma does reasonably well. It's dumb as rocks and does not regress against mainline. On the other hand, Ingo's tests shows that balancenuma is incapable of converging for this workloads driven by perf which is bad but is potentially explained by the lack of scheduler smarts. Thomas' results show balancenuma improves on mainline but falls far short of numacore or autonuma. Srikar's results indicate we all suffer on a large machine with imbalanced node sizes. My own testing showed that recent numacore results have improved dramatically, particularly in the last week but not universally. We've butted heads heavily on system CPU usage and high levels of migration even when it shows that overall performance is better. There are also cases where it regresses. Of interest is that for specjbb in some configurations it will regress for lower numbers of warehouses and show gains for higher numbers which is not reported by the tool by default and sometimes missed in treports. Recently I reported for numacore that the JVM was crashing with NullPointerExceptions but currently it's unclear what the source of this problem is. Initially I thought it was in how numacore batch handles PTEs but I'm no longer think this is the case. It's possible numacore is just able to trigger it due to higher rates of migration. These reports were quite late in the cycle so I/we would like to start with this tree as it contains much of the code we can agree on and has not changed significantly over the last 2-3 weeks." * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits) mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable mm/rmap: Convert the struct anon_vma::mutex to an rwsem mm: migrate: Account a transhuge page properly when rate limiting mm: numa: Account for failed allocations and isolations as migration failures mm: numa: Add THP migration for the NUMA working set scanning fault case build fix mm: numa: Add THP migration for the NUMA working set scanning fault case. mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG mm: sched: numa: Control enabling and disabling of NUMA balancing mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships mm: numa: migrate: Set last_nid on newly allocated page mm: numa: split_huge_page: Transfer last_nid on tail page mm: numa: Introduce last_nid to the page frame sched: numa: Slowly increase the scanning period as NUMA faults are handled mm: numa: Rate limit setting of pte_numa if node is saturated mm: numa: Rate limit the amount of memory that is migrated between nodes mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting mm: numa: Migrate pages handled during a pmd_numa hinting fault mm: numa: Migrate on reference policy ...	2012-12-16 15:18:08 -08:00
Joerg Roedel	9c6ecf6a3a	Merge branches 'iommu/fixes', 'dma-debug', 'x86/amd', 'x86/vt-d', 'arm/tegra' and 'arm/omap' into next	2012-12-16 12:24:09 +01:00
Linus Torvalds	11520e5e7c	Revert "x86-64/efi: Use EFI to deal with platform wall clock (again)" This reverts commit `bd52276fa1` ("x86-64/efi: Use EFI to deal with platform wall clock (again)"), and the two supporting commits: `da5a108d05`: "x86/kernel: remove tboot 1:1 page table creation code" `185034e72d`: "x86, efi: 1:1 pagetable mapping for virtual EFI calls") as they all depend semantically on commit `53b87cf088` ("x86, mm: Include the entire kernel memory map in trampoline_pgd") that got reverted earlier due to the problems it caused. This was pointed out by Yinghai Lu, and verified by me on my Macbook Air that uses EFI. Pointed-out-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-15 15:20:41 -08:00
Linus Torvalds	1ed55eac3b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: - Added aesni/avx/x86_64 implementations for camellia. - Optimised AVX code for cast5/serpent/twofish/cast6. - Fixed vmac bug with unaligned input. - Allow compression algorithms in FIPS mode. - Optimised crc32c implementation for Intel. - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (32 commits) crypto: caam - Updated SEC-4.0 device tree binding for ERA information. crypto: testmgr - remove superfluous initializers for xts(aes) crypto: testmgr - allow compression algs in fips mode crypto: testmgr - add larger crc32c test vector to test FPU path in crc32c_intel crypto: testmgr - clean alg_test_null entries in alg_test_descs[] crypto: testmgr - remove fips_allowed flag from camellia-aesni null-tests crypto: cast5/cast6 - move lookup tables to shared module padata: use __this_cpu_read per-cpu helper crypto: s5p-sss - Fix compilation error crypto: picoxcell - Add terminating entry for platform_device_id table crypto: omap-aes - select BLKCIPHER2 crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file crypto: tcrypt - add async speed test for camellia cipher crypto: tegra-aes - fix error-valued pointer dereference crypto: tegra - fix missing unlock on error case crypto: cast5/avx - avoid using temporary stack buffers crypto: serpent/avx - avoid using temporary stack buffers crypto: twofish/avx - avoid using temporary stack buffers crypto: cast6/avx - avoid using temporary stack buffers ...	2012-12-15 12:35:19 -08:00
David Howells	af170c5061	UAPI: (Scripted) Disintegrate arch/x86/include/asm Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-12-14 22:37:13 +00:00
Linus Torvalds	d42b3a2906	Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI update from Peter Anvin: "EFI tree, from Matt Fleming. Most of the patches are the new efivarfs filesystem by Matt Garrett & co. The balance are support for EFI wallclock in the absence of a hardware-specific driver, and various fixes and cleanups." * 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) efivarfs: Make efivarfs_fill_super() static x86, efi: Check table header length in efi_bgrt_init() efivarfs: Use query_variable_info() to limit kmalloc() efivarfs: Fix return value of efivarfs_file_write() efivarfs: Return a consistent error when efivarfs_get_inode() fails efivarfs: Make 'datasize' unsigned long efivarfs: Add unique magic number efivarfs: Replace magic number with sizeof(attributes) efivarfs: Return an error if we fail to read a variable efi: Clarify GUID length calculations efivarfs: Implement exclusive access for {get,set}_variable efivarfs: efivarfs_fill_super() ensure we clean up correctly on error efivarfs: efivarfs_fill_super() ensure we free our temporary name efivarfs: efivarfs_fill_super() fix inode reference counts efivarfs: efivarfs_create() ensure we drop our reference on inode on error efivarfs: efivarfs_file_read ensure we free data in error paths x86-64/efi: Use EFI to deal with platform wall clock (again) x86/kernel: remove tboot 1:1 page table creation code x86, efi: 1:1 pagetable mapping for virtual EFI calls x86, mm: Include the entire kernel memory map in trampoline_pgd ...	2012-12-14 10:08:40 -08:00
Linus Torvalds	2d9c8b5d6a	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "Rework all config variables used throughout the MCA code and collect them together into a mca_config struct. This keeps them tightly and neatly packed together instead of spilled all over the place. Then, convert those which are used as booleans into real booleans and save some space. These bits are exposed via /sys/devices/system/machinecheck/machinecheck/" 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, MCA: Finish mca_config conversion x86, MCA: Convert the next three variables batch x86, MCA: Convert rip_msr, mce_bootlog, monarch_timeout x86, MCA: Convert dont_log_ce, banks and tolerant drivers/base: Add a DEVICE_BOOL_ATTR macro	2012-12-14 09:59:59 -08:00
Alex Williamson	0f888f5acd	KVM: Increase user memory slots on x86 to 125 With the 3 private slots, this gives us a nice round 128 slots total. The primary motivation for this is to support more assigned devices. Each assigned device can theoretically use up to 8 slots (6 MMIO BARs, 1 ROM BAR, 1 spare for a split MSI-X table mapping) though it's far more typical for a device to use 3-4 slots. If we assume a typical VM uses a dozen slots for non-assigned devices purposes, we should always be able to support 14 worst case assigned devices or 28 to 37 typical devices. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 23:25:25 -02:00
Alex Williamson	0743247fbf	KVM: Make KVM_PRIVATE_MEM_SLOTS optional Seems like everyone copied x86 and defined 4 private memory slots that never actually get used. Even x86 only uses 3 of the 4. These aren't exposed so there's no need to add padding. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 23:21:58 -02:00
Alex Williamson	bbacc0c111	KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is the user accessible slots and the other is user + private. Make this more obvious. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 23:21:57 -02:00
Linus Torvalds	66cdd0ceaf	Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Marcelo Tosatti: "Considerable KVM/PPC work, x86 kvmclock vsyscall support, IA32_TSC_ADJUST MSR emulation, amongst others." Fix up trivial conflict in kernel/sched/core.c due to cross-cpu migration notifier added next to rq migration call-back. * tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits) KVM: emulator: fix real mode segment checks in address linearization VMX: remove unneeded enable_unrestricted_guest check KVM: VMX: fix DPL during entry to protected mode x86/kexec: crash_vmclear_local_vmcss needs __rcu kvm: Fix irqfd resampler list walk KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary KVM: MMU: optimize for set_spte KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation KVM: PPC: bookehv: Add guest computation mode for irq delivery KVM: PPC: Make EPCR a valid field for booke64 and bookehv KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation KVM: PPC: e500: Add emulation helper for getting instruction ea KVM: PPC: bookehv64: Add support for interrupt handling KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler KVM: PPC: booke: Fix get_tb() compile error on 64-bit KVM: PPC: e500: Silence bogus GCC warning in tlb code ...	2012-12-13 15:31:08 -08:00
Linus Torvalds	896ea17d3d	Features: - Add necessary infrastructure to make balloon driver work under ARM. - Add /dev/xen/privcmd interfaces to work with ARM and PVH. - Improve Xen PCIBack wild-card parsing. - Add Xen ACPI PAD (Processor Aggregator) support - so can offline/online sockets depending on the power consumption. - PVHVM + kexec = use an E820_RESV region for the shared region so we don't overwrite said region during kexec reboot. - Cleanups, compile fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQyJaAAAoJEFjIrFwIi8fJ9DoIALAjj3qaGDimykc/RPSu2MLL Tfchb1su0WxSu6fP17jBadq39Qna85UzZATMCyN47k8wB3KoSEW13rqwe7JSsdT/ SEfZDrlbhNK+JAWJETx+6gq7J7dMwi/tFt4CbwPv/zAHb7C7JyzEgKctbi4Q1e89 FFMXZru2IWDbaqlcJQjJcE/InhWy5vKW3bY5nR/Bz0RBf9lk/WHbcJwLXirsDcKk uMVmPy4yiApX6ZCPbYP5BZvsIFkmLKQEfpmwdzbLGDoL7N1onqq/lgYNgZqPJUkE XL1GVBbRGpy+NQr++vUS1NiRyR81EChRO3IrDZwzvNEPqKa9GoF5U1CdRh71R5I= =uZQZ -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen updates from Konrad Rzeszutek Wilk: - Add necessary infrastructure to make balloon driver work under ARM. - Add /dev/xen/privcmd interfaces to work with ARM and PVH. - Improve Xen PCIBack wild-card parsing. - Add Xen ACPI PAD (Processor Aggregator) support - so can offline/ online sockets depending on the power consumption. - PVHVM + kexec = use an E820_RESV region for the shared region so we don't overwrite said region during kexec reboot. - Cleanups, compile fixes. Fix up some trivial conflicts due to the balloon driver now working on ARM, and there were changes next to the previous work-arounds that are now gone. * tag 'stable/for-linus-3.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/PVonHVM: fix compile warning in init_hvm_pv_info xen: arm: implement remap interfaces needed for privcmd mappings. xen: correctly use xen_pfn_t in remap_domain_mfn_range. xen: arm: enable balloon driver xen: balloon: allow PVMMU interfaces to be compiled out xen: privcmd: support autotranslated physmap guests. xen: add pages parameter to xen_remap_domain_mfn_range xen/acpi: Move the xen_running_on_version_or_later function. xen/xenbus: Remove duplicate inclusion of asm/xen/hypervisor.h xen/acpi: Fix compile error by missing decleration for xen_domain. xen/acpi: revert pad config check in xen_check_mwait xen/acpi: ACPI PAD driver xen-pciback: reject out of range inputs xen-pciback: simplify and tighten parsing of device IDs xen PVonHVM: use E820_Reserved area for shared_info	2012-12-13 14:29:16 -08:00
Linus Torvalds	193c0d6825	PCI changes for the v3.8 merge window: Host bridge hotplug: - Untangle _PRT from struct pci_bus (Bjorn Helgaas) - Request _OSC control before scanning root bus (Taku Izumi) - Assign resources when adding host bridge (Yinghai Lu) - Remove root bus when removing host bridge (Yinghai Lu) - Remove _PRT during hot remove (Yinghai Lu) SRIOV - Add sysfs knobs to control numVFs (Don Dutile) Power management - Notify devices when power resource turned on (Huang Ying) Bug fixes - Work around broken _SEG on HP xw9300 (Bjorn Helgaas) - Keep runtime PM enabled for unbound PCI devices (Huang Ying) - Fix Optimus dual-GPU runtime D3 suspend issue (Dave Airlie) - Fix xen frontend shutdown issue (David Vrabel) - Work around PLX PCI 9050 BAR alignment erratum (Ian Abbott) Miscellaneous - Add GPL license for drivers/pci/ioapic (Andrew Cooks) - Add standard PCI-X, PCIe ASPM register #defines (Bjorn Helgaas) - NumaChip remote PCI support (Daniel Blueman) - Fix PCIe Link Capabilities Supported Link Speed definition (Jingoo Han) - Convert dev_printk() to dev_info(), etc (Joe Perches) - Add support for non PCI BAR ROM data (Matthew Garrett) - Add x86 support for host bridge translation offset (Mike Yoknis) - Report success only when every driver supports AER (Vijay Pandarathil) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABAgAGBQJQyKwSAAoJEPGMOI97Hn6zScgQAJZK2VDfCv74mKrgSDNokIzH 5nVDrc9AHKJm7CUODs6keJK5d4TD/za3Zao68zrYHsJJKes2ni2Z3W34HP2RXKK2 eOmePXOHYPPZMlimP9r9cVxNu1ZJCyp/yWSBcsPF4zUgWhBWLRaSj85I049gQ0sz +05nZYfLjVd3HNiaXsG4CQyMrNF46XEsLhF9vs+Nr2GHPwrpzhfScgYv63oDS86C 3ICKsjmiRUZcNelxIFYmyxa5u89QdW5XHjzc9eHGQuus24Vxw+TZzsdfc17sUJEE HTyXY+RjDpOVhdtwwUjrCEOiyZYvy3g9+3sKxoxgt/76ghdUaR7fxITwB97qVMFD T0ESlKjSV/Qv5QYdyy5uP4zwNs/PXCWXkTg/L1m71F30BxKWDa7tgiA6uK7Z7fl5 1aokKBdk3mtJJJIDJG1YkxPXx/JItTGCNYrx7CcFj49rSjrUWLQdmrYahersRIsB 3wiD2xTi9e4dXeP/+VGzGOWB/sHk+73jvrvZe/REa1FCnMINDz4+9V9WaGROMqyq MQ8kX0KfYcNVNxy1GOXjU5wLpMN/t/QbvI7gwzRP1DAUCJPoOgFy7AjvSTVG3zuy 8CtdOFttVkUn5dqsbQR0gVbyQVTS3PGSKz5XC/s8kVDWhja0xZTBYwrskM/4zdSD Xf48OyYV5EjpC3FYUSiU =OE3Q -----END PGP SIGNATURE----- Merge tag 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI update from Bjorn Helgaas: "Host bridge hotplug: - Untangle _PRT from struct pci_bus (Bjorn Helgaas) - Request _OSC control before scanning root bus (Taku Izumi) - Assign resources when adding host bridge (Yinghai Lu) - Remove root bus when removing host bridge (Yinghai Lu) - Remove _PRT during hot remove (Yinghai Lu) SRIOV - Add sysfs knobs to control numVFs (Don Dutile) Power management - Notify devices when power resource turned on (Huang Ying) Bug fixes - Work around broken _SEG on HP xw9300 (Bjorn Helgaas) - Keep runtime PM enabled for unbound PCI devices (Huang Ying) - Fix Optimus dual-GPU runtime D3 suspend issue (Dave Airlie) - Fix xen frontend shutdown issue (David Vrabel) - Work around PLX PCI 9050 BAR alignment erratum (Ian Abbott) Miscellaneous - Add GPL license for drivers/pci/ioapic (Andrew Cooks) - Add standard PCI-X, PCIe ASPM register #defines (Bjorn Helgaas) - NumaChip remote PCI support (Daniel Blueman) - Fix PCIe Link Capabilities Supported Link Speed definition (Jingoo Han) - Convert dev_printk() to dev_info(), etc (Joe Perches) - Add support for non PCI BAR ROM data (Matthew Garrett) - Add x86 support for host bridge translation offset (Mike Yoknis) - Report success only when every driver supports AER (Vijay Pandarathil)" Fix up trivial conflicts. * tag 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits) PCI: Use phys_addr_t for physical ROM address x86/PCI: Add NumaChip remote PCI support ath9k: Use standard #defines for PCIe Capability ASPM fields iwlwifi: Use standard #defines for PCIe Capability ASPM fields iwlwifi: collapse wrapper for pcie_capability_read_word() iwlegacy: Use standard #defines for PCIe Capability ASPM fields iwlegacy: collapse wrapper for pcie_capability_read_word() cxgb3: Use standard #defines for PCIe Capability ASPM fields PCI: Add standard PCIe Capability Link ASPM field names PCI/portdrv: Use PCI Express Capability accessors PCI: Use standard PCIe Capability Link register field names x86: Use PCI setup data PCI: Add support for non-BAR ROMs PCI: Add pcibios_add_device EFI: Stash ROMs if they're not in the PCI BAR PCI: Add and use standard PCI-X Capability register names PCI/PM: Keep runtime PM enabled for unbound PCI devices xen-pcifront: Handle backend CLOSED without CLOSING PCI: SRIOV control and status via sysfs (documentation) PCI/AER: Report success only when every device has AER-aware driver ...	2012-12-13 12:14:47 -08:00
Linus Torvalds	9977d9b379	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull big execve/kernel_thread/fork unification series from Al Viro: "All architectures are converted to new model. Quite a bit of that stuff is actually shared with architecture trees; in such cases it's literally shared branch pulled by both, not a cherry-pick. A lot of ugliness and black magic is gone (-3KLoC total in this one): - kernel_thread()/kernel_execve()/sys_execve() redesign. We don't do syscalls from kernel anymore for either kernel_thread() or kernel_execve(): kernel_thread() is essentially clone(2) with callback run before we return to userland, the callbacks either never return or do successful do_execve() before returning. kernel_execve() is a wrapper for do_execve() - it doesn't need to do transition to user mode anymore. As a result kernel_thread() and kernel_execve() are arch-independent now - they live in kernel/fork.c and fs/exec.c resp. sys_execve() is also in fs/exec.c and it's completely architecture-independent. - daemonize() is gone, along with its parts in fs/.c - struct pt_regs is no longer passed to do_fork/copy_process/ copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump. - sys_fork()/sys_vfork()/sys_clone() unified; some architectures still need wrappers (ones with callee-saved registers not saved in pt_regs on syscall entry), but the main part of those suckers is in kernel/fork.c now." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits) do_coredump(): get rid of pt_regs argument print_fatal_signal(): get rid of pt_regs argument ptrace_signal(): get rid of unused arguments get rid of ptrace_signal_deliver() arguments new helper: signal_pt_regs() unify default ptrace_signal_deliver flagday: kill pt_regs argument of do_fork() death to idle_regs() don't pass regs to copy_process() flagday: don't pass regs to copy_thread() bfin: switch to generic vfork, get rid of pointless wrappers xtensa: switch to generic clone() openrisc: switch to use of generic fork and clone unicore32: switch to generic clone(2) score: switch to generic fork/vfork/clone c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone() take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h mn10300: switch to generic fork/vfork/clone h8300: switch to generic fork/vfork/clone tile: switch to generic clone() ... Conflicts: arch/microblaze/include/asm/Kbuild	2012-12-12 12:22:13 -08:00
Linus Torvalds	1ebaf4f4e6	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer update from Ingo Molnar: "This tree includes HPET fixes and also implements a calibration-free, TSC match driven APIC timer interrupt mode: 'TSC deadline mode' supported in SandyBridge and later CPUs." * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: hpet: Fix inverted return value check in arch_setup_hpet_msi() x86: hpet: Fix masking of MSI interrupts x86: apic: Use tsc deadline for oneshot when available	2012-12-11 20:01:33 -08:00
Linus Torvalds	743aa456c1	Merge branch 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull "Nuke 386-DX/SX support" from Ingo Molnar: "This tree removes ancient-386-CPUs support and thus zaps quite a bit of complexity: 24 files changed, 56 insertions(+), 425 deletions(-) ... which complexity has plagued us with extra work whenever we wanted to change SMP primitives, for years. Unfortunately there's a nostalgic cost: your old original 386 DX33 system from early 1991 won't be able to boot modern Linux kernels anymore. Sniff." I'm not sentimental. Good riddance. * 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, 386 removal: Document Nx586 as a 386 and thus unsupported x86, cleanups: Simplify sync_core() in the case of no CPUID x86, 386 removal: Remove CONFIG_X86_POPAD_OK x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK x86, 386 removal: Remove CONFIG_INVLPG x86, 386 removal: Remove CONFIG_BSWAP x86, 386 removal: Remove CONFIG_XADD x86, 386 removal: Remove CONFIG_CMPXCHG x86, 386 removal: Remove CONFIG_M386 from Kconfig	2012-12-11 19:59:32 -08:00
Linus Torvalds	a05a4e24dc	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 topology discovery improvements from Ingo Molnar: "These changes improve topology discovery on AMD CPUs. Right now this feeds information displayed in /sys/devices/system/cpu/cpuX/cache/indexY/* - but in the future we could use this to set up a better scheduling topology." * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cacheinfo: Base cache sharing info on CPUID 0x8000001d on AMD x86, cacheinfo: Make use of CPUID 0x8000001d for cache information on AMD x86, cacheinfo: Determine number of cache leafs using CPUID 0x8000001d on AMD x86: Add cpu_has_topoext	2012-12-11 19:58:29 -08:00
Linus Torvalds	74b8423345	Merge branch 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 BSP hotplug changes from Ingo Molnar: "This tree enables CPU#0 (the boot processor) to be onlined/offlined on x86, just like any other CPU. Enabled on Intel CPUs for now. Allowing this required the identification and fixing of latent CPU#0 assumptions (such as CPU#0 initializations, etc.) in the x86 architecture code, plus the identification of barriers to BSP-offlining, such as active PIC interrupts which can only be serviced on the BSP. It's behind a default-off option, and there's a debug option that allows the automatic testing of this feature. The motivation of this feature is to allow and prepare for true CPU-hotplug hardware support: recent changes to MCE support enable us to detect a deteriorating but not yet hard-failing L1/L2 cache on a CPU that could be soft-unplugged - or a failing L3 cache on a multi-socket system. Note that true hardware hot-plug is not yet fully enabled by this, because that requires a special platform wakeup sequence to be sent to the freshly powered up CPU#0. Future patches for this are planned, once such a platform exists. Chicken and egg" * 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, topology: Debug CPU0 hotplug x86/i387.c: Initialize thread xstate only on CPU0 only once x86, hotplug: Handle retrigger irq by the first available CPU x86, hotplug: The first online processor saves the MTRR state x86, hotplug: During CPU0 online, enable x2apic, set_numa_node. x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI x86-32, hotplug: Add start_cpu0() entry point to head_32.S x86-64, hotplug: Add start_cpu0() entry point to head_64.S kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback x86, hotplug, suspend: Online CPU0 for suspend or hibernate x86, hotplug: Support functions for CPU0 online/offline x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it x86, Kconfig: Add config switch for CPU0 hotplug doc: Add x86 CPU0 online/offline feature	2012-12-11 19:56:33 -08:00
Linus Torvalds	0019fab355	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "Two fixlets and a cleanup." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_32: Return actual stack when requesting sp from regs x86: Don't clobber top of pt_regs in nested NMI x86/asm: Clean up copy_page_*() comments and code	2012-12-11 19:55:20 -08:00
Linus Torvalds	090f8ccba3	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Lots of activity: 211 files changed, 8328 insertions(+), 4116 deletions(-) most of it on the tooling side. Main changes: * ftrace enhancements and fixes from Steve Rostedt. * uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov. * UAPI fixes, from David Howels - prepares the arch/x86 UAPI transition * Separate perf tests into multiple objects, one per test, from Jiri Olsa. * Make hardware event translations available in sysfs, from Jiri Olsa. * Fixes to /proc/pid/maps parsing, preparatory to supporting data maps, from Namhyung Kim * Implement ui_progress for GTK, from Namhyung Kim * Add framework for automated perf_event_attr tests, where tools with different command line options will be run from a 'perf test', via python glue, and the perf syscall will be intercepted to verify that the perf_event_attr fields set by the tool are those expected, from Jiri Olsa * Add a 'link' method for hists, so that we can have the leader with buckets for all the entries in all the hists. This new method is now used in the default 'diff' output, making the sum of the 'baseline' column be 100%, eliminating blind spots. * libtraceevent fixes for compiler warnings trying to make perf it build on some distros, like fedora 14, 32-bit, some of the warnings really pointed to real bugs. * Add a browser for 'perf script' and make it available from the report and annotate browsers. It does filtering to find the scripts that handle events found in the perf.data file used. From Feng Tang * perf inject changes to allow showing where a task sleeps, from Andrew Vagin. * Makefile improvements from Namhyung Kim. * Add --pre and --post command hooks in 'stat', from Peter Zijlstra. * Don't stop synthesizing threads when one vanishes, this is for the existing threads when we start a tool like trace. * Use sched:sched_stat_runtime to provide a thread summary, this produces the same output as the 'trace summary' subcommand of tglx's original "trace" tool. * Support interrupted syscalls in 'trace' * Add an event duration column and filter in 'trace'. * There are references to the man pages in some tools, so try to build Documentation when installing, warning the user if that is not possible, from Borislav Petkov. * Give user better message if precise is not supported, from David Ahern. * Try to find cross-built objdump path by using the session environment information in the perf.data file header, from Irina Tirdea, original patch and idea by Namhyung Kim. * Diplays more output on features check for make V=1, so that one can figure out what is happening by looking at gcc output, etc. From Jiri Olsa. * Add on_exit implementation for systems without one, e.g. Android, from Bernhard Rosenkraenzer. * Only process events for vcpus of interest, helps handling large number of events, from David Ahern. * Cross compilation fixes for Android, from Irina Tirdea. * Add documentation on compiling for Android, from Irina Tirdea. * perf diff improvements from Jiri Olsa. * Target (task/user/cpu/syswide) handling improvements, from Namhyung Kim. * Add support in 'trace' for tracing workload given by command line, from Namhyung Kim. * ... and much more." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits) uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race perf evsel: Introduce is_group_member method perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing perf ui: Always compile browser setup code perf ui: Add ui_progress__finish() perf ui gtk: Implement ui_progress functions perf ui: Introduce generic ui_progress helper perf ui tui: Move progress.c under ui/tui directory perf tools: Add basic event modifier sanity check perf tools: Omit group members from perf_evlist__disable/enable perf tools: Ensure single disable call per event in record comand perf tools: Fix 'disabled' attribute config for record command perf tools: Fix attributes for '{}' defined event groups perf tools: Use sscanf for parsing /proc/pid/maps perf tools: Add gtk.<command> config option for launching GTK browser perf tools: Fix compile error on NO_NEWT=1 build perf hists: Initialize all of he->stat with zeroes ...	2012-12-11 18:14:31 -08:00
Linus Torvalds	37ea95a959	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU update from Ingo Molnar: "The major features of this tree are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486." * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) context_tracking: New context tracking susbsystem sched: Mark RCU reader in sched_show_task() rcu: Separate accounting of callbacks from callback-free CPUs rcu: Add callback-free CPUs rcu: Add documentation for the new rcuexp debugfs trace file rcu: Update documentation for TREE_RCU debugfs tracing rcu: Reduce default RCU CPU stall warning timeout rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check rcu: Clarify memory-ordering properties of grace-period primitives rcu: Add new rcutorture module parameters to start/end test messages rcu: Remove list_for_each_continue_rcu() rcu: Fix batch-limit size problem rcu: Add tracing for synchronize_sched_expedited() rcu: Remove old debugfs interfaces and also RCU flavor name rcu: split 'rcuhier' to each flavor rcu: split 'rcugp' to each flavor rcu: split 'rcuboost' to each flavor rcu: split 'rcubarrier' to each flavor rcu: Fix tracing formatting rcu: Remove the interface "rcudata.csv" ...	2012-12-11 18:10:49 -08:00
Linus Torvalds	608ff1a210	Merge branch 'akpm' (Andrew's patchbomb) Merge misc updates from Andrew Morton: "About half of most of MM. Going very early this time due to uncertainty over the coreautounifiednumasched things. I'll send the other half of most of MM tomorrow. The rest of MM awaits a slab merge from Pekka." * emailed patches from Andrew Morton: (71 commits) memory_hotplug: ensure every online node has NORMAL memory memory_hotplug: handle empty zone when online_movable/online_kernel mm, memory-hotplug: dynamic configure movable memory and portion memory drivers/base/node.c: cleanup node_state_attr[] bootmem: fix wrong call parameter for free_bootmem() avr32, kconfig: remove HAVE_ARCH_BOOTMEM mm: cma: remove watermark hacks mm: cma: skip watermarks check for already isolated blocks in split_free_page() mm, oom: fix race when specifying a thread as the oom origin mm, oom: change type of oom_score_adj to short mm: cleanup register_node() mm, mempolicy: remove duplicate code mm/vmscan.c: try_to_freeze() returns boolean mm: introduce putback_movable_pages() virtio_balloon: introduce migration primitives to balloon pages mm: introduce compaction and migration for ballooned pages mm: introduce a common interface for balloon pages mobility mm: redefine address_space.assoc_mapping mm: adjust address_space_operations.migratepage() return code arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/ ...	2012-12-11 18:05:37 -08:00
Michel Lespinasse	f99024729e	mm: use vm_unmapped_area() on x86_64 architecture Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-11 17:22:25 -08:00
Andi Kleen	42d7395feb	mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB There was some desire in large applications using MAP_HUGETLB or SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on others. This is useful together with NUMA policy: use 2MB interleaving on some mappings, but 1GB on local mappings. This patch extends the IPC/SHM syscall interfaces slightly to allow specifying the page size. It borrows some upper bits in the existing flag arguments and allows encoding the log of the desired page size in addition to the *_HUGETLB flag. When 0 is specified the default size is used, this makes the change fully compatible. Extending the internal hugetlb code to handle this is straight forward. Instead of a single mount it just keeps an array of them and selects the right mount based on the specified page size. When no page size is specified it uses the mount of the default page size. The change is not visible in /proc/mounts because internal mounts don't appear there. It also has very little overhead: the additional mounts just consume a super block, but not more memory when not used. I also exported the new flags to the user headers (they were previously under __KERNEL__). Right now only symbols for x86 and some other architecture for 1GB and 2MB are defined. The interface should already work for all other architectures though. Only architectures that define multiple hugetlb sizes actually need it (that is currently x86, tile, powerpc). However tile and powerpc have user configurable hugetlb sizes, so it's not easy to add defines. A program on those architectures would need to query sysfs and use the appropiate log2. [akpm@linux-foundation.org: cleanups] [rientjes@google.com: fix build] [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-11 17:22:25 -08:00
Zhang Yanfei	0ca0d818cb	x86/kexec: crash_vmclear_local_vmcss needs __rcu This removes the sparse warning: arch/x86/kernel/crash.c:49:32: sparse: incompatible types in comparison expression (different address spaces) Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-11 19:55:23 -02:00
Linus Torvalds	bad73c5aa0	ACPI and power management updates for 3.8-rc1 * Introduction of device PM QoS flags. * ACPI device power management update allowing subsystems other than PCI to use it more easily. * ACPI device enumeration rework allowing additional kinds of devices to be enumerated via ACPI. From Mika Westerberg, Adrian Hunter, Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki. * ACPICA update to version 20121018 from Bob Moore and Lv Zheng. * ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu. * Introduction of acpi_handle_<level>() messaging macros and ACPI-based CPU hot-remove support from Toshi Kani. * ACPI EC updates from Feng Tang. * cpufreq updates from Viresh Kumar, Fabio Baltieri and others. * cpuidle changes to quickly notice governor prediction failure from Youquan Song. * Support for using multiple cpuidle drivers at the same time and cpuidle cleanups from Daniel Lezcano. * devfreq updates from Nishanth Menon and others. * cpupower update from Thomas Renninger. * Fixes and small cleanups all over the place. -- -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJQxxevAAoJEKhOf7ml8uNsHacQAK2xoQozDddPBAaTCf1OWW/G J4E2qMn+gy4AtFmQ0/xeZVvvylOCn9eu/kv3QJ/bFlVoUsqTgfPwQBjT6HjF1Acn 7yVFdr9H/tn2wi9nW2Gv6Tl2Hr/kFLpo0f5Kd40Z+P8SV5sKX5ktJcVpJ/I/P4Vh Qw2nWtj7hQktZDERzgG4ZpMbQToNhbLhbKWB9ad3/XQSSA9JkfgvBFgrTEGHcZD5 bwsggjNdOAWNGeDdzRsQSDn0Alld5ILVdSJ5xKimO1O70WvKc7fqA1IdYRIeKL90 yz4bcoYKzl9iktlw8+x5o1U9mrc8TFV5p4+zV+t5Z6pzS/J3kWvnsW4zu9sCrxFv Wic3SKyiem7s2dxrYyj4ZXAci3GK4ouRTrCLqk7/00tEGdwAQD1ZNfsUJp6jKayz FvtZUgItcOyrlQ6B4nh951OY6dI3AUYJ2NuWWNr5NZkgVAvQGV8zTGOImbeVeL2+ pMiw14zScO3ylYilVcjTKDDMj2sDZ68mw5PIcbmksvWsCLo26jDBVDtLVmtYWyd4 ek3WnOrQZr0R3agvOLLssMKXompvpP+N4Klf4rV+GtqGsWtHryYKys2Laju9FwFj yYLchxYlxhGTzqq8LjF90HDL0TWpPe6cPi+B5ow9g/SXLexbMKNQGhv3Jovm2yR3 j54tKBWy7e9AAYEDPirX =6OEP -----END PGP SIGNATURE----- Merge tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: - Introduction of device PM QoS flags. - ACPI device power management update allowing subsystems other than PCI to use it more easily. - ACPI device enumeration rework allowing additional kinds of devices to be enumerated via ACPI. From Mika Westerberg, Adrian Hunter, Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki. - ACPICA update to version 20121018 from Bob Moore and Lv Zheng. - ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu. - Introduction of acpi_handle_<level>() messaging macros and ACPI-based CPU hot-remove support from Toshi Kani. - ACPI EC updates from Feng Tang. - cpufreq updates from Viresh Kumar, Fabio Baltieri and others. - cpuidle changes to quickly notice governor prediction failure from Youquan Song. - Support for using multiple cpuidle drivers at the same time and cpuidle cleanups from Daniel Lezcano. - devfreq updates from Nishanth Menon and others. - cpupower update from Thomas Renninger. - Fixes and small cleanups all over the place. * tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (196 commits) mmc: sdhci-acpi: enable runtime-pm for device HID INT33C6 ACPI: add Haswell LPSS devices to acpi_platform_device_ids list ACPI: add documentation about ACPI 5 enumeration pnpacpi: fix incorrect TEST_ALPHA() test ACPI / PM: Fix header of acpi_dev_pm_detach() in acpi.h ACPI / video: ignore BIOS initial backlight value for HP Folio 13-2000 ACPI : do not use Lid and Sleep button for S5 wakeup ACPI / PNP: Do not crash due to stale pointer use during system resume ACPI / video: Add "Asus UL30VT" to ACPI video detect blacklist ACPI: do acpisleep dmi check when CONFIG_ACPI_SLEEP is set spi / ACPI: add ACPI enumeration support gpio / ACPI: add ACPI support PM / devfreq: remove compiler error with module governors (2) cpupower: IvyBridge (0x3a and 0x3e models) support cpupower: Provide -c param for cpupower monitor to schedule process on all cores cpupower tools: Fix warning and a bug with the cpu package count cpupower tools: Fix malloc of cpu_info structure cpupower tools: Fix issues with sysfs_topology_read_file cpupower tools: Fix minor warnings cpupower tools: Update .gitignore for files created in the debug directories ...	2012-12-11 12:45:35 -08:00
Andrea Arcangeli	be3a728427	mm: numa: pte_numa() and pmd_numa() Implement pte_numa and pmd_numa. We must atomically set the numa bit and clear the present bit to define a pte_numa or pmd_numa. Once a pte or pmd has been set as pte_numa or pmd_numa, the next time a thread touches a virtual address in the corresponding virtual range, a NUMA hinting page fault will trigger. The NUMA hinting page fault will clear the NUMA bit and set the present bit again to resolve the page fault. The expectation is that a NUMA hinting page fault is used as part of a placement policy that decides if a page should remain on the current node or migrated to a different node. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:36 +00:00
Andrea Arcangeli	dbe4d2035a	mm: numa: define _PAGE_NUMA The objective of _PAGE_NUMA is to be able to trigger NUMA hinting page faults to identify the per NUMA node working set of the thread at runtime. Arming the NUMA hinting page fault mechanism works similarly to setting up a mprotect(PROT_NONE) virtual range: the present bit is cleared at the same time that _PAGE_NUMA is set, so when the fault triggers we can identify it as a NUMA hinting page fault. _PAGE_NUMA on x86 shares the same bit number of _PAGE_PROTNONE (but it could also use a different bitflag, it's up to the architecture to decide). It would be confusing to call the "NUMA hinting page faults" as "do_prot_none faults". They're different events and _PAGE_NUMA doesn't alter the semantics of mprotect(PROT_NONE) in any way. Sharing the same bitflag with _PAGE_PROTNONE in fact complicates things: it requires us to ensure the code paths executed by _PAGE_PROTNONE remains mutually exclusive to the code paths executed by _PAGE_NUMA at all times, to avoid _PAGE_NUMA and _PAGE_PROTNONE to step into each other toes. Because we want to be able to set this bitflag in any established pte or pmd (while clearing the present bit at the same time) without losing information, this bitflag must never be set when the pte and pmd are present, so the bitflag picked for _PAGE_NUMA usage, must not be used by the swap entry format. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:28:35 +00:00
Rik van Riel	2c3cf556b2	x86/mm: Introduce pte_accessible() We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that the pte is associated with a page. However, for TLB flushing purposes, we would like to know whether the pte points to an actually accessible page. This allows us to skip remote TLB flushes for pages that are not actually accessible. Fill in this method for x86 and provide a safe (but slower) method on other architectures. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixed-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org [ Added Linus's review fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-11 14:28:34 +00:00
Ingo Molnar	cc1b39dbf9	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull ftrace updates from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-08 15:54:35 +01:00
Bjorn Helgaas	3ced69f8bb	Merge branch 'pci/daniel-numachip' into next * pci/daniel-numachip: x86/PCI: Add NumaChip remote PCI support	2012-12-07 14:27:33 -07:00
Daniel J Blueman	f9726bfd4b	x86/PCI: Add NumaChip remote PCI support Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but preventing access to AMD Northbridges which shouldn't respond. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2012-12-07 14:24:32 -07:00
Zhang Yanfei	f23d1f4a11	x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary This patch provides a way to VMCLEAR VMCSs related to guests on all cpus before executing the VMXOFF when doing kdump. This is used to ensure the VMCSs in the vmcore updated and non-corrupted. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-06 18:25:36 +02:00
Matthew Garrett	dd5fc854de	EFI: Stash ROMs if they're not in the PCI BAR EFI provides support for providing PCI ROMs via means other than the ROM BAR. This support vanishes after we've exited boot services, so add support for stashing copies of the ROMs in setup_data if they're not otherwise available. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Seth Forshee <seth.forshee@canonical.com>	2012-12-05 14:33:26 -07:00
Zhang Xiantao	2b3c5cbc0d	kvm: don't use bit24 for detecting address-specific invalidation capability Bit24 in VMX_EPT_VPID_CAP_MASI is not used for address-specific invalidation capability reporting, so remove it from KVM to avoid conflicts in future. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-05 16:35:48 +02:00
Ingo Molnar	630e1e0bcd	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Conflicts: arch/x86/kernel/ptrace.c Pull the latest RCU tree from Paul E. McKenney: " The major features of this series are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724, and are at branch rcu/nocb. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296, and are at branch rcu/srcu. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341, and are at branch rcu/tracing. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327, and are at branch rcu/hotplug. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739, and are at branch rcu/idle. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315, and are at branch rcu/stall. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280, and are at branch rcu/doc. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309, along with a late-breaking change posted at Fri, 16 Nov 2012 11:26:25 -0800 with message-ID <20121116192625.GA447@linux.vnet.ibm.com>, but which lkml.org seems to have missed. These are at branch rcu/fixes. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486. This is at rcu/next. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-03 06:27:05 +01:00
Linus Torvalds	455e987c0c	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This is mostly about unbreaking architectures that took the UAPI changes in the v3.7 cycle, plus misc fixes." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf kvm: Fix building perf kvm on non x86 arches perf kvm: Rename perf_kvm to perf_kvm_stat perf: Make perf build for x86 with UAPI disintegration applied perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing x86: Export asm/{svm.h,vmx.h,perf_regs.h} perf tools: Fix strbuf_addf() when the buffer needs to grow perf header: Fix numa topology printing perf, powerpc: Fix hw breakpoints returning -ENOSPC	2012-12-01 13:07:48 -08:00
Vincent Palatin	644c154186	x86, fpu: Avoid FPU lazy restore after suspend When a cpu enters S3 state, the FPU state is lost. After resuming for S3, if we try to lazy restore the FPU for a process running on the same CPU, this will result in a corrupted FPU context. Ensure that "fpu_owner_task" is properly invalided when (re-)initializing a CPU, so nobody will try to lazy restore a state which doesn't exist in the hardware. Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off, by doing thousands of suspend/resume cycles with 4 processes doing FPU operations running. Without the patch, a process is killed after a few hundreds cycles by a SIGFPE. Cc: Duncan Laurie <dlaurie@chromium.org> Cc: Olof Johansson <olofj@chromium.org> Cc: <stable@kernel.org> v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Link: http://lkml.kernel.org/r/1354306532-1014-1-git-send-email-vpalatin@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-30 13:48:05 -08:00
Will Auld	ba904635d4	KVM: x86: Emulate IA32_TSC_ADJUST MSR CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control. However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process performs a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-30 18:29:30 -02:00
Will Auld	8fe8ab46be	KVM: x86: Add code to track call origin for msr assignment In order to track who initiated the call (host or guest) to modify an msr value I have changed function call parameters along the call path. The specific change is to add a struct pointer parameter that points to (index, data, caller) information rather than having this information passed as individual parameters. The initial use for this capability is for updating the IA32_TSC_ADJUST msr while setting the tsc value. It is anticipated that this capability is useful for other tasks. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-30 18:26:12 -02:00
Frederic Weisbecker	91d1aa43d3	context_tracking: New context tracking susbsystem Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: fix whitespace error and email address. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-11-30 11:40:07 -08:00
H. Peter Anvin	45c39fb0cc	x86, cleanups: Simplify sync_core() in the case of no CPUID Simplify the implementation of sync_core() for the case where we may not have the CPUID instruction available. [ v2: stylistic cleanup of the #else clause per suggestion by Borislav Petkov. ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-9-git-send-email-hpa@linux.intel.com Cc: Borislav Petkov <bp@alien8.de>	2012-11-29 13:25:39 -08:00
H. Peter Anvin	a5c2a893db	x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK All 486+ CPUs support WP in supervisor mode, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-7-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:03 -08:00
H. Peter Anvin	094ab1db7c	x86, 386 removal: Remove CONFIG_INVLPG All 486+ CPUs support INVLPG, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-6-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:02 -08:00
H. Peter Anvin	e5bb8ad862	x86, 386 removal: Remove CONFIG_BSWAP All 486+ CPUs support BSWAP, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-5-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:02 -08:00
H. Peter Anvin	7ac468b130	x86, 386 removal: Remove CONFIG_XADD All 486+ CPUs support XADD, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-4-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:02 -08:00
H. Peter Anvin	d55c5a93db	x86, 386 removal: Remove CONFIG_CMPXCHG All 486+ CPUs support CMPXCHG, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-3-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:01 -08:00
H. Peter Anvin	eb068e7810	x86, 386 removal: Remove CONFIG_M386 from Kconfig Remove the CONFIG_M386 symbol from Kconfig so that it cannot be selected. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-2-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:01 -08:00
Rafael J. Wysocki	bcacbdbdc8	Merge branch 'acpi-enumeration' * acpi-enumeration: ACPI: remove unnecessary INIT_LIST_HEAD ACPI / platform: include missed header into acpi_platform.c platform / ACPI: Attach/detach ACPI PM during probe/remove/shutdown mmc: sdhci-acpi: add SDHCI ACPI driver ACPI: add SDHCI to ACPI platform devices ACPI / PNP: skip ACPI device nodes associated with physical nodes already i2c / ACPI: add ACPI enumeration support ACPI / platform: Initialize ACPI handles of platform devices in advance ACPI / driver core: Introduce struct acpi_dev_node and related macros ACPI: Allow ACPI handles of devices to be initialized in advance ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks ACPI: Centralized processing of ACPI device resources ACPI / platform: Use common ACPI device resource parsing routines ACPI: Move device resources interpretation code from PNP to ACPI core ACPI / platform: use ACPI device name instead of _HID._UID ACPI: Add support for platform bus type ACPI / ia64: Export acpi_[un]register_gsi() ACPI / x86: Export acpi_[un]register_gsi() ACPI: Provide generic functions for matching ACPI device nodes driver core / ACPI: Move ACPI support to core device and driver types	2012-11-29 21:41:25 +01:00
Ian Campbell	f832da068b	xen: arm: implement remap interfaces needed for privcmd mappings. We use XENMEM_add_to_physmap_range which is the preferred interface for foreign mappings. Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-29 14:00:19 +00:00
Al Viro	4f4202fe5a	unify default ptrace_signal_deliver Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-29 00:01:23 -05:00
Al Viro	18c26c27ae	death to idle_regs() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 23:43:42 -05:00
Al Viro	24465a40ba	take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h now it can be done... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 23:43:27 -05:00
Al Viro	1d4b4b2994	x86, um: switch to generic fork/vfork/clone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 22:13:44 -05:00
Al Viro	6b94631f9e	consolidate sys_execve() prototype Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 21:53:35 -05:00
Marcelo Tosatti	b48aa97e38	KVM: x86: require matched TSC offsets for master clock With master clock, a pvclock clock read calculates: ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ] Where 'rdtsc' is the host TSC. system_timestamp and tsc_timestamp are unique, one tuple per VM: the "master clock". Given a host with synchronized TSCs, its obvious that guest TSC must be matched for the above to guarantee monotonicity. Allow master clock usage only if guest TSCs are synchronized. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:15 -02:00
Marcelo Tosatti	d828199e84	KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag KVM added a global variable to guarantee monotonicity in the guest. One of the reasons for that is that the time between 1. ktime_get_ts(&timespec); 2. rdtscll(tsc); Is variable. That is, given a host with stable TSC, suppose that two VCPUs read the same time via ktime_get_ts() above. The time required to execute 2. is not the same on those two instances executing in different VCPUS (cache misses, interrupts...). If the TSC value that is used by the host to interpolate when calculating the monotonic time is the same value used to calculate the tsc_timestamp value stored in the pvclock data structure, and a single <system_timestamp, tsc_timestamp> tuple is visible to all vcpus simultaneously, this problem disappears. See comment on top of pvclock_update_vm_gtod_copy for details. Monotonicity is then guaranteed by synchronicity of the host TSCs and guest TSCs. Set TSC stable pvclock flag in that case, allowing the guest to read clock from userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:13 -02:00
Marcelo Tosatti	886b470cb1	KVM: x86: pass host_tsc to read_l1_tsc Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc(). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:11 -02:00
Marcelo Tosatti	51c19b4f59	x86: vdso: pvclock gettime support Improve performance of time system calls when using Linux pvclock, by reading time info from fixmap visible copy of pvclock data. Originally from Jeremy Fitzhardinge. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:11 -02:00
Marcelo Tosatti	3dc4f7cfb7	x86: kvm guest: pvclock vsyscall support Hook into generic pvclock vsyscall code, with the aim to allow userspace to have visibility into pvclock data. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:10 -02:00
Marcelo Tosatti	71056ae22d	x86: pvclock: generic pvclock vsyscall initialization Originally from Jeremy Fitzhardinge. Introduce generic, non hypervisor specific, pvclock initialization routines. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:09 -02:00
Marcelo Tosatti	189e11731a	x86: pvclock: add note about rdtsc barriers As noted by Gleb, not advertising SSE2 support implies no RDTSC barriers. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:08 -02:00
Marcelo Tosatti	2697902be8	x86: pvclock: introduce helper to read flags Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:08 -02:00
Marcelo Tosatti	dce2db0a35	x86: pvclock: create helper for pvclock data retrieval Originally from Jeremy Fitzhardinge. So code can be reused. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:07 -02:00
Linus Torvalds	2654ad44b5	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 arch fixes from Peter Anvin: "Here is a collection of fixes for 3.7-rc7. This is a superset of tglx' earlier pull request." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Fix ordering of CFI directives and recent ASM_CLAC additions x86, microcode, AMD: Add support for family 16h processors x86-32: Export kernel_stack_pointer() for modules x86-32: Fix invalid stack address while in softirq x86, efi: Fix processor-specific memcpy() build error x86: remove dummy long from EFI stub x86, mm: Correct vmflag test for checking VM_HUGETLB x86, amd: Disable way access filter on Piledriver CPUs x86/mce: Do not change worker's running cpu in cmci_rediscover(). x86/ce4100: Fix PCI configuration register access for devices without interrupts x86/ce4100: Fix reboot by forcing the reboot method to be KBD x86/ce4100: Fix pm_poweroff MAINTAINERS: Update email address for Robert Richter x86, microcode_amd: Change email addresses, MAINTAINERS entry MAINTAINERS: Change Boris' email address EDAC: Change Boris' email address x86, AMD: Change Boris' email address	2012-11-23 20:03:14 -10:00
Len Brown	3fc808aaa0	x86 power: define RAPL MSRs The Run Time Average Power Limiting interface is currently model specific, present on Sandy Bridge and Ivy Bridge processors. These #defines correspond to documentation in the latest "Intel® 64 and IA-32 Architectures Software Developer Manual", plus some typos in that document corrected. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2012-11-23 21:40:11 -05:00
Len Brown	9c63a650bb	tools/power/x86/turbostat: share kernel MSR #defines Now that turbostat is built in the kernel tree, it can share MSR #defines with the kernel. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2012-11-23 21:40:04 -05:00
Robert Richter	1022623842	x86-32: Fix invalid stack address while in softirq In 32 bit the stack address provided by kernel_stack_pointer() may point to an invalid range causing NULL pointer access or page faults while in NMI (see trace below). This happens if called in softirq context and if the stack is empty. The address at &regs->sp is then out of range. Fixing this by checking if regs and &regs->sp are in the same stack context. Otherwise return the previous stack pointer stored in struct thread_info. If that address is invalid too, return address of regs. BUG: unable to handle kernel NULL pointer dereference at 0000000a IP: [<c1004237>] print_context_stack+0x6e/0x8d pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 Hewlett-Packard HP xw9400 Workstation/0A1Ch EIP: 0060:[<c1004237>] EFLAGS: 00010093 CPU: 0 EIP is at print_context_stack+0x6e/0x8d EAX: ffffe000 EBX: 0000000a ECX: f4435f94 EDX: 0000000a ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: 0000000a CR3: 34ac9000 CR4: 000007d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000) Stack: 000003e8 ffffe000 00001ffc f4e39b00 00000000 0000000a f4435f94 c155198c f5409ef0 c1003723 c155198c f5409f04 00000000 f5409edc 00000000 00000000 f5409ee8 f4435f94 f5409fc4 00000001 f5409f1c c12dce1c 00000000 c155198c Call Trace: [<c1003723>] dump_trace+0x7b/0xa1 [<c12dce1c>] x86_backtrace+0x40/0x88 [<c12db712>] ? oprofile_add_sample+0x56/0x84 [<c12db731>] oprofile_add_sample+0x75/0x84 [<c12ddb5b>] op_amd_check_ctrs+0x46/0x260 [<c12dd40d>] profile_exceptions_notify+0x23/0x4c [<c1395034>] nmi_handle+0x31/0x4a [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45 [<c13950ed>] do_nmi+0xa0/0x2ff [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45 [<c13949e5>] nmi_stack_correct+0x28/0x2d [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45 [<c1003603>] ? do_softirq+0x4b/0x7f <IRQ> [<c102a06f>] irq_exit+0x35/0x5b [<c1018f56>] smp_apic_timer_interrupt+0x6c/0x7a [<c1394746>] apic_timer_interrupt+0x2a/0x30 Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc EIP: [<c1004237>] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0 CR2: 000000000000000a ---[ end trace 62afee3481b00012 ]--- Kernel panic - not syncing: Fatal exception in interrupt V2: add comments to kernel_stack_pointer() * always return a valid stack address by falling back to the address of regs Reported-by: Yang Wei <wei.yang@windriver.com> Cc: <stable@vger.kernel.org> Signed-off-by: Robert Richter <robert.richter@amd.com> Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jun Zhang <jun.zhang@intel.com>	2012-11-20 22:23:20 -08:00
David Howells	60606d4248	Merge branch 'x86-pre-uapi' into perf-uapi David Howells (1): x86: Export asm/{svm.h,vmx.h,perf_regs.h}	2012-11-19 21:50:58 +00:00
Steven Rostedt	6c8d8b3c69	x86_32: Return actual stack when requesting sp from regs As x86_32 traps do not save sp when taken in kernel mode, we need to accommodate the sp when requesting to get the register. This affects kprobes. Before: # echo 'p:ftrace sys_read+4 s=%sp' > /debug/tracing/kprobe_events # echo 1 > /debug/tracing/events/kprobes/enable # cat trace sshd-1345 [000] d... 489.117168: ftrace: (sys_read+0x4/0x70) s=b7e96768 sshd-1345 [000] d... 489.117191: ftrace: (sys_read+0x4/0x70) s=b7e96768 cat-1447 [000] d... 489.117392: ftrace: (sys_read+0x4/0x70) s=5a7 cat-1447 [001] d... 489.118023: ftrace: (sys_read+0x4/0x70) s=b77ad05f less-1448 [000] d... 489.118079: ftrace: (sys_read+0x4/0x70) s=b7762e06 less-1448 [000] d... 489.118117: ftrace: (sys_read+0x4/0x70) s=b7764970 After: sshd-1352 [000] d... 362.348016: ftrace: (sys_read+0x4/0x70) s=f3febfa8 sshd-1352 [000] d... 362.348048: ftrace: (sys_read+0x4/0x70) s=f3febfa8 bash-1355 [001] d... 362.348081: ftrace: (sys_read+0x4/0x70) s=f5075fa8 sshd-1352 [000] d... 362.348082: ftrace: (sys_read+0x4/0x70) s=f3febfa8 sshd-1352 [000] d... 362.690950: ftrace: (sys_read+0x4/0x70) s=f3febfa8 bash-1355 [001] d... 362.691033: ftrace: (sys_read+0x4/0x70) s=f5075fa8 Link: http://lkml.kernel.org/r/1342208654.30075.22.camel@gandalf.stny.rr.com Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-11-19 05:31:37 -05:00
Yinghai Lu	c074eaac2a	x86, mm: kill numa_64.h Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-44-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:47 -08:00
Yinghai Lu	94b43c3d86	x86, mm: kill numa_free_all_bootmem() Now NO_BOOTMEM version free_all_bootmem_node() does not really do free_bootmem at all, and it only call register_page_bootmem_info_node instead. That is confusing, try to kill that free_all_bootmem_node(). Before that, this patch will remove numa_free_all_bootmem(). That function could be replaced with register_page_bootmem_info() and free_all_bootmem(); Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-43-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:47 -08:00
Yinghai Lu	c8dcdb9ce4	x86, mm: Move function declaration into mm_internal.h They are only for mm/init*.c. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-34-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:38 -08:00
Yinghai Lu	cf47065961	x86, mm: Move back pgt_buf_* to mm/init.c Also change them to static. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-31-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:36 -08:00
Yinghai Lu	6f80b68e9e	x86, mm, Xen: Remove mapping_pagetable_reserve() Page table area are pre-mapped now after x86, mm: setup page table in top-down x86, mm: Remove early_memremap workaround for page table accessing on 64bit mapping_pagetable_reserve is not used anymore, so remove it. Also remove operation in mask_rw_pte(), as modified allow_low_page always return pages that are already mapped, moreover xen_alloc_pte_init, xen_alloc_pmd_init, etc, will mark the page RO before hooking it into the pagetable automatically. -v2: add changelog about mask_rw_pte() from Stefano. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-27-git-send-email-yinghai@kernel.org Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:26 -08:00
Yinghai Lu	9985b4c6fa	x86, mm: Move min_pfn_mapped back to mm/init.c Also change it to static. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:24 -08:00
Yinghai Lu	8d57470d8f	x86, mm: setup page table in top-down Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first. Then use mapped pages to map more ranges below, and keep looping until all pages get mapped. alloc_low_page will use page from BRK at first, after that buffer is used up, will use memblock to find and reserve pages for page table usage. Introduce min_pfn_mapped to make sure find new pages from mapped ranges, that will be updated when lower pages get mapped. Also add step_size to make sure that don't try to map too big range with limited mapped pages initially, and increase the step_size when we have more mapped pages on hand. We don't need to call pagetable_reserve anymore, reserve work is done in alloc_low_page() directly. At last we can get rid of calculation and find early pgt related code. -v2: update to after fix_xen change, also use MACRO for initial pgt_buf size and add comments with it. -v3: skip big reserved range in memblock.reserved near end. -v4: don't need fix_xen change now. -v5: add changelog about moving about reserving pagetable to alloc_low_page. Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-22-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:19 -08:00
Jacob Shin	66520ebc2d	x86, mm: Only direct map addresses that are marked as E820_RAM Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT ) and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not backed by actual DRAM. This is fine for holes under 4GB which are covered by fixed and variable range MTRRs to be UC. However, we run into trouble on higher memory addresses which cannot be covered by MTRRs. Our system with 1TB of RAM has an e820 that looks like this: BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable and so direct mappings are created for huge memory hole between 0x000000e038000000 to 0x0000010000000000. Even though the kernel never generates memory accesses in that region, since the page tables mark them incorrectly as being WB, our (AMD) processor ends up causing a MCE while doing some memory bookkeeping/optimizations around that area. This patch iterates through e820 and only direct maps ranges that are marked as E820_RAM, and keeps track of those pfn ranges. Depending on the alignment of E820 ranges, this may possibly result in using smaller size (i.e. 4K instead of 2M or 1G) page tables. -v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range instead. - Yinghai Lu -v3: add calculate_all_table_space_size() to get correct needed page table size. - Yinghai Lu -v4: fix add_pfn_range_mapped() to get correct max_low_pfn_mapped when mem map does have hole under 4g that is found by Konard on xen domU with 8g ram. - Yinghai Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.org Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:14 -08:00
Jacob Shin	dda56e1340	x86, mm: Fixup code testing if a pfn is direct mapped Update code that previously assumed pfns [ 0 - max_low_pfn_mapped ) and [ 4GB - max_pfn_mapped ) were always direct mapped, to now look up pfn_mapped ranges instead. -v2: change applying sequence to keep git bisecting working. so add dummy pfn_range_is_mapped(). - Yinghai Lu Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1353123563-3103-12-git-send-email-yinghai@kernel.org Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:09 -08:00
Yinghai Lu	22ddfcaa0d	x86, mm: Move init_memory_mapping calling out of setup.c Now init_memory_mapping is called two times, later will be called for every ram ranges. Could put all related init_mem calling together and out of setup.c. Actually, it reverts commit `1bbbbe7` x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping. will address that later with complete solution include handling hole under 4g. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-5-git-send-email-yinghai@kernel.org Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:03 -08:00
Yinghai Lu	fa62aafea9	x86, mm: Add global page_size_mask and probe one time only Now we pass around use_gbpages and use_pse for calculating page table size, Later we will need to call init_memory_mapping for every ram range one by one, that mean those calculation will be done several times. Those information are the same for all ram range and could be stored in page_size_mask and could be probed it one time only. Move that probing code out of init_memory_mapping into separated function probe_page_size_mask(), and call it before all init_memory_mapping. Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-2-git-send-email-yinghai@kernel.org Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:00 -08:00
Alexander Duyck	7d74275d39	x86: Make it so that __pa_symbol can only process kernel symbols on x86_64 I submitted an earlier patch that make __phys_addr an inline. This obviously results in an increase in the code size. One step I can take to reduce that is to make it so that the __pa_symbol call does a direct translation for kernel addresses instead of covering all of virtual memory. On my system this reduced the size for __pa_symbol from 5 instructions totalling 30 bytes to 3 instructions totalling 16 bytes. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Link: http://lkml.kernel.org/r/20121116215356.8521.92472.stgit@ahduyck-cp1.jf.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-16 16:42:09 -08:00
Alexander Duyck	0bdf525f04	x86: Improve __phys_addr performance by making use of carry flags and inlining This patch is meant to improve overall system performance when making use of the __phys_addr call. To do this I have implemented several changes. First if CONFIG_DEBUG_VIRTUAL is not defined __phys_addr is made an inline, similar to how this is currently handled in 32 bit. However in order to do this it is required to export phys_base so that it is available if __phys_addr is used in kernel modules. The second change was to streamline the code by making use of the carry flag on an add operation instead of performing a compare on a 64 bit value. The advantage to this is that it allows us to significantly reduce the overall size of the call. On my Xeon E5 system the entire __phys_addr inline call consumes a little less than 32 bytes and 5 instructions. I also applied similar logic to the debug version of the function. My testing shows that the debug version of the function with this patch applied is slightly faster than the non-debug version without the patch. Finally I also applied the same logic changes to __virt_addr_valid since it used the same general code flow as __phys_addr and could achieve similar gains though these changes. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Link: http://lkml.kernel.org/r/20121116215315.8521.46270.stgit@ahduyck-cp1.jf.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-16 16:42:08 -08:00
Alexander Duyck	fb50b020c5	x86: Move some contents of page_64_types.h into pgtable_64.h and page_64.h This patch is meant to clean-up the fact that we have several functions in page_64_types.h which really don't belong there. I found this issue when I had tried to replace __phys_addr with an inline function. It resulted in the realmode bits generating compile warnings about types. In order to resolve that I am relocating the address translation to page_64.h since this is in keeping with where these functions are located in 32 bit. In addtion I have relocated several functions defined in init_64.c to pgtable_64.h as this seems to be where most of the functions related to memory initialization were already located. [ hpa: added missing #include <asm/pgtable.h> to apic_numachip.c, as reported by Yinghai Lu. ] Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Link: http://lkml.kernel.org/r/20121116215244.8521.31505.stgit@ahduyck-cp1.jf.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Daniel J Blueman <daniel@numascale-asia.com>	2012-11-16 16:40:34 -08:00
Fenghua Yu	a71c8bc5df	x86, topology: Debug CPU0 hotplug CONFIG_DEBUG_HOTPLUG_CPU0 is for debugging the CPU0 hotplug feature. The switch offlines CPU0 as soon as possible and boots userspace up with CPU0 offlined. User can online CPU0 back after boot time. The default value of the switch is off. To debug CPU0 hotplug, you need to enable CPU0 offline/online feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during compilation or giving cpu0_hotplug kernel parameter at boot. It's safe and early place to take down CPU0 after all hotplug notifiers are installed and SMP is booted. Please note that some applications or drivers, e.g. some versions of udevd, during boot time may put CPU0 online again in this CPU0 hotplug debug mode. In this debug mode, setup_local_APIC() may report a warning on max_loops<=0 when CPU0 is onlined back after boot time. This is because pending interrupt in IRR can not move to ISR. The warning is not CPU0 specfic and it can happen on other CPUs as well. It is harmless except the first CPU0 online takes a bit longer time. And so this debug mode is useful to expose this issue. I'll send a seperate patch to fix this generic warning issue. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1352835171-3958-15-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-14 15:28:11 -08:00
Fenghua Yu	e1c467e690	x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI Instead of waiting for STARTUP after INITs, BSP will execute the BIOS boot-strap code which is not a desired behavior for waking up BSP. To avoid the boot-strap code, wake up CPU0 by NMI instead. This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined (i.e. physically hot removed and then hot added), NMI won't wake it up. We'll change this code in the future to wake up hard offlined CPU0 if real platform and request are available. AP is still waken up as before by INIT, SIPI, SIPI sequence. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1352896613-25957-1-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-14 15:28:03 -08:00
Mika Westerberg	06f64c8f23	driver core / ACPI: Move ACPI support to core device and driver types With ACPI 5 we are starting to see devices that don't natively support discovery but can be enumerated with the help of the ACPI namespace. Typically, these devices can be represented in the Linux device driver model as platform devices or some serial bus devices, like SPI or I2C devices. Since we want to re-use existing drivers for those devices, we need a way for drivers to specify the ACPI IDs of supported devices, so that they can be matched against device nodes in the ACPI namespace. To this end, it is sufficient to add a pointer to an array of supported ACPI device IDs, that can be provided by the driver, to struct device. Moreover, things like ACPI power management need to have access to the ACPI handle of each supported device, because that handle is used to invoke AML methods associated with the corresponding ACPI device node. The ACPI handles of devices are now stored in the archdata member structure of struct device whose definition depends on the architecture and includes the ACPI handle only on x86 and ia64. Since the pointer to an array of supported ACPI IDs is added to struct device_driver in an architecture-independent way, it is logical to move the ACPI handle from archdata to struct device itself at the same time. This also makes code more straightforward in some places and follows the example of Device Trees that have a poiter to struct device_node in there too. This changeset is based on Mika Westerberg's work. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2012-11-15 00:28:00 +01:00
Fenghua Yu	30106c1743	x86, hotplug: Support functions for CPU0 online/offline Add smp_store_boot_cpu_info() to store cpu info for BSP during boot time. Now smp_store_cpu_info() stores cpu info for bringing up BSP or AP after it's offline. Continue to online CPU0 in native_cpu_up(). Continue to offline CPU0 in native_cpu_disable(). Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1352835171-3958-5-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-14 09:39:48 -08:00
David Sharp	8be0709f10	tracing: Format non-nanosec times from tsc clock without a decimal point. With the addition of the "tsc" clock, formatting timestamps to look like fractional seconds is misleading. Mark clocks as either in nanoseconds or not, and format non-nanosecond timestamps as decimal integers. Tested: $ cd /sys/kernel/debug/tracing/ $ cat trace_clock [local] global tsc $ echo sched_switch > set_event $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on $ cat trace <idle>-0 [000] 6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120 sleep-29964 [000] 6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... $ echo 1 > options/latency-format $ cat trace <idle>-0 0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120 sleep-29964 0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... $ echo tsc > trace_clock $ cat trace $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on $ echo 0 > options/latency-format $ cat trace <idle>-0 [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120 sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... echo 1 > options/latency-format $ cat trace <idle>-0 0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120 sleep-31128 0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... v2: Move arch-specific bits out of generic code. v4: Fix x86_32 build due to 64-bit division. Google-Bug-Id: 6980623 Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-11-13 15:48:40 -05:00
David Sharp	8cbd9cc625	tracing,x86: Add a TSC trace_clock In order to promote interoperability between userspace tracers and ftrace, add a trace_clock that reports raw TSC values which will then be recorded in the ring buffer. Userspace tracers that also record TSCs are then on exactly the same time base as the kernel and events can be unambiguously interlaced. Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large timestamp values. v2: Move arch-specific bits out of generic code. v3: Rename "x86-tsc", cleanups v7: Generic arch bits in Kbuild. Google-Bug-Id: 6980623 Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-11-13 15:48:27 -05:00
Andreas Herrmann	04a1541828	x86, cacheinfo: Determine number of cache leafs using CPUID 0x8000001d on AMD CPUID 0x8000001d works quite similar to Intels' CPUID function 4. Use it to determine number of cache leafs. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/20121019085933.GE26718@alberich Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-13 11:22:29 -08:00
Andreas Herrmann	193f3fcb3a	x86: Add cpu_has_topoext Introduce cpu_has_topoext to check for AMD's CPUID topology extensions support. It indicates support for CPUID Fn8000_001D_EAX_x[N:0]-CPUID Fn8000_001E_EDX See AMD's CPUID Specification, Publication # 25481 (as of Rev. 2.34 September 2010) Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/20121019085813.GD26718@alberich Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-13 11:22:28 -08:00
Linus Torvalds	0020dd0b8c	Bug-fixes: * Fix compile issues on ARM. * Fix hypercall fallback code for old hypervisors. * Print out which HVM parameter failed if it fails. * Fix idle notifier call after irq_enter. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQnQdGAAoJEFjIrFwIi8fJPBAIAMX1HRx3udqhv7fziynZvFTb hj47XYIJHOK7P4fK7vZoSNgMHjL6LW5cUqC8VN67G3zUSkX9JYFsPBj6v4bWn+rG b9CS+MW7hS80LGbbqkh1F+YSEfZ863RlF9PPX2acaHTw49MlIgIqwhxIo6hy+Nm6 thu6SlbEIJkSUdhbYMOAmy5aH/3+UuuQg+oq3P7mzV8fZjEihnrrF0NlT4wOZK1o gsfrKYKJLVT526W9PF/L23/A/MCHMpvjNStpaDLOGNjV9sBMpJI8JRax6+657+q1 0kXvN5mAwTKWOaXBl4LEC9R8n1IKB91TgOY6HJAcXkb1eoP5KAeNSmU8RbsZ2T0= =XZ+0 -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fixes from Konrad Rzeszutek Wilk: "There are three ARM compile fixes (we forgot to export certain functions and if the drivers are built as an module - we go belly-up). There is also an mismatch of irq_enter() / exit_idle() calls sequence which were fixed some time ago in other piece of codes, but failed to appear in the Xen code. Lastly a fix for to help in the field with troubleshooting in case we cannot get the appropriate parameter and also fallback code when working with very old hypervisors." Bug-fixes: - Fix compile issues on ARM. - Fix hypercall fallback code for old hypervisors. - Print out which HVM parameter failed if it fails. - Fix idle notifier call after irq_enter. * tag 'stable/for-linus-3.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/arm: Fix compile errors when drivers are compiled as modules (export more). xen/arm: Fix compile errors when drivers are compiled as modules. xen/generic: Disable fallback build on ARM. xen/events: fix RCU warning, or Call idle notifier after irq_enter() xen/hvm: If we fail to fetch an HVM parameter print out which flag it is. xen/hypercall: fix hypercall fallback code for very old hypervisors	2012-11-10 06:56:21 +01:00
Jussi Kivilinna	cf582cceda	crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file Prepare camellia-x86_64 functions to be reused from AVX/AESNI implementation module. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-11-09 17:32:31 +08:00
David Howells	6d369a09cc	x86: Export asm/{svm.h,vmx.h,perf_regs.h} Export asm/{svm.h,vmx.h,perf_regs.h} so that they can be disintegrated. It looks from previous commits that the first two should have been exported, but the header-y lines weren't added to the Kbuild. I'm guessing that asm/perf_regs.h should be exported too. Signed-off-by: David Howells <dhowells@redhat.com>	2012-11-08 11:38:44 +00:00
Jan Beulich	cf47a83fb0	xen/hypercall: fix hypercall fallback code for very old hypervisors While copying the argument structures in HYPERVISOR_event_channel_op() and HYPERVISOR_physdev_op() into the local variable is sufficiently safe even if the actual structure is smaller than the container one, copying back eventual output values the same way isn't: This may collide with on-stack variables (particularly "rc") which may change between the first and second memcpy() (i.e. the second memcpy() could discard that change). Move the fallback code into out-of-line functions, and handle all of the operations known by this old a hypervisor individually: Some don't require copying back anything at all, and for the rest use the individual argument structures' sizes rather than the container's. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> [v2: Reduce #define/#undef usage in HYPERVISOR_physdev_op_compat().] [v3: Fix compile errors when modules use said hypercalls] [v4: Add xen_ prefix to the HYPERCALL_..] [v5: Alter the name and only EXPORT_SYMBOL_GPL one of them] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-04 10:40:42 -05:00
Linus Torvalds	66b6a0c979	Bug-fixes: * Use appropriate macros instead of hand-rolling our own (ARM). * Fixes if FB/KBD closed unexpectedly. * Fix memory leak in /dev/gntdev ioctl calls. * Fix overflow check in xenbus_file_write. * Document cleanup. * Performance optimization when migrating guests. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQk9ngAAoJEFjIrFwIi8fJXOcH/jEmTaV2rbUCCnnivlQGj5B2 AAXt03MM2F7Ohifo8IEHDhvJlUqQnglQq4wcku/8X/bqSkxtqJMfa/UAStmS2e6r 605msiMws/GKiDPgKywWHjMPk7JJow/T7du9mpT2Swla12+DXc7e0P6Sqm6qGtB5 tCBFYe3CS+j8Xi/siPhveAoLoDVmC8RpNzV8EWBdUKhNeD6U4s5M3+ChVexOrB/6 43YkzurkY/FOsP+8YhNnKFSFrpYleRB1GdFcr8PN5mv85sNKts7vHCb4qJFzZdbk BMImdLrTUnKArE4y4FS0iqabOTGXaUplEXfyxDw5hweESGa1qzrd29ocyMQ5p/U= =LQxc -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.7-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bugfixes from Konrad Rzeszutek Wilk: - Use appropriate macros instead of hand-rolling our own (ARM). - Fixes if FB/KBD closed unexpectedly. - Fix memory leak in /dev/gntdev ioctl calls. - Fix overflow check in xenbus_file_write. - Document cleanup. - Performance optimization when migrating guests. * tag 'stable/for-linus-3.7-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/mmu: Use Xen specific TLB flush instead of the generic one. xen/arm: use the __HVC macro xen/xenbus: fix overflow check in xenbus_file_write() xen-kbdfront: handle backend CLOSED without CLOSING xen-fbfront: handle backend CLOSED without CLOSING xen/gntdev: don't leak memory from IOCTL_GNTDEV_MAP_GRANT_REF x86: remove obsolete comment from asm/xen/hypervisor.h	2012-11-02 13:26:11 -07:00
Suresh Siddha	279f146143	x86: apic: Use tsc deadline for oneshot when available If the TSC deadline mode is supported, LAPIC timer one-shot mode can be implemented using IA32_TSC_DEADLINE MSR. An interrupt will be generated when the TSC value equals or exceeds the value in the IA32_TSC_DEADLINE MSR. This enables us to skip the APIC calibration during boot. Also, in xapic mode, this enables us to skip the uncached apic access to re-arm the APIC timer. As this timer ticks at the high frequency TSC rate, we use the TSC_DIVISOR (32) to work with the 32-bit restrictions in the clockevent API's to avoid 64-bit divides etc (frequency is u32 and "unsigned long" in the set_next_event(), max_delta limits the next event to 32-bit for 32-bit kernel). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: venki@google.com Cc: len.brown@intel.com Link: http://lkml.kernel.org/r/1350941878.6017.31.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2012-11-02 11:23:37 +01:00
Olaf Hering	b6514633bd	x86: remove obsolete comment from asm/xen/hypervisor.h Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-10-30 09:27:32 -04:00
Matt Fleming	185034e72d	x86, efi: 1:1 pagetable mapping for virtual EFI calls Some firmware still needs a 1:1 (virt->phys) mapping even after we've called SetVirtualAddressMap(). So install the mapping alongside our existing kernel mapping whenever we make EFI calls in virtual mode. This bug was discovered on ASUS machines where the firmware implementation of GetTime() accesses the RTC device via physical addresses, even though that's bogus per the UEFI spec since we've informed the firmware via SetVirtualAddressMap() that the boottime memory map is no longer valid. This bug seems to be present in a lot of consumer devices, so there's not a lot we can do about this spec violation apart from workaround it. Cc: JérômeCarretero <cJ-ko@zougloub.eu> Cc: Vasco Dias <rafa.vasco@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2012-10-30 10:39:19 +00:00
Ingo Molnar	003db633d6	Rework all config variables used throughout the MCA code and collect them together into a mca_config struct. This keeps them tightly and neatly packed together instead of spilled all over the place. Then, convert those which are used as booleans into real booleans and save some space. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQioVNAAoJEBLB8Bhh3lVKH/MP/2tGCqEdB7uxIgAKy/2wkG9G G3ad7ggSCPA7YygWsgKFicHNp511LqjQ45Bs9WGjMn2Fnkq6TCAqNO9CDB1cfX6W N0uYMQKkE6Ccx8wU29ilDHh5hU0vOYDd3EOk7e4Ox64C1i4PjO1QnS2TvphJTkto riy3yWvshLYUZdciWHJcT18KuGK3JIetyVo6hanWb3uIXudQLsfE5O077N+rWUOg 7kw23iwlWN2AudOZD5JKUAsKztOfXZ1KS2WiVgTfw+g5o4lBtbYmQ5II4LJfVCgU y/f5Ux80Q71XRsV8WSfakMfVKRN0pmuEJ2zKHUrN9yySVLV/neJZoWIP0z/uyr3f GvOErf4PDRpNu43oobm32+BIMb2IUlhc5sIOV25bMYdbBewS/R5I4KLE51+BCJIY /XC+C4rL60i5W62A+Y88xpnlwI62b+QC20PkVJYnk4vLkIE5UliOy5jhACI6iJm4 0Bwz4zXFhlzBNQItOsA6AUwYOHYhjVzqKYQMC05fVGLNxEDCx3DjRavy/ICnSAgu G9aRBHtg1JPoXziMtIZkCWsiTKCzz5Vxugdo7WxyXFqXrlK/IxydVuRU2DuiJSXo 2+7n3M1G4uLj+6s/JaEj6RZTWpKFBo4VH8dUivfrTyoZsCg0hiVSYI2W/Rsc7ZCo kCsn+H79xW6RBbxtrEqe =EOHq -----END PGP SIGNATURE----- Merge tag 'mca_cfg' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull x86 RAS changes from Borislav Petkov: "Rework all config variables used throughout the MCA code and collect them together into a mca_config struct. This keeps them tightly and neatly packed together instead of spilled all over the place. Then, convert those which are used as booleans into real booleans and save some space." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-26 14:50:17 +02:00
Borislav Petkov	1462594bf2	x86, MCA: Finish mca_config conversion mce_ser, mce_bios_cmci_threshold and mce_disabled are the last three bools which need conversion. Move them to the mca_config struct and adjust usage sites accordingly. Signed-off-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com>	2012-10-26 14:37:58 +02:00
Borislav Petkov	7af19e4afd	x86, MCA: Convert the next three variables batch Move them into the mca_config struct and adjust code touching them accordingly. Signed-off-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com>	2012-10-26 14:37:57 +02:00
Borislav Petkov	84c2559dee	x86, MCA: Convert rip_msr, mce_bootlog, monarch_timeout Move above configuration variables into struct mca_config and adjust usage places accordingly. Signed-off-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com>	2012-10-26 14:37:57 +02:00
Borislav Petkov	d203f0b824	x86, MCA: Convert dont_log_ce, banks and tolerant Move those MCA configuration variables into struct mca_config and adjust the places they're used accordingly. Signed-off-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com>	2012-10-26 14:37:56 +02:00
Ingo Molnar	8b724e2a12	EFI updates for 3.7 Fix oops with EFI variables on mixed 32/64-bit firmware/kernels and document EFI git repository location on kernel.org. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQiatJAAoJEC84WcCNIz1VaCwP/RkNLYRGruxPFD0gf9ainwVQ qSXfpYLmpZeU+TcIBx6aG7PjzQsitvMU9YBtqvIN5uoYHuDjy2vDT52WBqg2Fn5k HYW13m/Pex+kCEpV4n6uYi9NM5mJeR/+QlpfQOcofxuqsvG6WUY1l55SF5+V/Dd/ Dv94yO21JNiBumPM7KadFl5EIZ53j8OdQhEJB0jnomC0cDAWnbIHk97XPrSp6+rf 03AQrYLnDNHq0HJo44LdoJleiRuxHBC6FrhCsrctvpVLd6iVNIGbJupNTBPvvAxl zY4aBoYym87uYo6y3LMevD+L2fkTC3qE6iQilYVbShkoYLnDTOnTCcuwUkRGQ/yX vBAHH/FNw2uUKSBeTdbA2/5OEctZ+GVEgkCkplAUfwJAxidyygBn9jD/YXHL+Fu+ fDMvVnZTKbTQmOOP9cpYbebqAGykyST97HuDxOZ8mha5UP0QhCz5CbRfENdbP8w9 00+hjEIkS0fjfjaSeCzp5tpkAVovzhZyoVZRCwoe42bZ7SAreDzNTYEnbK6G4owo x2mFXGlcTeZCmTNgQEmzby71tuAK+/+UEEXuoYV42wNda52iyvv7xkHJ/Q4li3um k0jjFFcqwd3mJC+OrJHr4LTCB1tvNgbpgsDUuUYckwPIIkWa7ZOF9xCWpiu2nC08 4TI5A5DXf1n5i9sX4aw5 =oM9H -----END PGP SIGNATURE----- Merge tag 'efi-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fixes from Matt Fleming: "Fix oops with EFI variables on mixed 32/64-bit firmware/kernels and document EFI git repository location on kernel.org." Conflicts: arch/x86/include/asm/efi.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-26 10:17:38 +02:00
Olof Johansson	5189c2a7c7	x86: efi: Turn off efi_enabled after setup on mixed fw/kernel When 32-bit EFI is used with 64-bit kernel (or vice versa), turn off efi_enabled once setup is done. Beyond setup, it is normally used to determine if runtime services are available and we will have none. This will resolve issues stemming from efivars modprobe panicking on a 32/64-bit setup, as well as some reboot issues on similar setups. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=45991 Reported-by: Marko Kohtala <marko.kohtala@gmail.com> Reported-by: Maxim Kammerer <mk@dee.su> Signed-off-by: Olof Johansson <olof@lixom.net> Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: stable@kernel.org # 3.4 - 3.6 Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2012-10-25 19:09:40 +01:00
Shuah Khan	6c9c6d6301	dma-debug: New interfaces to debug dma mapping errors Add dma-debug interface debug_dma_mapping_error() to debug drivers that fail to check dma mapping errors on addresses returned by dma_map_single() and dma_map_page() interfaces. This interface clears a flag set by debug_dma_map_page() to indicate that dma_mapping_error() has been called by the driver. When driver does unmap, debug_dma_unmap() checks the flag and if this flag is still set, prints warning message that includes call trace that leads up to the unmap. This interface can be called from dma_mapping_error() routines to enable dma mapping error check debugging. Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with CONFIG_DMA_API_DEBUG enabled and disabled. Signed-off-by: Shuah Khan <shuah.khan@hp.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2012-10-24 17:06:43 +02:00
Jussi Kivilinna	facd416fbc	crypto: serpent/avx - avoid using temporary stack buffers Introduce new assembler functions to avoid use temporary stack buffers in glue code. This also allows use of vector instructions for xoring output in CTR and CBC modes and construction of IVs for CTR mode. ECB mode sees ~0.5% decrease in speed because added one extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~3%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-24 21:10:55 +08:00
Jussi Kivilinna	58990986f1	crypto: x86/glue_helper - use le128 instead of u128 for CTR mode 'u128' currently used for CTR mode is on little-endian 'long long' swapped and would require extra swap operations by SSE/AVX code. Use of le128 instead of u128 allows IV calculations to be done with vector registers easier. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-24 21:10:54 +08:00
Matt Fleming	3e8fa263a9	x86/efi: Fix oops caused by incorrect set_memory_uc() usage Calling __pa() with an ioremap'd address is invalid. If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set in ->attribute we currently call set_memory_uc(), which in turn calls __pa() on a potentially ioremap'd address. On CONFIG_X86_32 this results in the following oops: BUG: unable to handle kernel paging request at f7f22280 IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210 pdpt = 0000000001978001 pde = 0000000001ffb067 pte = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: Pid: 0, comm: swapper Not tainted 3.0.0-acpi-efi-0805 #3 EIP: 0060:[<c10257b9>] EFLAGS: 00010202 CPU: 0 EIP is at reserve_ram_pages_type+0x89/0x210 EAX: 0070e280 EBX: 38714000 ECX: f7814000 EDX: 00000000 ESI: 00000000 EDI: 38715000 EBP: c189fef0 ESP: c189fea8 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 0, ti=c189e000 task=c18bbe60 task.ti=c189e000) Stack: 80000200 ff108000 00000000 c189ff00 00038714 00000000 00000000 c189fed0 c104f8ca 00038714 00000000 00038715 00000000 00000000 00038715 00000000 00000010 38715000 c189ff48 c1025aff 38715000 00000000 00000010 00000000 Call Trace: [<c104f8ca>] ? page_is_ram+0x1a/0x40 [<c1025aff>] reserve_memtype+0xdf/0x2f0 [<c1024dc9>] set_memory_uc+0x49/0xa0 [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa [<c19216d4>] start_kernel+0x291/0x2f2 [<c19211c7>] ? loglevel+0x1b/0x1b [<c19210bf>] i386_start_kernel+0xbf/0xc8 The only time we can call set_memory_uc() for a memory region is when it is part of the direct kernel mapping. For the case where we ioremap a memory region we must leave it alone. This patch reimplements the fix from `e8c7106280` ("x86, efi: Calling __pa() with an ioremap()ed address is invalid") which was reverted in `e1ad783b12` because it caused a regression on some MacBooks (they hung at boot). The regression was caused because the commit only marked EFI_RUNTIME_SERVICES_DATA as E820_RESERVED_EFI, when it should have marked all regions that have the EFI_MEMORY_RUNTIME attribute. Despite first impressions, it's not possible to use ioremap_cache() to map all cached memory regions on CONFIG_X86_64 because of the way that the memory map might be configured as detailed in the following bug report, https://bugzilla.redhat.com/show_bug.cgi?id=748516 e.g. some of the EFI memory regions need* to be mapped as part of the direct kernel mapping. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Keith Packard <keithp@keithp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1350649546-23541-1-git-send-email-matt@console-pimps.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 12:48:47 +02:00
Konrad Rzeszutek Wilk	e05dacd71d	Merge commit 'v3.7-rc1' into stable/for-linus-3.7 * commit 'v3.7-rc1': (10892 commits) Linux 3.7-rc1 x86, boot: Explicitly include autoconf.h for hostprogs perf: Fix UAPI fallout ARM: config: make sure that platforms are ordered by option string ARM: config: sort select statements alphanumerically UAPI: (Scripted) Disintegrate include/linux/byteorder UAPI: (Scripted) Disintegrate include/linux UAPI: Unexport linux/blk_types.h UAPI: Unexport part of linux/ppp-comp.h perf: Handle new rbtree implementation procfs: don't need a PATH_MAX allocation to hold a string representation of an int vfs: embed struct filename inside of names_cache allocation if possible audit: make audit_inode take struct filename vfs: make path_openat take a struct filename pointer vfs: turn do_path_lookup into wrapper around struct filename variant audit: allow audit code to satisfy getname requests from its names_list vfs: define struct filename and have getname() return it btrfs: Fix compilation with user namespace support enabled userns: Fix posix_acl_file_xattr_userns gid conversion userns: Properly print bluetooth socket uids ...	2012-10-19 15:19:19 -04:00
Ian Campbell	ef32f89298	xen: grant: use xen_pfn_t type for frame_list. This correctly sizes it as 64 bit on ARM but leaves it as unsigned long on x86 (therefore no intended change on x86). The long and ulong guest handles are now unused (and a bit dangerous) so remove them. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-10-19 15:17:55 -04:00
Ian Campbell	37ea0fcb6a	xen: sysfs: fix build warning. Define PRI macros for xen_ulong_t and xen_pfn_t and use to fix: drivers/xen/sys-hypervisor.c:288:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'xen_ulong_t' [-Wformat] Ideally this would use PRIx64 on ARM but these (or equivalent) don't seem to be available in the kernel. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-10-19 15:17:51 -04:00
Linus Torvalds	ade0899b29	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This tree includes some late late perf items that missed the first round: tools: - Bash auto completion improvements, now we can auto complete the tools long options, tracepoint event names, etc, from Namhyung Kim. - Look up thread using tid instead of pid in 'perf sched'. - Move global variables into a perf_kvm struct, from David Ahern. - Hists refactorings, preparatory for improved 'diff' command, from Jiri Olsa. - Hists refactorings, preparatory for event group viewieng work, from Namhyung Kim. - Remove double negation on optional feature macro definitions, from Namhyung Kim. - Remove several cases of needless global variables, on most builtins. - misc fixes kernel: - sysfs support for IBS on AMD CPUs, from Robert Richter. - Support for an upcoming Intel CPU, the Xeon-Phi / Knights Corner HPC blade PMU, from Vince Weaver. - misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) perf: Fix perf_cgroup_switch for sw-events perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu perf/AMD/IBS: Add sysfs support perf hists: Add more helpers for hist entry stat perf hists: Move he->stat.nr_events initialization to a template perf hists: Introduce struct he_stat perf diff: Removing the total_period argument from output code perf tool: Add hpp interface to enable/disable hpp column perf tools: Removing hists pair argument from output path perf hists: Separate overhead and baseline columns perf diff: Refactor diff displacement possition info perf hists: Add struct hists pointer to struct hist_entry perf tools: Complete tracepoint event names perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU perf evlist: Remove some unused methods perf evlist: Introduce add_newtp method perf kvm: Move global variables into a perf_kvm struct perf tools: Convert to BACKTRACE_SUPPORT perf tools: Long option completion support for each subcommands perf tools: Complete long option names of perf command ...	2012-10-13 10:20:11 +09:00
Linus Torvalds	4e21fc138b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull third pile of kernel_execve() patches from Al Viro: "The last bits of infrastructure for kernel_thread() et.al., with alpha/arm/x86 use of those. Plus sanitizing the asm glue and do_notify_resume() on alpha, fixing the "disabled irq while running task_work stuff" breakage there. At that point the rest of kernel_thread/kernel_execve/sys_execve work can be done independently for different architectures. The only pending bits that do depend on having all architectures converted are restrictred to fs/* and kernel/* - that'll obviously have to wait for the next cycle. I thought we'd have to wait for all of them done before we start eliminating the longjump-style insanity in kernel_execve(), but it turned out there's a very simple way to do that without flagday-style changes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to saner kernel_execve() semantics arm: switch to saner kernel_execve() semantics x86, um: convert to saner kernel_execve() semantics infrastructure for saner ret_from_kernel_thread semantics make sure that kernel_thread() callbacks call do_exit() themselves make sure that we always have a return path from kernel_execve() ppc: eeh_event should just use kthread_run() don't bother with kernel_thread/kernel_execve for launching linuxrc alpha: get rid of switch_stack argument of do_work_pending() alpha: don't bother passing switch_stack separately from regs alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c alpha: simplify TIF_NEED_RESCHED handling	2012-10-13 10:05:52 +09:00
Al Viro	22e2430d60	x86, um: convert to saner kernel_execve() semantics Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-12 13:35:22 -04:00
Linus Torvalds	03d3602a83	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core update from Thomas Gleixner: - Bug fixes (one for a longstanding dead loop issue) - Rework of time related vsyscalls - Alarm timer updates - Jiffies updates to remove compile time dependencies * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Cast raw_interval to u64 to avoid shift overflow timers: Fix endless looping between cascade() and internal_add_timer() time/jiffies: bring back unconditional LATCH definition time: Convert x86_64 to using new update_vsyscall time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems time: Introduce new GENERIC_TIME_VSYSCALL time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD time: Move update_vsyscall definitions to timekeeper_internal.h time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes jiffies: Remove compile time assumptions about CLOCK_TICK_RATE jiffies: Kill unused TICK_USEC_TO_NSEC alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue alarmtimer: Remove unused helpers & defines alarmtimer: Use hrtimer per-alarm instead of per-base alarmtimer: Implement minimum alarm interval for allowing suspend	2012-10-12 22:17:48 +09:00
Linus Torvalds	42859eea96	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull generic execve() changes from Al Viro: "This introduces the generic kernel_thread() and kernel_execve() functions, and switches x86, arm, alpha, um and s390 over to them." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits) s390: convert to generic kernel_execve() s390: switch to generic kernel_thread() s390: fold kernel_thread_helper() into ret_from_fork() s390: fold execve_tail() into start_thread(), convert to generic sys_execve() um: switch to generic kernel_thread() x86, um/x86: switch to generic sys_execve and kernel_execve x86: split ret_from_fork alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve() alpha: switch to generic kernel_thread() alpha: switch to generic sys_execve() arm: get rid of execve wrapper, switch to generic execve() implementation arm: optimized current_pt_regs() arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve() arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk] generic sys_execve() generic kernel_execve() new helper: current_pt_regs() preparation for generic kernel_thread() um: kill thread->forking um: let signal_delivered() do SIGTRAP on singlestepping into handler ...	2012-10-10 12:02:25 +09:00
Linus Torvalds	9e2d8656f5	Merge branch 'akpm' (Andrew's patch-bomb) Merge patches from Andrew Morton: "A few misc things and very nearly all of the MM tree. A tremendous amount of stuff (again), including a significant rbtree library rework." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits) sparc64: Support transparent huge pages. mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd(). mm: Add and use update_mmu_cache_pmd() in transparent huge page code. sparc64: Document PGD and PMD layout. sparc64: Eliminate PTE table memory wastage. sparc64: Halve the size of PTE tables sparc64: Only support 4MB huge pages and 8KB base pages. memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning mm: memcg: clean up mm_match_cgroup() signature mm: document PageHuge somewhat mm: use %pK for /proc/vmallocinfo mm, thp: fix mlock statistics mm, thp: fix mapped pages avoiding unevictable list on mlock memory-hotplug: update memory block's state and notify userspace memory-hotplug: preparation to notify memory block's state at memory hot remove mm: avoid section mismatch warning for memblock_type_name make GFP_NOTRACK definition unconditional cma: decrease cc.nr_migratepages after reclaiming pagelist CMA: migrate mlocked pages kpageflags: fix wrong KPF_THP on non-huge compound pages ...	2012-10-09 16:23:15 +09:00
David Miller	b113da6578	mm: Add and use update_mmu_cache_pmd() in transparent huge page code. The transparent huge page code passes a PMD pointer in as the third argument of update_mmu_cache(), which expects a PTE pointer. This never got noticed because X86 implements update_mmu_cache() as a macro and thus we don't get any type checking, and X86 is the only architecture which supports transparent huge pages currently. Before other architectures can support transparent huge pages properly we need to add a new interface which will take a PMD pointer as the third argument rather than a PTE pointer. [akpm@linux-foundation.org: implement update_mm_cache_pmd() for s390] Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:05 +09:00
Andrea Arcangeli	027ef6c878	mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP In many places !pmd_present has been converted to pmd_none. For pmds that's equivalent and pmd_none is quicker so using pmd_none is better. However (unless we delete pmd_present) we should provide an accurate pmd_present too. This will avoid the risk of code thinking the pmd is non present because it's under __split_huge_page_map, see the pmd_mknotpresent there and the comment above it. If the page has been mprotected as PROT_NONE, it would also lead to a pmd_present false negative in the same way as the race with split_huge_page. Because the PSE bit stays on at all times (both during split_huge_page and when the _PAGE_PROTNONE bit get set), we could only check for the PSE bit, but checking the PROTNONE bit too is still good to remember pmd_present must always keep PROT_NONE into account. This explains a not reproducible BUG_ON that was seldom reported on the lists. The same issue is in pmd_large, it would go wrong with both PROT_NONE and if it races with split_huge_page. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:57 +09:00
Shaohua Li	e79bee24fd	atomic: implement generic atomic_dec_if_positive() The x86 implementation of atomic_dec_if_positive is quite generic, so make it available to all architectures. This is needed for "swap: add a simple detector for inappropriate swapin readahead". [akpm@linux-foundation.org: do the "#define foo foo" trick in the conventional manner] Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:46 +09:00
Will Deacon	5d3a551c28	mm: hugetlb: add arch hook for clearing page flags before entering pool The core page allocator ensures that page flags are zeroed when freeing pages via free_pages_check. A number of architectures (ARM, PPC, MIPS) rely on this property to treat new pages as dirty with respect to the data cache and perform the appropriate flushing before mapping the pages into userspace. This can lead to cache synchronisation problems when using hugepages, since the allocator keeps its own pool of pages above the usual page allocator and does not reset the page flags when freeing a page into the pool. This patch adds a new architecture hook, arch_clear_hugepage_flags, so that architectures which rely on the page flags being in a particular state for fresh allocations can adjust the flags accordingly when a page is freed into the pool. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Michal Hocko <mhocko@suse.cz> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:24 +09:00
Linus Torvalds	50e0d10232	This has three changes for asm-generic that did not really fit into any other branch as normal asm-generic changes do. One is a fix for a build warning, the other two are more interesting: * A patch from Mark Brown to allow using the common clock infrastructure on all architectures, so we can use the clock API in architecture independent device drivers. * The UAPI split patches from David Howells for the asm-generic files. There are other architecture specific series that are going through the arch maintainer tree and that depend on this one. There may be a few small merge conflicts between Mark's patch and the following arch header file split patches. In each case the solution will be to keep the new "generic-y += clkdev.h" line, even if it ends up being the only line in the Kbuild file. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIVAwUAUHLuO2CrR//JCVInAQLsKxAAoa+oSP3KGuQbLHq2wvUxAdXWDFcZgKo+ qMRejSJPI0sreJ9GJHpUjHtJ7W2gujeo9upmUIJzoWY9vrmjkhCDkaWliaQI8SmY CKB9zI2xCB9iFzHtWxocfnJzU7NvzjJm+jnIYrqkaO9HGMxL99tsv9TsBYXK/08j QmlGP5fHdGU3zZxVt5r1GL8/nfX4zn3/YEll9nJ7vqXZltIBbaksxmgPoa0QkkH8 LMeMAlgRR2DHWt58gXHyGB7Afx3QEnZBDaQpYxA446P+2gtvIhFYOnpuX14pZb7t m4IM0vOO6WzARQR6DJlRHfYJevojgGHu4Y8wkEzuWE+Hr2BqmiVct7UKqGJdqTY5 7+I7wwaJmdd3zE61LxRS9UOjJDwMh1gmsNU4+42RArQ5eLcikNR5zfYzDRLCTmnk qKZvbiaxgme2YvWazxbBT6EqmIVU6lfHHIoMLr8U0j40Cl0GCmN7EBbe7/r2Jhjs 6VnCOJ6vb4RCOJGGAcLRMQu7xEtqcCe0Zht839wl13QXewxS3QRgwg6Bjy/fwA9r jij5gf+R25J/fQW7yZv4LwcMowRE1xvpu0ebwkK3LLR8jcon71scd6f3PW/bUUpj j4tgFuJbXzOxQ4LFgBzvdVgx3wDzsQhqb/6p2l6ROdcw7xXFDdFZ4zq3h0A25wXZ J6WDO387tpg= =Aaki -----END PGP SIGNATURE----- Merge tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "This has three changes for asm-generic that did not really fit into any other branch as normal asm-generic changes do. One is a fix for a build warning, the other two are more interesting: * A patch from Mark Brown to allow using the common clock infrastructure on all architectures, so we can use the clock API in architecture independent device drivers. * The UAPI split patches from David Howells for the asm-generic files. There are other architecture specific series that are going through the arch maintainer tree and that depend on this one. There may be a few small merge conflicts between Mark's patch and the following arch header file split patches. In each case the solution will be to keep the new "generic-y += clkdev.h" line, even if it ends up being the only line in the Kbuild file." * tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: UAPI: (Scripted) Disintegrate include/asm-generic asm-generic: Add default clkdev.h asm-generic: xor: mark static functions as __maybe_unused	2012-10-09 15:58:38 +09:00
Linus Torvalds	f1c6872e49	Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQbJELAAoJEFjIrFwIi8fJSI4H/32qrQKyF5IIkFKHTN9FYDC1 OxEGc4y47DIQpGUd/PgZ/i6h9Iyhj+I6pb4lCevykwgd0j83noepluZlCIcJnTfL HVXNiRIQKqFhqKdjTANxVM4APup+7Lqrvqj6OZfUuoxaZ3tSTLhabJ/7UXf2+9xy g2RfZtbSdQ1sukQ/A2MeGQNT79rh7v7PrYQUYSrqytjSjSLPTqRf75HWQ+eapIAH X3aVz8Tn6nTixZWvZOK7rAaD4awsFxGP6E46oFekB02f4x9nWHJiCZiXwb35lORb tz9F9td99f6N4fPJ9LgcYTaCPwzVnceZKqE9hGfip4uT+0WrEqDxq8QmBqI5YtI= =gxJD -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull ADM Xen support from Konrad Rzeszutek Wilk: Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports ARMv7 platforms. The goal in implementing this architecture is to exploit the hardware as much as possible. That means use as little as possible of PV operations (so no PV MMU) - and use existing PV drivers for I/Os (network, block, console, etc). This is similar to how PVHVM guests operate in X86 platform nowadays - except that on ARM there is no need for QEMU. The end result is that we share a lot of the generic Xen drivers and infrastructure. Details on how to compile/boot/etc are available at this Wiki: http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions and this blog has links to a technical discussion/presentations on the overall architecture: http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/ * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits) xen/xen_initial_domain: check that xen_start_info is initialized xen: mark xen_init_IRQ __init xen/Makefile: fix dom-y build arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls ...	2012-10-07 07:13:01 +09:00
Denys Vlasenko	751f409db6	compat: move compat_siginfo_t definition to asm/compat.h This is a preparatory patch for the introduction of NT_SIGINFO elf note. Make the location of compat_siginfo_t uniform across eight architectures which have it. Now it can be pulled in by including asm/compat.h or linux/compat.h. Most of the copies are verbatim. compat_uid[32]_t had to be replaced by __compat_uid[32]_t. compat_uptr_t had to be moved up before compat_siginfo_t in asm/compat.h on a several architectures (tile already had it moved up). compat_sigval_t had to be relocated from linux/compat.h to asm/compat.h. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:16 +09:00
Andi Kleen	75fdd155ea	sections: fix section conflicts in arch/x86 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:04:40 +09:00
Arnd Bergmann	c37d6154c0	Merge branch 'disintegrate-asm-generic' of git://git.infradead.org/users/dhowells/linux-headers into asm-generic Patches from David Howells <dhowells@redhat.com>: This is to complete part of the UAPI disintegration for which the preparatory patches were pulled recently. Note that there are some fixup patches which are at the base of the branch aimed at you, plus all arches get the asm-generic branch merged in too. * 'disintegrate-asm-generic' of git://git.infradead.org/users/dhowells/linux-headers: UAPI: (Scripted) Disintegrate include/asm-generic UAPI: Fix conditional header installation handling (notably kvm_para.h on m68k) c6x: remove c6x signal.h UAPI: Split compound conditionals containing __KERNEL__ in Arm64 UAPI: Fix the guards on various asm/unistd.h files Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2012-10-04 22:57:51 +02:00
Linus Torvalds	ecefbd94b8	KVM updates for the 3.7 merge window -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQbY/2AAoJEI7yEDeUysxlymQQAIv5svpAI/FUe3FhvBi3IW2h WWMIpbdhHyocaINT18qNp8prO0iwoaBfgsnU8zuB34MrbdUgiwSHgM6T4Ff4NGa+ R4u+gpyKYwxNQYKeJyj04luXra/krxwHL1u9OwN7o44JuQXAmzrw2tZ9ad1ArvL3 eoZ6kGsPcdHPZMZWw2jN5xzBsRtqybm0GPPQh1qPXdn8UlPPd1X7owvbaud2y4+e StVIpGY6wrsO36f7UcA4Gm1EP/1E6Lm5KMXJyHgM9WBRkEfp92jTY5+XKv91vK8Z VKUd58QMdZE5NCNBkAR9U5N9aH0oSXnFU/g8hgiwGvrhS3IsSkKUePE6sVyMVTIO VptKRYe0AdmD/g25p6ApJsguV7ITlgoCPaE4rMmRcW9/bw8+iY098r7tO7w11H8M TyFOXihc3B+rlH8WdzOblwxHMC4yRuiPIktaA3WwbX7eA7Xv/ZRtdidifXKtgsVE rtubVqwGyYcHoX1Y+JiByIW1NN0pYncJhPEdc8KbRe2wKs3amA9rio1mUpBYYBPO B0ygcITftyXbhcTtssgcwBDGXB0AAGqI7wqdtJhFeIrKwHXD7fNeAGRwO8oKxmlj 0aPwo9fDtpI+e6BFTohEgjZBocRvXXNWLnDSFB0E7xDR31bACck2FG5FAp1DxdS7 lb/nbAsXf9UJLgGir4I1 =kN6V -----END PGP SIGNATURE----- Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Avi Kivity: "Highlights of the changes for this release include support for vfio level triggered interrupts, improved big real mode support on older Intels, a streamlines guest page table walker, guest APIC speedups, PIO optimizations, better overcommit handling, and read-only memory." * tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits) KVM: s390: Fix vcpu_load handling in interrupt code KVM: x86: Fix guest debug across vcpu INIT reset KVM: Add resampling irqfds for level triggered interrupts KVM: optimize apic interrupt delivery KVM: MMU: Eliminate pointless temporary 'ac' KVM: MMU: Avoid access/dirty update loop if all is well KVM: MMU: Eliminate eperm temporary KVM: MMU: Optimize is_last_gpte() KVM: MMU: Simplify walk_addr_generic() loop KVM: MMU: Optimize pte permission checks KVM: MMU: Update accessed and dirty bits after guest pagetable walk KVM: MMU: Move gpte_access() out of paging_tmpl.h KVM: MMU: Optimize gpte_access() slightly KVM: MMU: Push clean gpte write protection out of gpte_access() KVM: clarify kvmclock documentation KVM: make processes waiting on vcpu mutex killable KVM: SVM: Make use of asm.h KVM: VMX: Make use of asm.h KVM: VMX: Make lto-friendly KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() ... Conflicts: arch/s390/include/asm/processor.h arch/x86/kvm/i8259.c	2012-10-04 09:30:33 -07:00
Vince Weaver	e717bf4e4f	perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU The following patch adds perf_event support for the Xeon-Phi PMU, as documented in the "Intel Xeon Phi Coprocessor (codename: Knights Corner) Performance Monitoring Units" manual. Even though it is a co-processor, a Phi runs a full Linux environment and can support performance counters. This is just barebones support, it does not add support for interesting new features such as the SPFLT intruction that allows starting/stopping events without entering the kernel. The PMU internally is just like that of an original Pentium, but a "P6-like" MSR interface is provided. The interface is different enough from a real P6 that it's not easy (or practical) to re-use the code in perf_event_p6.c Acked-by: Lawrence F Meadows <lawrence.f.meadows@intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: eranian@gmail.com Cc: Lawrence F <lawrence.f.meadows@intel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1209261405320.8398@vincent-weaver-1.um.maine.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-04 13:32:37 +02:00
Linus Torvalds	9b2e077c42	Prepared for main script -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAUGsfSBOxKuMESys7AQIQug/+LyViiXFmCSlM+lCGkp64/BfUvy0QHqN4 K/dMvbZKOQbvmgps/xj8G+6diDzeO4hz8e1I3c/SEZ3M9TTz/Ppv1slfET9uUZ4X aLLHKqXihsxEOslw7mgp91KTd1Nr+e41f/5hr3j5Ap1HQB4yJa2mmj3reb48VfjD jmXo/dID66c2ExaVO7C8yyZXWgMGTfiy27qmEnMTxW7xQPt1oYsV2Bq0PCC/zEcq JgnwMatDVMy9en9wuEVMNelImE+XLm1T3XpLHL2WkV2JWSai98TcvGZnNKIxpFqu PueHWWCs5F5bZfn4bf6QOEstRTW76NL2qFNYrBPi0Zuq8Pm53ucnnzJUY8JFPPoR kXYmv8K73Jb10eHFuc3X4UyzvnhmJ7y3kG3jx7WoJVkW1KPgEFNmvMHkLyHgPZOU nT1tZiO0QHF4zi0JWMfK+7aeEY7EKfqRSce0F3Jw91vaIlEOIqgMgVJ1Y/nMhu3s 92mpg8JDoAcgCghok4m4Pc1qO06Fe8Iw5Qap5KMdPutp5Br2ebLL5NrwdAE8LNpR 7826r9RTMhyVRgNJ71JMFDY1IBeLeY0bxipN8dh6VYqMiKgClUeNwv7/tIgI4YS7 acQ+GdcsgTtg5qx3xwX5N2TSJVvdwnXdnWhAw7wN48tbzH8LvMV61Pq8Ytc7iK3M cAMgkbxdZRk= =VtEQ -----END PGP SIGNATURE----- Merge tag 'uapi-prep-20121002' of git://git.infradead.org/users/dhowells/linux-headers Pull preparatory patches for user API disintegration from David Howells: "The patches herein prepare for the extraction of the Userspace API bits from the various header files named in the Kbuild files. New subdirectories are created under either include/uapi/ or arch/x/include/uapi/ that correspond to the subdirectory containing that file under include/ or arch/x/include/. The new subdirs under the uapi/ directory are populated with Kbuild files that mostly do nothing at this time. Further patches will disintegrate the headers in each original directory and fill in the Kbuild files as they do it. These patches also: (1) fix up #inclusions of "foo.h" rather than <foo.h>. (2) Remove some redundant #includes from the DRM code. (3) Make the kernel build infrastructure handle Kbuild files both in the old places and the new UAPI place that both specify headers to be exported. (4) Fix some kernel tools that #include kernel headers during their build. I have compile tested this with allyesconfig against x86_64, allmodconfig against i386 and a scattering of additional defconfigs of other arches. Prepared for main script Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com>" * tag 'uapi-prep-20121002' of git://git.infradead.org/users/dhowells/linux-headers: UAPI: Plumb the UAPI Kbuilds into the user header installation and checking UAPI: x86: Differentiate the generated UAPI and internal headers UAPI: Remove the objhdr-y export list UAPI: Move linux/version.h UAPI: Set up uapi/asm/Kbuild.asm UAPI: x86: Fix insn_sanity build failure after UAPI split UAPI: x86: Fix the test_get_len tool UAPI: (Scripted) Set up UAPI Kbuild files UAPI: Partition the header include path sets and add uapi/ header directories UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers UAPI: (Scripted) Convert #include "..." to #include <path/...> in drivers/gpu/ UAPI: (Scripted) Remove redundant DRM UAPI header #inclusions from drivers/gpu/. UAPI: Refer to the DRM UAPI headers with <...> and from certain headers only	2012-10-03 13:45:43 -07:00
Mark Brown	e7a570ff7d	asm-generic: Add default clkdev.h Ease the deployment of clkdev by providing a default asm/clkdev.h for use if the arch does not have an include/asm/clkdev.h. Due to limitations in Kbuild we manually add clkdev.h to all architectures that don't have one rather than having the header appear by default. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2012-10-03 21:33:53 +02:00
Linus Torvalds	56d92aa5cf	Features: * When hotplugging PCI devices in a PV guest we can allocate Xen-SWIOTLB later. * Cleanup Xen SWIOTLB. * Support pages out grants from HVM domains in the backends. * Support wild cards in xen-pciback.hide=(BDF) arguments. * Update grant status updates with upstream hypervisor. * Boot PV guests with more than 128GB. * Cleanup Xen MMU code/add comments. * Obtain XENVERS using a preferred method. * Lay out generic changes to support Xen ARM. * Allow privcmd ioctl for HVM (used to do only PV). * Do v2 of mmap_batch for privcmd ioctls. * If hypervisor saves the LED keyboard light - we will now instruct the kernel about its state. Fixes: * More fixes to Xen PCI backend for various calls/FLR/etc. * With more than 4GB in a 64-bit PV guest disable native SWIOTLB. * Fix up smatch warnings. * Fix up various return values in privmcmd and mm. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQaY8qAAoJEFjIrFwIi8fJwPMH+gKngf4vSqrHjw+V2nsmeYaw zrhRQrm3xV4BNR7yQHs+InDst/AJRAr0GjuReDK4BqDEzUfcFKvzalspdMGGqf+W MUp+pMdN2S6649r/KMFfPCYcQvmIkFu8l76aClAqfA77SZRv1VL2Gn9eBxd82jS0 sWAUu5ichDSdfm/vAKXhdvhlKsK0hmihEbCM3+wRBoXEJX0kKbhEGn82smaLqkEt uxWDJBT4nyYqbm6KVXQJ/WYCaWEmEImGSDb9J1WeqftGEn1Q55mpknvElkpNPE1b Ifayqk50Kt43qnLk/AUrm8KFFlNKb73wTyAb0hVw7SQDcw1AcLa8ZdohLIZOl/4= =prMY -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.7-x86-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "Features: - When hotplugging PCI devices in a PV guest we can allocate Xen-SWIOTLB later. - Cleanup Xen SWIOTLB. - Support pages out grants from HVM domains in the backends. - Support wild cards in xen-pciback.hide=(BDF) arguments. - Update grant status updates with upstream hypervisor. - Boot PV guests with more than 128GB. - Cleanup Xen MMU code/add comments. - Obtain XENVERS using a preferred method. - Lay out generic changes to support Xen ARM. - Allow privcmd ioctl for HVM (used to do only PV). - Do v2 of mmap_batch for privcmd ioctls. - If hypervisor saves the LED keyboard light - we will now instruct the kernel about its state. Fixes: - More fixes to Xen PCI backend for various calls/FLR/etc. - With more than 4GB in a 64-bit PV guest disable native SWIOTLB. - Fix up smatch warnings. - Fix up various return values in privmcmd and mm." * tag 'stable/for-linus-3.7-x86-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (48 commits) xen/pciback: Restore the PCI config space after an FLR. xen-pciback: properly clean up after calling pcistub_device_find() xen/vga: add the xen EFI video mode support xen/x86: retrieve keyboard shift status flags from hypervisor. xen/gndev: Xen backend support for paged out grant targets V4. xen-pciback: support wild cards in slot specifications xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. xen/swiotlb: Remove functions not needed anymore. xen/pcifront: Use Xen-SWIOTLB when initting if required. xen/swiotlb: For early initialization, return zero on success. xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used. xen/swiotlb: Move the error strings to its own function. xen/swiotlb: Move the nr_tbl determination in its own function. xen/arm: compile and run xenbus xen: resynchronise grant table status codes with upstream xen/privcmd: return -EFAULT on error xen/privcmd: Fix mmap batch ioctl error status copy back. xen/privcmd: add PRIVCMD_MMAPBATCH_V2 ioctl xen/mm: return more precise error from xen_remap_domain_range() xen/mmu: If the revector fails, don't attempt to revector anything else. ...	2012-10-02 22:09:10 -07:00
Linus Torvalds	16642a2e7b	Power management updates for 3.7-rc1 * Improved system suspend/resume and runtime PM handling for the SH TMU, CMT and MTU2 clock event devices (also used by ARM/shmobile). * Generic PM domains framework extensions related to cpuidle support and domain objects lookup using names. * ARM/shmobile power management updates including improved support for the SH7372's A4S power domain containing the CPU core. * cpufreq changes related to AMD CPUs support from Matthew Garrett, Andre Przywara and Borislav Petkov. * cpu0 cpufreq driver from Shawn Guo. * cpufreq governor fixes related to the relaxing of limit from Michal Pecio. * OMAP cpufreq updates from Axel Lin and Richard Zhao. * cpuidle ladder governor fixes related to the disabling of states from Carsten Emde and me. * Runtime PM core updates related to the interactions with the system suspend core from Alan Stern and Kevin Hilman. * Wakeup sources modification allowing more helper functions to be called from interrupt context from John Stultz and additional diagnostic code from Todd Poynor. * System suspend error code path fix from Feng Hong. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQIcBAABAgAGBQJQa1rRAAoJEKhOf7ml8uNsYZ0P/2RZ71sgLWcUCfr0yHaiZeOd 2GxEYSZ+9BZJHADgoAK/bHRTv8crm40Y2RkbaWbxPDRNuE4SutbvNTGTlJSAguSD yHkU/6AFC7u8Jwq+afsWIdGX7eHd78zPpj6EVtVtjHM903WDwbMU2vUz7tQ+fFa+ ZZ7eydq9j0ec0OoH3UeNhet7JSOpT5BSLgjmIkHMBgIvTxNVDbkB31QUxnUxocxn k6S2wQaUSJJWGMLksRRNrhwLq+cGYwTsaOtG/KzRLH1raUyn33B5pcZr0aqhOkjg ClaCks3V8o3vRghSwOPB5aVXzjBKvM3UnSyJNIl+FeCeyWuwSNbkEFdA/e7oPuxG UsW6dcHiuVo6Ir4+zhd9+lN+/AcPTChO5b7lbU8qRF4ce04czWlUY/KzJjaM+YOE CKGq6eX9AHwFjE+h4+VcCXgmzcioiS8Y/CPz13u8N1y0zzwW+ftjb12K+7lVBEG1 fhrePKHgLw3kJ9LqGpR+4vVur7C+rCf6WwCReTY2vXXVYJ+SuKWTRI4zAjTPXtHa i9dpMRASpF+ScRYBcgwIpv789WuHATFKqdBSinZUKBaxQZ5flJ2qIrfqN5VeAejh oQs/zZCdIuAtFKqVycQ0L42YxFNKgPFKQErUCSu3M5OuZLlLVLu7yQvIo2Xmo9qf Hcrpvo5K+w29YkiwGP9e =rbCk -----END PGP SIGNATURE----- Merge tag 'pm-for-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael J Wysocki: - Improved system suspend/resume and runtime PM handling for the SH TMU, CMT and MTU2 clock event devices (also used by ARM/shmobile). - Generic PM domains framework extensions related to cpuidle support and domain objects lookup using names. - ARM/shmobile power management updates including improved support for the SH7372's A4S power domain containing the CPU core. - cpufreq changes related to AMD CPUs support from Matthew Garrett, Andre Przywara and Borislav Petkov. - cpu0 cpufreq driver from Shawn Guo. - cpufreq governor fixes related to the relaxing of limit from Michal Pecio. - OMAP cpufreq updates from Axel Lin and Richard Zhao. - cpuidle ladder governor fixes related to the disabling of states from Carsten Emde and me. - Runtime PM core updates related to the interactions with the system suspend core from Alan Stern and Kevin Hilman. - Wakeup sources modification allowing more helper functions to be called from interrupt context from John Stultz and additional diagnostic code from Todd Poynor. - System suspend error code path fix from Feng Hong. Fixed up conflicts in cpufreq/powernow-k8 that stemmed from the workqueue fixes conflicting fairly badly with the removal of support for hardware P-state chips. The changes were independent but somewhat intertwined. * tag 'pm-for-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits) Revert "PM QoS: Use spinlock in the per-device PM QoS constraints code" PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE, even when disabled, v2 cpuidle: rename function name "__cpuidle_register_driver", v2 cpufreq: OMAP: Check IS_ERR() instead of NULL for omap_device_get_by_hwmod_name cpuidle: remove some empty lines PM: Prevent runtime suspend during system resume PM QoS: Use spinlock in the per-device PM QoS constraints code PM / Sleep: use resume event when call dpm_resume_early cpuidle / ACPI : move cpuidle_device field out of the acpi_processor_power structure ACPI / processor: remove pointless variable initialization ACPI / processor: remove unused function parameter cpufreq: OMAP: remove loops_per_jiffy recalculate for smp sections: fix section conflicts in drivers/cpufreq cpufreq: conservative: update frequency when limits are relaxed cpufreq / ondemand: update frequency when limits are relaxed properly __init-annotate pm_sysrq_init() cpufreq: Add a generic cpufreq-cpu0 driver PM / OPP: Initialize OPP table from device tree ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp cpufreq: Remove support for hardware P-state chips from powernow-k8 ...	2012-10-02 18:32:35 -07:00
David Howells	10b63956fc	UAPI: Plumb the UAPI Kbuilds into the user header installation and checking Plumb the UAPI Kbuilds into the user header installation and checking system. As the headers are split the entries will be transferred across from the old Kbuild files to the UAPI Kbuild files. The changes made in this commit are: (1) Exported generated files (of which there are currently four) are moved to uapi/ directories under the appropriate generated/ directory, thus we get: include/generated/uapi/linux/version.h arch/x86/include/generated/uapi/asm/unistd_32.h arch/x86/include/generated/uapi/asm/unistd_64.h arch/x86/include/generated/uapi/asm/unistd_x32.h These paths were added to the build as -I flags in a previous patch. (2) scripts/Makefile.headersinst is now given the UAPI path to install from rather than the old path. It then determines the old path from that and includes that Kbuild also if it exists, thus permitting the headers to exist in either directory during the changeover. I also renamed the "install" variable to "installdir" as it refers to a directory not the install program. (3) scripts/headers_install.pl is altered to take a list of source file paths instead of just their names so that the makefile can tell it exactly where to find each file. For the moment, files can be obtained from one of four places for each output directory: .../include/uapi/foo/ .../include/generated/uapi/foo/ .../include/foo/ .../include/generated/foo/ The non-UAPI paths will be dropped later. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:57 +01:00
David Howells	4413e16d9d	UAPI: (Scripted) Set up UAPI Kbuild files Set up empty UAPI Kbuild files to be populated by the header splitter. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:35 +01:00
David Howells	abbf1590de	UAPI: Partition the header include path sets and add uapi/ header directories Partition the header include path flags into two sets, one for kernelspace builds and one for userspace builds. Add the following directories to build after the ordinary include directories so that #include will pick up the UAPI header directly if the kernel header has been moved there. The userspace set (represented by the USERINCLUDE make variable) contains: -I $(srctree)/arch/$(hdr-arch)/include/uapi -I arch/$(hdr-arch)/include/generated/uapi -I $(srctree)/include/uapi -I include/generated/uapi -include $(srctree)/include/linux/kconfig.h and the kernelspace set (represented by the LINUXINCLUDE make variable) contains: -I $(srctree)/arch/$(hdr-arch)/include -I arch/$(hdr-arch)/include/generated -I $(srctree)/include -I include --- if not building in the source tree plus everything in the USERINCLUDE set. Then use USERINCLUDE in building the x86 boot code. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:26 +01:00
David Howells	a1ce39288e	UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers Convert #include "..." to #include <path/...> in kernel system headers. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:25 +01:00
Linus Torvalds	15385dfe7e	Merge branch 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/smap support from Ingo Molnar: "This adds support for the SMAP (Supervisor Mode Access Prevention) CPU feature on Intel CPUs: a hardware feature that prevents unintended user-space data access from kernel privileged code. It's turned on automatically when possible. This, in combination with SMEP, makes it even harder to exploit kernel bugs such as NULL pointer dereferences." Fix up trivial conflict in arch/x86/kernel/entry_64.S due to newly added includes right next to each other. * 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, smep, smap: Make the switching functions one-way x86, suspend: On wakeup always initialize cr4 and EFER x86-32: Start out eflags and cr4 clean x86, smap: Do not abuse the [f][x]rstor_checking() functions for user space x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry x86, smap: Reduce the SMAP overhead for signal handling x86, smap: A page fault due to SMAP is an oops x86, smap: Turn on Supervisor Mode Access Prevention x86, smap: Add STAC and CLAC instructions to control user space access x86, uaccess: Merge prototypes for clear_user/__clear_user x86, smap: Add a header file with macros for STAC/CLAC x86, alternative: Add header guards to <asm/alternative-asm.h> x86, alternative: Use .pushsection/.popsection x86, smap: Add CR4 bit for SMAP x86-32, mm: The WP test should be done on a kernel page	2012-10-01 13:59:17 -07:00
Linus Torvalds	b3eda8d05c	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/microcode changes from Ingo Molnar: "The biggest changes are to AMD microcode patching: add code for caching all microcode patches which belong to the current family on which we're running, in the kernel. We look up the patch needed for each core from the cache at patch-application time instead of holding a single patch per-system" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, AMD: Fix use after free in free_cache() x86, microcode, AMD: Rewrite patch application procedure x86, microcode, AMD: Add a small, per-family patches cache x86, microcode, AMD: Add reverse equiv table search x86, microcode: Add a refresh firmware flag to ->request_microcode_fw x86, microcode, AMD: Read CPUID(1).EAX on the correct cpu x86, microcode, AMD: Check before applying a patch x86, microcode, AMD: Remove useless get_ucode_data wrapper x86, microcode: Straighten out Kconfig text x86, microcode: Cleanup cpu hotplug notifier callback x86, microcode: Drop uci->mc check on resume path x86, microcode: Save an indentation level in reload_for_cpu	2012-10-01 11:15:17 -07:00
Linus Torvalds	a5fa7b7d8f	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/platform changes from Ingo Molnar: "This cleans up some Xen-induced pagetable init code uglies, by generalizing new platform callbacks and state: x86_init.paging." 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Document x86_init.paging.pagetable_init() x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done() x86: Move paging_init() call to x86_init.paging.pagetable_init() x86: Rename pagetable_setup_start() to pagetable_init() x86: Remove base argument from x86_init.paging.pagetable_setup_start	2012-10-01 11:14:17 -07:00
Linus Torvalds	2299930012	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mm changes from Ingo Molnar: "The biggest change is new TLB partial flushing code for AMD CPUs. (The v3.6 kernel had the Intel CPU side code, see commits e0ba94f14f74..effee4b9b3b.) There's also various other refinements around the TLB flush code" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Distinguish TLB shootdown interrupts from other functions call interrupts x86/mm: Fix range check in tlbflush debugfs interface x86, cpu: Preset default tlb_flushall_shift on AMD x86, cpu: Add AMD TLB size detection x86, cpu: Push TLB detection CPUID check down x86, cpu: Fixup tlb_flushall_shift formatting	2012-10-01 11:13:33 -07:00
Linus Torvalds	7687b80a4f	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/MCE update from Ingo Molnar: "Various MCE robustness enhancements. One of the changes adds CMCI (Corrected Machine Check Interrupt) poll mode on Intel Nehalem+ CPUs, which mode is automatically entered when the rate of messages is too high - and exited once the storm is over. An MCE events storm will roughly look like this: [ 5342.740616] mce: [Hardware Error]: Machine check events logged [ 5342.746501] mce: [Hardware Error]: Machine check events logged [ 5342.757971] CMCI storm detected: switching to poll mode [ 5372.674957] CMCI storm subsided: switching to interrupt mode This should make such events more survivable" * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Provide boot argument to honour bios-set CMCI threshold x86, MCE: Remove unused defines x86, mce: Enable MCA support by default x86/mce: Add CMCI poll mode x86/mce: Make cmci_discover() quiet x86: mce: Remove the frozen cases in the hotplug code x86: mce: Split timer init x86: mce: Serialize mce injection x86: mce: Disable preemption when calling raise_local()	2012-10-01 11:12:13 -07:00
Linus Torvalds	ac07f5c3cb	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/fpu update from Ingo Molnar: "The biggest change is the addition of the non-lazy (eager) FPU saving support model and enabling it on CPUs with optimized xsaveopt/xrstor FPU state saving instructions. There are also various Sparse fixes" Fix up trivial add-add conflict in arch/x86/kernel/traps.c * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kvm: fix kvm's usage of kernel_fpu_begin/end() x86, fpu: remove cpu_has_xmm check in the fx_finit() x86, fpu: make eagerfpu= boot param tri-state x86, fpu: enable eagerfpu by default for xsaveopt x86, fpu: decouple non-lazy/eager fpu restore from xsave x86, fpu: use non-lazy fpu restore for processors supporting xsave lguest, x86: handle guest TS bit for lazy/non-lazy fpu host models x86, fpu: always use kernel_fpu_begin/end() for in-kernel FPU usage x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu() x86, fpu: remove unnecessary user_fpu_end() in save_xstate_sig() x86, fpu: drop_fpu() before restoring new state from sigframe x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels x86, fpu: Consolidate inline asm routines for saving/restoring fpu state x86, signal: Cleanup ifdefs and is_ia32, is_x32	2012-10-01 11:10:52 -07:00
Linus Torvalds	58ae9c0d54	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 debug update from Ingo Molnar: "Various small enhancements" * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/debug: Dump family, model, stepping of the boot CPU x86/iommu: Use NULL instead of plain 0 for __IOMMU_INIT x86/iommu: Drop duplicate const in __IOMMU_INIT x86/fpu/xsave: Keep __user annotation in casts x86/pci/probe_roms: Add missing __iomem annotation to pci_map_biosrom() x86/signals: ia32_signal.c: add __user casts to fix sparse warnings x86/vdso: Add __user annotation to VDSO32_SYMBOL x86: Fix __user annotations in asm/sys_ia32.h	2012-10-01 11:07:31 -07:00
Linus Torvalds	4a553e1450	Merge branches 'x86-cpu-for-linus' and 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/cpu and x86/cpufeature from Ingo Molnar: "One tiny cleanup, and prepare for SMAP (Supervisor Mode Access Prevention) support on x86" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove the useless branch in c_start() * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: Add feature bit for SMAP	2012-10-01 11:06:35 -07:00
Linus Torvalds	da8347969f	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar: "The one change that stands out is the alternatives patching change that prevents us from ever patching back instructions from SMP to UP: this simplifies things and speeds up CPU hotplug. Other than that it's smaller fixes, cleanups and improvements." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Unspaghettize do_trap() x86_64: Work around old GAS bug x86: Use REP BSF unconditionally x86: Prefer TZCNT over BFS x86/64: Adjust types of temporaries used by ffs()/fls()/fls64() x86: Drop unnecessary kernel_eflags variable on 64-bit x86/smp: Don't ever patch back to UP if we unplug cpus	2012-10-01 10:46:27 -07:00
Linus Torvalds	7e92daaefa	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf update from Ingo Molnar: "Lots of changes in this cycle as well, with hundreds of commits from over 30 contributors. Most of the activity was on the tooling side. Higher level changes: - New 'perf kvm' analysis tool, from Xiao Guangrong. - New 'perf trace' system-wide tracing tool - uprobes fixes + cleanups from Oleg Nesterov. - Lots of patches to make perf build on Android out of box, from Irina Tirdea - Extend ftrace function tracing utility to be more dynamic for its users. It allows for data passing to the callback functions, as well as reading regs as if a breakpoint were to trigger at function entry. The main goal of this patch series was to allow kprobes to use ftrace as an optimized probe point when a probe is placed on an ftrace nop. With lots of help from Masami Hiramatsu, and going through lots of iterations, we finally came up with a good solution. - Add cpumask for uncore pmu, use it in 'stat', from Yan, Zheng. - Various tracing updates from Steve Rostedt - Clean up and improve 'perf sched' performance by elliminating lots of needless calls to libtraceevent. - Event group parsing support, from Jiri Olsa - UI/gtk refactorings and improvements from Namhyung Kim - Add support for non-tracepoint events in perf script python, from Feng Tang - Add --symbols to 'script', similar to the one in 'report', from Feng Tang. Infrastructure enhancements and fixes: - Convert the trace builtins to use the growing evsel/evlist tracepoint infrastructure, removing several open coded constructs like switch like series of strcmp to dispatch events, etc. Basically what had already been showcased in 'perf sched'. - Add evsel constructor for tracepoints, that uses libtraceevent just to parse the /format events file, use it in a new 'perf test' to make sure the libtraceevent format parsing regressions can be more readily caught. - Some strange errors were happening in some builds, but not on the next, reported by several people, problem was some parser related files, generated during the build, didn't had proper make deps, fix from Eric Sandeen. - Introduce struct and cache information about the environment where a perf.data file was captured, from Namhyung Kim. - Fix handling of unresolved samples when --symbols is used in 'report', from Feng Tang. - Add union member access support to 'probe', from Hyeoncheol Lee. - Fixups to die() removal, from Namhyung Kim. - Render fixes for the TUI, from Namhyung Kim. - Don't enable annotation in non symbolic view, from Namhyung Kim. - Fix pipe mode in 'report', from Namhyung Kim. - Move related stats code from stat to util/, will be used by the 'stat' kvm tool, from Xiao Guangrong. - Remove die()/exit() calls from several tools. - Resolve vdso callchains, from Jiri Olsa - Don't pass const char pointers to basename, so that we can unconditionally use libgen.h and thus avoid ifdef BIONIC lines, from David Ahern - Refactor hist formatting so that it can be reused with the GTK browser, From Namhyung Kim - Fix build for another rbtree.c change, from Adrian Hunter. - Make 'perf diff' command work with evsel hists, from Jiri Olsa. - Use the only field_sep var that is set up: symbol_conf.field_sep, fix from Jiri Olsa. - .gitignore compiled python binaries, from Namhyung Kim. - Get rid of die() in more libtraceevent places, from Namhyung Kim. - Rename libtraceevent 'private' struct member to 'priv' so that it works in C++, from Steven Rostedt - Remove lots of exit()/die() calls from tools so that the main perf exit routine can take place, from David Ahern - Fix x86 build on x86-64, from David Ahern. - {int,str,rb}list fixes from Suzuki K Poulose - perf.data header fixes from Namhyung Kim - Allow user to indicate objdump path, needed in cross environments, from Maciek Borzecki - Fix hardware cache event name generation, fix from Jiri Olsa - Add round trip test for sw, hw and cache event names, catching the problem Jiri fixed, after Jiri's patch, the test passes successfully. - Clean target should do clean for lib/traceevent too, fix from David Ahern - Check the right variable for allocation failure, fix from Namhyung Kim - Set up evsel->tp_format regardless of evsel->name being set already, fix from Namhyung Kim - Oprofile fixes from Robert Richter. - Remove perf_event_attr needless version inflation, from Jiri Olsa - Introduce libtraceevent strerror like error reporting facility, from Namhyung Kim - Add pmu mappings to perf.data header and use event names from cmd line, from Robert Richter - Fix include order for bison/flex-generated C files, from Ben Hutchings - Build fixes and documentation corrections from David Ahern - Assorted cleanups from Robert Richter - Let O= makes handle relative paths, from Steven Rostedt - perf script python fixes, from Feng Tang. - Initial bash completion support, from Frederic Weisbecker - Allow building without libelf, from Namhyung Kim. - Support DWARF CFI based unwind to have callchains when %bp based unwinding is not possible, from Jiri Olsa. - Symbol resolution fixes, while fixing support PPC64 files with an .opt ELF section was the end goal, several fixes for code that handles all architectures and cleanups are included, from Cody Schafer. - Assorted fixes for Documentation and build in 32 bit, from Robert Richter - Cache the libtraceevent event_format associated to each evsel early, so that we avoid relookups, i.e. calling pevent_find_event repeatedly when processing tracepoint events. [ This is to reduce the surface contact with libtraceevents and make clear what is that the perf tools needs from that lib: so far parsing the common and per event fields. ] - Don't stop the build if the audit libraries are not installed, fix from Namhyung Kim. - Fix bfd.h/libbfd detection with recent binutils, from Markus Trippelsdorf. - Improve warning message when libunwind devel packages not present, from Jiri Olsa" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits) perf trace: Add aliases for some syscalls perf probe: Print an enum type variable in "enum variable-name" format when showing accessible variables perf tools: Check libaudit availability for perf-trace builtin perf hists: Add missing period_* fields when collapsing a hist entry perf trace: New tool perf evsel: Export the event_format constructor perf evsel: Introduce rawptr() method perf tools: Use perf_evsel__newtp in the event parser perf evsel: The tracepoint constructor should store sys:name perf evlist: Introduce set_filter() method perf evlist: Renane set_filters method to apply_filters perf test: Add test to check we correctly parse and match syscall open parms perf evsel: Handle endianity in intval method perf evsel: Know if byte swap is needed perf tools: Allow handling a NULL cpu_map as meaning "all cpus" perf evsel: Improve tracepoint constructor setup tools lib traceevent: Fix error path on pevent_parse_event perf test: Fix build failure trace: Move trace event enable from fs_initcall to core_initcall tracing: Add an option for disabling markers ...	2012-10-01 10:28:49 -07:00
Linus Torvalds	620e77533f	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molnar: 0. 'idle RCU': Adds RCU APIs that allow non-idle tasks to enter RCU idle mode and provides x86 code to make use of them, allowing RCU to treat user-mode execution as an extended quiescent state when the new RCU_USER_QS kernel configuration parameter is specified. (Work is in progress to port this to a few other architectures, but is not part of this series.) 1. A fix for a latent bug that has been in RCU ever since the addition of CPU stall warnings. This bug results in false-positive stall warnings, but thus far only on embedded systems with severely cut-down userspace configurations. 2. Further reductions in latency spikes for huge systems, along with additional boot-time adaptation to the actual hardware. This is a large change, as it moves RCU grace-period initialization and cleanup, along with quiescent-state forcing, from softirq to a kthread. However, it appears to be in quite good shape (famous last words). 3. Updates to documentation and rcutorture, the latter category including keeping statistics on CPU-hotplug latencies and fixing some initialization-time races. 4. CPU-hotplug fixes and improvements. 5. Idle-loop fixes that were omitted on an earlier submission. 6. Miscellaneous fixes and improvements In certain RCU configurations new kernel threads will show up (rcu_bh, rcu_sched), showing RCU processing overhead. * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (90 commits) rcu: Apply micro-optimization and int/bool fixes to RCU's idle handling rcu: Userspace RCU extended QS selftest x86: Exit RCU extended QS on notify resume x86: Use the new schedule_user API on userspace preemption rcu: Exit RCU extended QS on user preemption rcu: Exit RCU extended QS on kernel preemption after irq/exception x86: Exception hooks for userspace RCU extended QS x86: Unspaghettize do_general_protection() x86: Syscall hooks for userspace RCU extended QS rcu: Switch task's syscall hooks on context switch rcu: Ignore userspace extended quiescent state by default rcu: Allow rcu_user_enter()/exit() to nest rcu: Settle config for userspace extended quiescent state rcu: Make RCU_FAST_NO_HZ handle adaptive ticks rcu: New rcu_user_enter_after_irq() and rcu_user_exit_after_irq() APIs rcu: New rcu_user_enter() and rcu_user_exit() APIs ia64: Add missing RCU idle APIs on idle loop xtensa: Add missing RCU idle APIs on idle loop score: Add missing RCU idle APIs on idle loop parisc: Add missing RCU idle APIs on idle loop ...	2012-10-01 10:16:42 -07:00
Linus Torvalds	99dbb1632f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull the trivial tree from Jiri Kosina: "Tiny usual fixes all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) doc: fix old config name of kprobetrace fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc btrfs: fix the commment for the action flags in delayed-ref.h btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID vfs: fix kerneldoc for generic_fh_to_parent() treewide: fix comment/printk/variable typos ipr: fix small coding style issues doc: fix broken utf8 encoding nfs: comment fix platform/x86: fix asus_laptop.wled_type module parameter mfd: printk/comment fixes doc: getdelays.c: remember to close() socket on error in create_nl_socket() doc: aliasing-test: close fd on write error mmc: fix comment typos dma: fix comments spi: fix comment/printk typos in spi Coccinelle: fix typo in memdup_user.cocci tmiofb: missing NULL pointer checks tools: perf: Fix typo in tools/perf tools/testing: fix comment / output typos ...	2012-10-01 09:06:36 -07:00
Al Viro	6783eaa2e1	x86, um/x86: switch to generic sys_execve and kernel_execve 32bit wrapper is lost on that; 64bit one is not, since we need to arrange for full pt_regs on stack when we call sys_execve() and we need to load callee-saved ones from there afterwards. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-30 22:53:32 -04:00
Al Viro	7076aada10	x86: split ret_from_fork Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-30 22:53:31 -04:00
Tomoki Sekiyama	fd0f586972	x86: Distinguish TLB shootdown interrupts from other functions call interrupts As TLB shootdown requests to other CPU cores are now using function call interrupts, TLB shootdowns entry in /proc/interrupts is always shown as 0. This behavior change was introduced by commit `52aec3308d` ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR"). This patch reverts TLB shootdowns entry in /proc/interrupts to count TLB shootdowns separately from the other function call interrupts. Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Link: http://lkml.kernel.org/r/20120926021128.22212.20440.stgit@hpxw Acked-by: Alex Shi <alex.shi@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-27 22:52:34 -07:00
Naveen N. Rao	450cc20103	x86/mce: Provide boot argument to honour bios-set CMCI threshold The ACPI spec doesn't provide for a way for the bios to pass down recommended thresholds to the OS on a _per-bank_ basis. This patch adds a new boot option, which if passed, tells Linux to use CMCI thresholds set by the bios. As fail-safe, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-09-27 10:08:00 -07:00
Konrad Rzeszutek Wilk	ae1659ee6b	Merge branch 'xenarm-for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm into stable/for-linus-3.7 * 'xenarm-for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls arm: initial Xen support Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-09-26 16:43:35 -04:00
Frederic Weisbecker	0430499ce9	x86: Use the new schedule_user API on userspace preemption This way we can exit the RCU extended quiescent state before we schedule a new task from irq/exception exit. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-09-26 15:47:12 +02:00
Frederic Weisbecker	6ba3c97a38	x86: Exception hooks for userspace RCU extended QS Add necessary hooks to x86 exception for userspace RCU extended quiescent state support. This includes traps, page fault, debug exceptions, etc... Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-09-26 15:47:07 +02:00
Frederic Weisbecker	bf5a3c13b9	x86: Syscall hooks for userspace RCU extended QS Add syscall slow path hooks to notify syscall entry and exit on CPUs that want to support userspace RCU extended quiescent state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-09-26 15:47:04 +02:00
Tao Guo	1b2b23d857	x86_64: Work around old GAS bug GAS in binutils(2.16.91) could not parse parentheses within macro parameters unless fully parenthesized, and this is a workaround to make old gas work without generating below errors: arch/x86/kernel/entry_64.S: Assembler messages: arch/x86/kernel/entry_64.S:387: Error: too many positional arguments arch/x86/kernel/entry_64.S:389: Error: too many positional arguments [...] Signed-off-by: Tao Guo <glorioustao@gmail.com> Reluctantly-Acked-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1348648102-12653-1-git-send-email-glorioustao@gmail.com [ Jan argues that these old GAS versions are fragile - which is so, but lets give them a chance. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-26 13:35:32 +02:00
H. Peter Anvin	e139e95590	x86, smap: Do not abuse the [f][x]rstor_checking() functions for user space With SMAP, the [f][x]rstor_checking() functions are no longer usable for user-space pointers by applying a simple __force cast. Instead, create new [f][x]rstor_user() functions which do the proper SMAP magic. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343171129-2747-3-git-send-email-suresh.b.siddha@intel.com	2012-09-25 15:42:18 -07:00
John Stultz	650ea02475	time: Convert x86_64 to using new update_vsyscall Switch x86_64 to using sub-ns precise vsyscall Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2012-09-24 12:38:09 -04:00
Jan Kiszka	c863901075	KVM: x86: Fix guest debug across vcpu INIT reset If we reset a vcpu on INIT, we so far overwrote dr7 as provided by KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally. Fix this by saving the dr7 used for guest debugging and calculating the effective register value as well as switch_db_regs on any potential change. This will change to focus of the set_guest_debug vendor op to update_dp_bp_intercept. Found while trying to stop on start_secondary. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-09-23 15:00:07 +02:00
Konrad Rzeszutek Wilk	a5f9515570	Merge branch 'stable/late-swiotlb.v3.3' into stable/for-linus-3.7 * stable/late-swiotlb.v3.3: xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. xen/swiotlb: Remove functions not needed anymore. xen/pcifront: Use Xen-SWIOTLB when initting if required. xen/swiotlb: For early initialization, return zero on success. xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used. xen/swiotlb: Move the error strings to its own function. xen/swiotlb: Move the nr_tbl determination in its own function. swiotlb: add the late swiotlb initialization function with iotlb memory xen/swiotlb: With more than 4GB on 64-bit, disable the native SWIOTLB. xen/swiotlb: Simplify the logic. Conflicts: arch/x86/xen/pci-swiotlb-xen.c Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-09-22 20:01:24 -04:00
H. Peter Anvin	49b8c695e3	Merge branch 'x86/fpu' into x86/smap Reason for merge: x86/fpu changed the structure of some of the code that x86/smap changes; mostly fpu-internal.h but also minor changes to the signal code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Resolved Conflicts: arch/x86/ia32/ia32_signal.c arch/x86/include/asm/fpu-internal.h arch/x86/kernel/signal.c	2012-09-21 17:18:44 -07:00
Suresh Siddha	b1a74bf821	x86, kvm: fix kvm's usage of kernel_fpu_begin/end() Preemption is disabled between kernel_fpu_begin/end() and as such it is not a good idea to use these routines in kvm_load/put_guest_fpu() which can be very far apart. kvm_load/put_guest_fpu() routines are already called with preemption disabled and KVM already uses the preempt notifier to save the guest fpu state using kvm_put_guest_fpu(). So introduce __kernel_fpu_begin/end() routines which don't touch preemption and use them instead of kernel_fpu_begin/end() for KVM's use model of saving/restoring guest FPU state. Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit state in the case of VMX. For eagerFPU case, host cr0.TS is always clear. So no need to worry about it. For the traditional lazyFPU restore case, change the cr0.TS bit for the host state during vm-exit to be always clear and cr0.TS bit is set in the __vmx_load_host_state() when the FPU (guest FPU or the host task's FPU) state is not active. This ensures that the host/guest FPU state is properly saved, restored during context-switch and with interrupts (using irq_fpu_usable()) not stomping on the active FPU state. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1348164109.26695.338.camel@sbsiddha-desk.sc.intel.com Cc: Avi Kivity <avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-21 16:59:04 -07:00
H. Peter Anvin	5e88353d8b	x86, smap: Reduce the SMAP overhead for signal handling Signal handling contains a bunch of accesses to individual user space items, which causes an excessive number of STAC and CLAC instructions. Instead, let get/put_user_try ... get/put_user_catch() contain the STAC and CLAC instructions. This means that get/put_user_try no longer nests, and furthermore that it is no longer legal to use user space access functions other than __get/put_user_ex() inside those blocks. However, these macros are x86-specific anyway and are only used in the signal-handling paths; a simple reordering of moving the larger subroutine calls out of the try...catch blocks resolves that problem. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-12-git-send-email-hpa@linux.intel.com	2012-09-21 12:45:27 -07:00
H. Peter Anvin	63bcff2a30	x86, smap: Add STAC and CLAC instructions to control user space access When Supervisor Mode Access Prevention (SMAP) is enabled, access to userspace from the kernel is controlled by the AC flag. To make the performance of manipulating that flag acceptable, there are two new instructions, STAC and CLAC, to set and clear it. This patch adds those instructions, via alternative(), when the SMAP feature is enabled. It also adds X86_EFLAGS_AC unconditionally to the SYSCALL entry mask; there is simply no reason to make that one conditional. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-9-git-send-email-hpa@linux.intel.com	2012-09-21 12:45:27 -07:00
H. Peter Anvin	a052858fab	x86, uaccess: Merge prototypes for clear_user/__clear_user The prototypes for clear_user() and __clear_user() are identical in the 32- and 64-bit headers. No functionality change. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-8-git-send-email-hpa@linux.intel.com	2012-09-21 12:45:26 -07:00
H. Peter Anvin	51ae4a2d77	x86, smap: Add a header file with macros for STAC/CLAC The STAC/CLAC instructions are only available with SMAP, but on the other hand they aren't needed if SMAP is not available, or before we start to run userspace, so construct them as alternatives which start out as noops and are enabled by the alternatives mechanism. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-7-git-send-email-hpa@linux.intel.com	2012-09-21 12:45:26 -07:00
H. Peter Anvin	76f30759f6	x86, alternative: Add header guards to <asm/alternative-asm.h> Add header guards to protect <asm/alternative-asm.h> against multiple inclusion. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-6-git-send-email-hpa@linux.intel.com	2012-09-21 12:45:26 -07:00
H. Peter Anvin	9cebed423c	x86, alternative: Use .pushsection/.popsection .section/.previous doesn't nest. Use .pushsection/.popsection in <asm/alternative.h> so that they can be properly nested. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-5-git-send-email-hpa@linux.intel.com	2012-09-21 12:45:25 -07:00
H. Peter Anvin	85fdf05cc3	x86, smap: Add CR4 bit for SMAP Add X86_CR4_SMAP to <asm/processor-flags.h>. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-4-git-send-email-hpa@linux.intel.com	2012-09-21 12:45:25 -07:00
Linus Torvalds	8ca7de9164	Bug-fixes: * Fix M2P batching re-using the incorrect structure field. * Disable BIOS SMP MP table search. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQXGfdAAoJEFjIrFwIi8fJbWcH/0FI2d/VyB+ZU0ng3R0Oa7mt iR/x+Z+mfFdp2dXS6gs6DgJIZVA7i2K9pX4rOXjpDGGGyUeo1xoqjlQfsFWQGjZ/ p49RrDrM93c2GdRXk3iMSWfboQI7BXBs5rnyYZQL7kMxUSR75MxbeONvhPrMSO9I 3EBidWH08qjrn2HVF44F6xh5ONjpclo5AvGIzJ0eU4X0D0eqMnhvlAw8/UYJU2HV heRvuxWF9l2jNpLhKhZy1730D1X/vKA5qKAcBW8rCOpEijyPpmtKbqapeUJg/9pH NVquuwGutP5ozrSi7a/23+L+ezvQBmCPm5ZRG44PccBoZ/HVs8haT8UypSWSDzo= =TwvM -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bug-fixes from Konrad Rzeszutek Wilk: - Fix M2P batching re-using the incorrect structure field. In v3.5 we added batching for M2P override (Machine Frame Number -> Physical Frame Number), but the original MFN was saved in an incorrect structure - and we would oops/restore when restoring with the old MFN. - Disable BIOS SMP MP table search. A bootup issue that we had ignored until we found that on DL380 G6 it was needed. * tag 'stable/for-linus-3.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/boot: Disable BIOS SMP MP table search. xen/m2p: do not reuse kmap_op->dev_bus_addr	2012-09-21 12:06:54 -07:00
Xiao Guangrong	26bf264e87	KVM: x86: Export svm/vmx exit code and vector code to userspace Exporting KVM exit information to userspace to be consumed by perf. Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com> [ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on acme's git tree ] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1347870675-31495-2-git-send-email-haodong@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2012-09-21 12:48:09 -03:00
Al Viro	e76623d694	x86: get rid of TIF_IRET hackery TIF_NOTIFY_RESUME will work in precisely the same way; all that is achieved by TIF_IRET is appearing that there's some work to be done, so we end up on the iret exit path. Just use NOTIFY_RESUME. And for execve() do that in 32bit start_thread(), not sys_execve() itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-20 09:50:17 -04:00
Gleb Natapov	1e08ec4a13	KVM: optimize apic interrupt delivery Most interrupt are delivered to only one vcpu. Use pre-build tables to find interrupt destination instead of looping through all vcpus. In case of logical mode loop only through vcpus in a logical cluster irq is sent to. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-09-20 15:05:26 +03:00
Avi Kivity	6fd01b711b	KVM: MMU: Optimize is_last_gpte() Instead of branchy code depending on level, gpte.ps, and mmu configuration, prepare everything in a bitmap during mode changes and look it up during runtime. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-09-20 13:00:09 +03:00
Avi Kivity	97d64b7881	KVM: MMU: Optimize pte permission checks walk_addr_generic() permission checks are a maze of branchy code, which is performed four times per lookup. It depends on the type of access, efer.nxe, cr0.wp, cr4.smep, and in the near future, cr4.smap. Optimize this away by precalculating all variants and storing them in a bitmap. The bitmap is recalculated when rarely-changing variables change (cr0, cr4) and is indexed by the often-changing variables (page fault error code, pte access permissions). The permission check is moved to the end of the loop, otherwise an SMEP fault could be reported as a false positive, when PDE.U=1 but PTE.U=0. Noted by Xiao Guangrong. The result is short, branch-free code. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-09-20 13:00:08 +03:00
Jan Beulich	e26a44a2d6	x86: Use REP BSF unconditionally Make "REP BSF" unconditional, as per the suggestion of hpa and Linus, this removes the insane BSF_PREFIX conditional and simplifies the logic. Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/5058741E020000780009C014@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-19 17:26:08 +02:00
Suresh Siddha	a8615af4bc	x86, fpu: remove cpu_has_xmm check in the fx_finit() CPUs with FXSAVE but no XMM/MXCSR (Pentium II from Intel, Crusoe/TM-3xxx/5xxx from Transmeta, and presumably some of the K6 generation from AMD) ever looked at the mxcsr field during fxrstor/fxsave. So remove the cpu_has_xmm check in the fx_finit() Reported-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1347300665-6209-6-git-send-email-suresh.b.siddha@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:52:24 -07:00
Suresh Siddha	212b02125f	x86, fpu: enable eagerfpu by default for xsaveopt xsaveopt/xrstor support optimized state save/restore by tracking the INIT state and MODIFIED state during context-switch. Enable eagerfpu by default for processors supporting xsaveopt. Can be disabled by passing "eagerfpu=off" boot parameter. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1347300665-6209-3-git-send-email-suresh.b.siddha@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:52:23 -07:00
Suresh Siddha	5d2bd7009f	x86, fpu: decouple non-lazy/eager fpu restore from xsave Decouple non-lazy/eager fpu restore policy from the existence of the xsave feature. Introduce a synthetic CPUID flag to represent the eagerfpu policy. "eagerfpu=on" boot paramter will enable the policy. Requested-by: H. Peter Anvin <hpa@zytor.com> Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1347300665-6209-2-git-send-email-suresh.b.siddha@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:52:22 -07:00
Suresh Siddha	304bceda6a	x86, fpu: use non-lazy fpu restore for processors supporting xsave Fundamental model of the current Linux kernel is to lazily init and restore FPU instead of restoring the task state during context switch. This changes that fundamental lazy model to the non-lazy model for the processors supporting xsave feature. Reasons driving this model change are: i. Newer processors support optimized state save/restore using xsaveopt and xrstor by tracking the INIT state and MODIFIED state during context-switch. This is faster than modifying the cr0.TS bit which has serializing semantics. ii. Newer glibc versions use SSE for some of the optimized copy/clear routines. With certain workloads (like boot, kernel-compilation etc), application completes its work with in the first 5 task switches, thus taking upto 5 #DNA traps with the kernel not getting a chance to apply the above mentioned pre-load heuristic. iii. Some xstate features (like AMD's LWP feature) don't honor the cr0.TS bit and thus will not work correctly in the presence of lazy restore. Non-lazy state restore is needed for enabling such features. Some data on a two socket SNB system: * Saved 20K DNA exceptions during boot on a two socket SNB system. * Saved 50K DNA exceptions during kernel-compilation workload. * Improved throughput of the AVX based checksumming function inside the kernel by ~15% as xsave/xrstor is faster than the serializing clts/stts pair. Also now kernel_fpu_begin/end() relies on the patched alternative instructions. So move check_fpu() which uses the kernel_fpu_begin/end() after alternative_instructions(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-7-git-send-email-suresh.b.siddha@intel.com Merge 32-bit boot fix from, Link: http://lkml.kernel.org/r/1347300665-6209-4-git-send-email-suresh.b.siddha@intel.com Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: NeilBrown <neilb@suse.de> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:52:11 -07:00
Suresh Siddha	841e3604d3	x86, fpu: always use kernel_fpu_begin/end() for in-kernel FPU usage use kernel_fpu_begin/end() instead of unconditionally accessing cr0 and saving/restoring just the few used xmm/ymm registers. This has some advantages like: * If the task's FPU state is already active, then kernel_fpu_begin() will just save the user-state and avoiding the read/write of cr0. In general, cr0 accesses are much slower. * Manual save/restore of xmm/ymm registers will affect the 'modified' and the 'init' optimizations brought in the by xsaveopt/xrstor infrastructure. * Foward compatibility with future vector register extensions will be a problem if the xmm/ymm registers are manually saved and restored (corrupting the extended state of those vector registers). With this patch, there was no significant difference in the xor throughput using AVX, measured during boot. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-5-git-send-email-suresh.b.siddha@intel.com Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:52:08 -07:00
Suresh Siddha	377ffbcc53	x86, fpu: remove unnecessary user_fpu_end() in save_xstate_sig() Few lines below we do drop_fpu() which is more safer. Remove the unnecessary user_fpu_end() in save_xstate_sig(), which allows the drop_fpu() to ignore any pending exceptions from the user-space and drop the current fpu. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-3-git-send-email-suresh.b.siddha@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:52:06 -07:00
Suresh Siddha	e962591749	x86, fpu: drop_fpu() before restoring new state from sigframe No need to save the state with unlazy_fpu(), that is about to get overwritten by the state from the signal frame. Instead use drop_fpu() and continue to restore the new state. Also fold the stop_fpu_preload() into drop_fpu(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-2-git-send-email-suresh.b.siddha@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:52:05 -07:00
Suresh Siddha	72a671ced6	x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied to/from the fpstate in the task struct. And in the case of signal delivery for x86_64 binaries, if the fpstate is live in the CPU registers, then the live state is copied directly to the user sigframe. Otherwise fpstate in the task struct is copied to the user sigframe. During restore, fpstate in the user sigframe is restored directly to the live CPU registers. Historically, different code paths led to different bugs. For example, x86_64 code path was not preemption safe till recently. Also there is lot of code duplication for support of new features like xsave etc. Unify signal handling code paths for x86 and x86_64 kernels. New strategy is as follows: Signal delivery: Both for 32/64-bit frames, align the core math frame area to 64bytes as needed by xsave (this where the main fpu/extended state gets copied to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave frames). If the state is live, copy the register state directly to the user frame. If not live, copy the state in the thread struct to the user frame. And for 32-bit [f]xsave frames, construct the fsave header separately before the actual [f]xsave area. Signal return: As the 32-bit frames with [f]xstate has an additional 'fsave' header, copy everything back from the user sigframe to the fpstate in the task structure and reconstruct the fxstate from the 'fsave' header (Also user passed pointers may not be correctly aligned for any attempt to directly restore any partial state). At the next fpstate usage, everything will be restored to the live CPU registers. For all the 64-bit frames and the 32-bit fsave frame, restore the state from the user sigframe directly to the live CPU registers. 64-bit signals always restored the math frame directly, so we can expect the math frame pointer to be correctly aligned. For 32-bit fsave frames, there are no alignment requirements, so we can restore the state directly. "lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are with in the noise range with this change. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com [ Merged in compilation fix ] Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:51:48 -07:00
Suresh Siddha	0ca5bd0d88	x86, fpu: Consolidate inline asm routines for saving/restoring fpu state Consolidate x86, x86_64 inline asm routines saving/restoring fpu state using config_enabled(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343171129-2747-3-git-send-email-suresh.b.siddha@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:51:26 -07:00
Suresh Siddha	050902c011	x86, signal: Cleanup ifdefs and is_ia32, is_x32 Use config_enabled() to cleanup the definitions of is_ia32/is_x32. Move the function prototypes to the header file to cleanup ifdefs, and move the x32_setup_rt_frame() code around. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343171129-2747-2-git-send-email-suresh.b.siddha@intel.com Merged in compilation fix from, Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-09-18 15:51:26 -07:00
Borislav Petkov	57639bedd2	x86, MCE: Remove unused defines Those were sitting there unused since the dawn of time, drop them. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-09-17 19:33:38 +02:00
Konrad Rzeszutek Wilk	b827760053	xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used. With this patch we provide the functionality to initialize the Xen-SWIOTLB late in the bootup cycle - specifically for Xen PCI-frontend. We still will work if the user had supplied 'iommu=soft' on the Linux command line. Note: We cannot depend on after_bootmem to automatically determine whether this is early or not. This is because when PCI IOMMUs are initialized it is after after_bootmem but before a lot of "other" subsystems are initialized. CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> [v1: Fix smatch warnings] [v2: Added check for xen_swiotlb] [v3: Rebased with new xen-swiotlb changes] [v4: squashed xen/swiotlb: Depending on after_bootmem is not correct in] Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-09-17 12:58:16 -04:00
Oleg Nesterov	baedbf02b1	uprobes: Make arch_uprobe_task->saved_trap_nr "unsigned int" Make arch_uprobe_task->saved_trap_nr "unsigned int" and move it down after ->saved_scratch_register, this changes sizeof() from 24 to 16. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>	2012-09-15 17:37:32 +02:00
Oleg Nesterov	3a4664aa83	uprobes/x86: Xol should send SIGTRAP if X86_EFLAGS_TF was set arch_uprobe_disable_step() correctly preserves X86_EFLAGS_TF and returns to user-mode. But this means the application gets SIGTRAP only after the next insn. This means that UPROBE_CLEAR_TF logic is not really right. _enable should only record the state of X86_EFLAGS_TF, and _disable should check it separately from UPROBE_FIX_SETF. Remove arch_uprobe_task->restore_flags, add ->saved_tf instead, and change enable/disable accordingly. This assumes that the probed insn was not trapped, see the next patch. arch_uprobe_skip_sstep() logic has the same problem, change it to check X86_EFLAGS_TF and send SIGTRAP as well. We will cleanup this all after we fold enable/disable_step into pre/post_hol hooks. Note: send_sig(SIGTRAP) is not actually right, we need send_sigtrap(). But this needs more changes, handle_swbp() does the same and this is equally wrong. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>	2012-09-15 17:37:31 +02:00
Oleg Nesterov	9bd1190a11	uprobes/x86: Do not (ab)use TIF_SINGLESTEP/user_*_single_step() for single-stepping user_enable/disable_single_step() was designed for ptrace, it assumes a single user and does unnecessary and wrong things for uprobes. For example: - arch_uprobe_enable_step() can't trust TIF_SINGLESTEP, an application itself can set X86_EFLAGS_TF which must be preserved after arch_uprobe_disable_step(). - we do not want to set TIF_SINGLESTEP/TIF_FORCED_TF in arch_uprobe_enable_step(), this only makes sense for ptrace. - otoh we leak TIF_SINGLESTEP if arch_uprobe_disable_step() doesn't do user_disable_single_step(), the application will be killed after the next syscall. - arch_uprobe_enable_step() does access_process_vm() we do not need/want. Change arch_uprobe_enable/disable_step() to set/clear X86_EFLAGS_TF directly, this is much simpler and more correct. However, we need to clear TIF_BLOCKSTEP/DEBUGCTLMSR_BTF before executing the probed insn, add set_task_blockstep(false). Note: with or without this patch, there is another (hopefully minor) problem. A probed "pushf" insn can see the wrong X86_EFLAGS_TF set by uprobes. Perhaps we should change _disable to update the stack, or teach arch_uprobe_skip_sstep() to emulate this insn. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>	2012-09-15 17:37:30 +02:00
Sebastian Andrzej Siewior	bdc1e47217	uprobes/x86: Implement x86 specific arch_uprobe_*_step The arch specific implementation behaves like user_enable_single_step() except that it does not disable single stepping if it was already enabled by ptrace. This allows the debugger to single step over an uprobe. The state of block stepping is not restored. It makes only sense together with TF and if that was enabled then the debugger is notified. Note: this is still not correct. For example, TIF_SINGLESTEP check is not right, the application itself can set X86_EFLAGS_TF. And otoh we leak TIF_SINGLESTEP (set by enable) if the probed insn is "popf". See the next patches, we need the changes in arch/x86/kernel/step.c first. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>	2012-09-15 17:37:28 +02:00
Jan Beulich	5870661c09	x86: Prefer TZCNT over BFS Following a relatively recent compiler change, make use of the fact that for non-zero input BSF and TZCNT produce the same result, and that CPUs not knowing of TZCNT will treat the instruction as BSF (i.e. ignore what looks like a REP prefix to them). The assumption here is that TZCNT would never have worse performance than BSF. For the moment, only do this when the respective generic-CPU option is selected (as there are no specific-CPU options covering the CPUs supporting TZCNT), and don't do that when size optimization was requested. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/504DEA1B020000780009A277@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-13 17:44:01 +02:00
Jan Beulich	1edfbb4153	x86/64: Adjust types of temporaries used by ffs()/fls()/fls64() The 64-bit special cases of the former two (the thrird one is 64-bit only anyway) don't need to use "long" temporaries, as the result will always fit in a 32-bit variable, and the functions return plain "int". This avoids a few REX prefixes, i.e. minimally reduces code size. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/504DE550020000780009A258@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-13 17:43:58 +02:00
Ian Campbell	6eebdda35e	x86: Drop unnecessary kernel_eflags variable on 64-bit On 64 bit x86 we save the current eflags in cpu_init for use in ret_from_fork. Strictly speaking reserved bits in EFLAGS should be read as written but in practise it is unlikely that EFLAGS could ever be extended in this way and the kernel alread clears any undefined flags early on. The equivalent 32 bit code simply hard codes 0x0202 as the new EFLAGS. This change makes 64 bit use the same mechanism to setup the initial EFLAGS on fork. Note that 64 bit resets EFLAGS before calling schedule_tail() as opposed to 32 bit which calls schedule_tail() first. Therefore the correct value for EFLAGS has opposite IF bit. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Pekka Enberg <penberg@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20120824195847.GA31628@moon Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-13 17:32:47 +02:00
Stefano Stabellini	2fc136eecd	xen/m2p: do not reuse kmap_op->dev_bus_addr If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-09-12 11:21:40 -04:00
Konrad Rzeszutek Wilk	25a765b7f0	Merge branch 'x86/platform' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into stable/for-linus-3.7 * 'x86/platform' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (9690 commits) x86: Document x86_init.paging.pagetable_init() x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done() x86: Move paging_init() call to x86_init.paging.pagetable_init() x86: Rename pagetable_setup_start() to pagetable_init() x86: Remove base argument from x86_init.paging.pagetable_setup_start Linux 3.6-rc5 HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured Remove user-triggerable BUG from mpol_to_str xen/pciback: Fix proper FLR steps. uml: fix compile error in deliver_alarm() dj: memory scribble in logi_dj Fix order of arguments to compat_put_time[spec\|val] xen: Use correct masking in xen_swiotlb_alloc_coherent. xen: fix logical error in tlb flushing xen/p2m: Fix one-off error in checking the P2M tree directory. powerpc: Don't use __put_user() in patch_instruction powerpc: Make sure IPI handlers see data written by IPI senders powerpc: Restore correct DSCR in context switch powerpc: Fix DSCR inheritance in copy_thread() powerpc: Keep thread.dscr and thread.dscr_inherit in sync ...	2012-09-12 11:14:33 -04:00
Attilio Rao	6428227898	x86: Document x86_init.paging.pagetable_init() Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Acked-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-6-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2012-09-12 15:33:07 +02:00
Attilio Rao	c711288727	x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done() At this stage x86_init.paging.pagetable_setup_done is only used in the XEN case. Move its content in the x86_init.paging.pagetable_init setup function and remove the now unused x86_init.paging.pagetable_setup_done remaining infrastructure. Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Acked-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-5-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2012-09-12 15:33:06 +02:00
Attilio Rao	843b8ed2ec	x86: Move paging_init() call to x86_init.paging.pagetable_init() Move the paging_init() call to the platform specific pagetable_init() function, so we can get rid of the extra pagetable_setup_done() function pointer. Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Acked-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-4-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2012-09-12 15:33:06 +02:00
Attilio Rao	7737b215ad	x86: Rename pagetable_setup_start() to pagetable_init() In preparation for unifying the pagetable_setup_start() and pagetable_setup_done() setup functions, rename appropriately all the infrastructure related to pagetable_setup_start(). Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Ackedd-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-3-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2012-09-12 15:33:06 +02:00
Attilio Rao	73090f8993	x86: Remove base argument from x86_init.paging.pagetable_setup_start We either use swapper_pg_dir or the argument is unused. Preparatory patch to simplify platform pagetable setup further. Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Ackedb-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-2-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2012-09-12 15:33:06 +02:00
Matthew Garrett	f594065faf	ACPI: Add fixups for AMD P-state figures Some AMD systems may round the frequencies in ACPI tables to 100MHz boundaries. We can obtain the real frequencies from MSRs, so add a quirk to fix these frequencies up on AMD systems. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-09-09 22:05:02 +02:00
Matthew Garrett	3dc9a633f8	acpi-cpufreq: Add support for modern AMD CPUs The programming model for P-states on modern AMD CPUs is very similar to that of Intel and VIA. It makes sense to consolidate this support into one driver rather than duplicating functionality between two of them. This patch adds support for AMDs with hardware P-state control to acpi-cpufreq. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-09-09 22:04:26 +02:00
H. Peter Anvin	05194cfc54	x86, cpufeature: Add feature bit for SMAP Add CPUID feature bit for Supervisor Mode Access Prevention (SMAP). Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/n/tip-ethzcr5nipikl6hd5q8ssepq@git.kernel.org	2012-09-09 11:12:04 -07:00
Gleb Natapov	b3356bf0db	KVM: emulator: optimize "rep ins" handling Optimize "rep ins" by allowing emulator to write back more than one datum at a time. Introduce new operand type OP_MEM_STR which tells writeback() that dst contains pointer to an array that should be written back as opposite to just one data element. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-09-06 18:07:38 +03:00
Gleb Natapov	9d1b39a967	KVM: emulator: make x86 emulation modes enum instead of defines Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-09-06 18:07:01 +03:00
Gleb Natapov	716d51abff	KVM: Provide userspace IO exit completion callback Current code assumes that IO exit was due to instruction emulation and handles execution back to emulator directly. This patch adds new userspace IO exit completion callback that can be set by any other code that caused IO exit to userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-09-06 18:06:37 +03:00
Mathias Krause	0225fb509d	KVM: x86 emulator: constify emulate_ops We never change emulate_ops[] at runtime so it should be r/o. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-09-05 12:41:48 +03:00
Mathias Krause	ae13b7b4e0	x86/iommu: Use NULL instead of plain 0 for __IOMMU_INIT IOMMU_INIT_POST and IOMMU_INIT_POST_FINISH pass the plain value 0 instead of NULL to __IOMMU_INIT. Fix this and make sparse happy by doing so. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1346621506-30857-8-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-05 10:52:26 +02:00
Mathias Krause	2b11afd1ab	x86/iommu: Drop duplicate const in __IOMMU_INIT It's redundant and makes sparse complain about it. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1346621506-30857-7-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-05 10:52:26 +02:00
Mathias Krause	3d1334064f	x86/vdso: Add __user annotation to VDSO32_SYMBOL The address calculated by VDSO32_SYMBOL() is a pointer into userland. Add the __user annotation to fix related sparse warnings in its users. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Andy Lutomirski <luto@MIT.EDU> Link: http://lkml.kernel.org/r/1346621506-30857-3-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-05 10:52:23 +02:00
Mathias Krause	f00026276a	x86: Fix __user annotations in asm/sys_ia32.h Fix the following sparse warnings: sys_ia32.c:293:38: warning: incorrect type in argument 2 (different address spaces) sys_ia32.c:293:38: expected unsigned int [noderef] [usertype] <asn:1>stat_addr sys_ia32.c:293:38: got unsigned int stat_addr Ironically, sys_ia32.h was introduced to fix sparse warnings but missed that one. Signed-off-by: Mathias Krause <minipli@googlemail.com> Link: http://lkml.kernel.org/r/1346621506-30857-2-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-05 10:52:23 +02:00
Jon Mason	e09a841782	hpet: Remove unused PCI Vendor ID #define HPET_ID_VENDOR_8086 is defined but never used. It would be a redefine of PCI_VENDOR_ID_INTEL if it was ever used. Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-09-01 08:44:28 -07:00
Ingo Molnar	508dc4f8ee	Merge branch 'perf/urgent' into perf/core Pick up the latest fixes because upcoming uprobes changes will rely on it. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-08-28 18:05:55 +02:00
Avi Kivity	dd856efafe	KVM: x86 emulator: access GPRs on demand Instead of populating the entire register file, read in registers as they are accessed, and write back only the modified ones. This saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually used during emulation), and a two 128-byte copies for the registers. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-08-27 18:38:55 -03:00
Marcelo Tosatti	c78aa4c4b9	Merge remote-tracking branch 'upstream/master' into queue Merging critical fixes from upstream required for development. * upstream/master: (809 commits) libata: Add a space to " 2GB ATA Flash Disk" DMA blacklist entry Revert "powerpc: Update g5_defconfig" powerpc/perf: Use pmc_overflow() to detect rolled back events powerpc: Fix VMX in interrupt check in POWER7 copy loops powerpc: POWER7 copy_to_user/copy_from_user patch applied twice powerpc: Fix personality handling in ppc64_personality() powerpc/dma-iommu: Fix IOMMU window check powerpc: Remove unnecessary ifdefs powerpc/kgdb: Restore current_thread_info properly powerpc/kgdb: Bail out of KGDB when we've been triggered powerpc/kgdb: Do not set kgdb_single_step on ppc powerpc/mpic_msgr: Add missing includes powerpc: Fix null pointer deref in perf hardware breakpoints powerpc: Fixup whitespace in xmon powerpc: Fix xmon dl command for new printk implementation xfs: check for possible overflow in xfs_ioc_trim xfs: unlock the AGI buffer when looping in xfs_dialloc xfs: fix uninitialised variable in xfs_rtbuf_get() powerpc/fsl: fix "Failed to mount /dev: No such device" errors powerpc/fsl: update defconfigs ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-08-26 13:58:41 -03:00
Steven Rostedt	d57c5d51a3	ftrace/x86: Add support for -mfentry to x86_64 If the kernel is compiled with gcc 4.6.0 which supports -mfentry, then use that instead of mcount. With mcount, frame pointers are forced with the -pg option and we get something like: <can_vma_merge_before>: 55 push %rbp 48 89 e5 mov %rsp,%rbp 53 push %rbx 41 51 push %r9 e8 fe 6a 39 00 callq ffffffff81483d00 <mcount> 31 c0 xor %eax,%eax 48 89 fb mov %rdi,%rbx 48 89 d7 mov %rdx,%rdi 48 33 73 30 xor 0x30(%rbx),%rsi 48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi With -mfentry, frame pointers are no longer forced and the call looks like this: <can_vma_merge_before>: e8 33 af 37 00 callq ffffffff81461b40 <__fentry__> 53 push %rbx 48 89 fb mov %rdi,%rbx 31 c0 xor %eax,%eax 48 89 d7 mov %rdx,%rdi 41 51 push %r9 48 33 73 30 xor 0x30(%rbx),%rsi 48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi This adds the ftrace hook at the beginning of the function before a frame is set up, and allows the function callbacks to be able to access parameters. As kprobes now can use function tracing (at least on x86) this speeds up the kprobe hooks that are at the beginning of the function. Link: http://lkml.kernel.org/r/20120807194100.130477900@goodmis.org Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-08-23 11:26:36 -04:00
Stefano Stabellini	bd3f79b71d	xen: Introduce xen_pfn_t for pfn and mfn types All the original Xen headers have xen_pfn_t as mfn and pfn type, however when they have been imported in Linux, xen_pfn_t has been replaced with unsigned long. That might work for x86 and ia64 but it does not for arm. Bring back xen_pfn_t and let each architecture define xen_pfn_t as they see fit. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-08-23 10:18:17 -04:00
Rusty Russell	816afe4ff9	x86/smp: Don't ever patch back to UP if we unplug cpus We still patch SMP instructions to UP variants if we boot with a single CPU, but not at any other time. In particular, not if we unplug CPUs to return to a single cpu. Paul McKenney points out: mean offline overhead is 6251/48=130.2 milliseconds. If I remove the alternatives_smp_switch() from the offline path [...] the mean offline overhead is 550/42=13.1 milliseconds Basically, we're never going to get those 120ms back, and the code is pretty messy. We get rid of: 1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the documentation is wrong. It's now the default. 2) The skip_smp_alternatives flag used by suspend. 3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end() which were only used to set this one flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paul.mckenney@us.ibm.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-08-23 10:45:13 +02:00
Marcelo Tosatti	90993cdd18	x86: KVM guest: merge CONFIG_KVM_CLOCK into CONFIG_KVM_GUEST The distinction between CONFIG_KVM_CLOCK and CONFIG_KVM_GUEST is not so clear anymore, as demonstrated by recent bugs caused by poor handling of on/off combinations of these options. Merge CONFIG_KVM_CLOCK into CONFIG_KVM_GUEST. Reported-By: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-08-23 04:57:54 -03:00
Borislav Petkov	48e30685ca	x86, microcode: Add a refresh firmware flag to ->request_microcode_fw This is done in preparation for teaching the ucode driver to either load a new ucode patches container from userspace or use an already cached version. No functionality change in this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1344361461-10076-10-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-08-22 16:15:58 -07:00
Borislav Petkov	e7e632f5ba	x86, microcode, AMD: Remove useless get_ucode_data wrapper get_ucode_data was a trivial memcpy wrapper. Remove it so as not to obfuscate code unnecessarily with no obvious gain. No functional change. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1344361461-10076-7-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-08-22 16:15:26 -07:00
Xiao Guangrong	4d8b81abc4	KVM: introduce readonly memslot In current code, if we map a readonly memory space from host to guest and the page is not currently mapped in the host, we will get a fault pfn and async is not allowed, then the vm will crash We introduce readonly memory region to map ROM/ROMD to the guest, read access is happy for readonly memslot, write access on readonly memslot will cause KVM_EXIT_MMIO exit Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-08-22 15:09:03 +03:00
Richard Weinberger	83be4ffa1a	x86/spinlocks: Fix comment in spinlock.h This comment is no longer true. We support up to 2^16 CPUs because __ticket_t is an u16 if NR_CPUS is larger than 256. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-08-22 09:52:47 +02:00
Stefano Stabellini	4d9310e397	xen: missing includes Changes in v2: - remove pvclock hack; - remove include linux/types.h from xen/interface/xen.h. v3: - Compile under IA64 Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-08-21 14:49:21 -04:00
Ingo Molnar	bcada3d4b8	perf/core improvements and fixes: . Fix include order for bison/flex-generated C files, from Ben Hutchings . Build fixes and documentation corrections from David Ahern . Group parsing support, from Jiri Olsa . UI/gtk refactorings and improvements from Namhyung Kim . NULL deref fix for perf script, from Namhyung Kim . Assorted cleanups from Robert Richter . Let O= makes handle relative paths, from Steven Rostedt Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJQMkGhAAoJENZQFvNTUqpAqjsQAJE5iD1LFogC8o/WjvRHz0TY Y0x+sR/XfW61KYpeq5g+UaKuFU3P44ijCoyks3y5sza97DkYgUwMpEHlLXFSM8Pp sNOapqY57s24nq3MLrhH1V9w+cSE+m2u/Gi5fGLCQekio9gkOBwYxNGk7vpKri/n LBRsMozBu/mZjMy20uWOb7Uk8xsAToh+TFaAtjyQ9Snn9nNJj49NUAp37uN888H/ ducMLq32HN5v/6Zd3q6IWdDWgZsHLkIa3R5FIs/GNe3Dih07gtYLmDol4ktPbTFm yoaWpP5wbtu/62EZlJwE393vMuoeqN/96394ZZQGFafhHVxN4+rcBhXbejBs0T2b wk/0CzntW8bbUAI/cl3SB9aui//FWOxcjG9aDQ7PsmHzPw1Q4VD0F9Mcod4p+dRX PsA9q/tST1eAiwzWYthDtj81U7iChINcXKhoZn2xn6+0+aMH+6FFNBmCH8MR5aCU BvrXhTJjvau/Ym/sILl4Tf4wfssTq49yMsn/YKCwLJ0hg0XlTObWfQRy2MOayXH9 NJvUE+9GSXoTEKhmr1AfTYEG9vObaXZyFwAI74xvPPwUYojCb4ZjEKmG0egW+VGk IJKFCaJZwwVsGau4aIbFAMP12/L8Qs/Ox91ddCJ0j5TIlSGMaqW5lbV1N1crzlTT a0GsN49NvhbFttBXrcNX =0a2X -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: * Fix include order for bison/flex-generated C files, from Ben Hutchings * Build fixes and documentation corrections from David Ahern * Group parsing support, from Jiri Olsa * UI/gtk refactorings and improvements from Namhyung Kim * NULL deref fix for perf script, from Namhyung Kim * Assorted cleanups from Robert Richter * Let O= makes handle relative paths, from Steven Rostedt * perf script python fixes, from Feng Tang. * Improve 'perf lock' error message when the needed tracepoints are not present, from David Ahern. * Initial bash completion support, from Frederic Weisbecker * Allow building without libelf, from Namhyung Kim. * Support DWARF CFI based unwind to have callchains when %bp based unwinding is not possible, from Jiri Olsa. * Symbol resolution fixes, while fixing support PPC64 files with an .opt ELF section was the end goal, several fixes for code that handles all architectures and cleanups are included, from Cody Schafer. * Add a description for the JIT interface, from Andi Kleen. * Assorted fixes for Documentation and build in 32 bit, from Robert Richter * Add support for non-tracepoint events in perf script python, from Feng Tang * Cache the libtraceevent event_format associated to each evsel early, so that we avoid relookups, i.e. calling pevent_find_event repeatedly when processing tracepoint events. [ This is to reduce the surface contact with libtraceevents and make clear what is that the perf tools needs from that lib: so far parsing the common and per event fields. ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-08-21 11:27:00 +02:00
Ingo Molnar	26198c21d1	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull ftrace updates from Steve Rostedt: " This patch series extends ftrace function tracing utility to be more dynamic for its users. It allows for data passing to the callback functions, as well as reading regs as if a breakpoint were to trigger at function entry. The main goal of this patch series was to allow kprobes to use ftrace as an optimized probe point when a probe is placed on an ftrace nop. With lots of help from Masami Hiramatsu, and going through lots of iterations, we finally came up with a good solution. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-08-21 11:23:40 +02:00
Raghavendra K T	e423ca155d	KVM: Correct vmrun to vmcall typo Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-08-13 17:39:59 -03:00
Marcelo Tosatti	51d59c6b42	KVM: x86: fix pvclock guest stopped flag reporting kvm_guest_time_update unconditionally clears hv_clock.flags field, so the notification never reaches the guest. Fix it by allowing PVCLOCK_GUEST_STOPPED to passthrough. Reviewed-by: Eric B Munson <emunson@mgebm.net> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-08-13 16:10:45 -03:00
Frederic Weisbecker	91d7753a45	perf: Factor __output_copy to be usable with specific copy function Adding a generic way to use __output_copy function with specific copy function via DEFINE_PERF_OUTPUT_COPY macro. Using this to add new __output_copy_user function, that provides output copy from user pointers. For x86 the copy_from_user_nmi function is used and __copy_from_user_inatomic for the rest of the architectures. This new function will be used in user stack dump on sample, coming in next patches. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Arun Sharma <asharma@fb.com> Cc: Benjamin Redelings <benjamin.redelings@nescent.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Ulrich Drepper <drepper@gmail.com> Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.com Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2012-08-10 11:44:06 -03:00
Jiri Olsa	c5e63197db	perf: Unified API to record selective sets of arch registers This brings a new API to help the selective dump of registers on event sampling, and its implementation for x86 arch. Added HAVE_PERF_REGS config option to determine if the architecture provides perf registers ABI. The information about desired registers will be passed in u64 mask. It's up to the architecture to map the registers into the mask bits. For the x86 arch implementation, both 32 and 64 bit registers bits are defined within single enum to ensure 64 bit system can provide register dump for compat task if needed in the future. Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com> [ Added missing linux/errno.h include ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Arun Sharma <asharma@fb.com> Cc: Benjamin Redelings <benjamin.redelings@nescent.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Ulrich Drepper <drepper@gmail.com> Link: http://lkml.kernel.org/r/1344345647-11536-2-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2012-08-10 11:21:37 -03:00
Stefano Stabellini	256f631f1f	xen/arm: Introduce xen_ulong_t for unsigned long All the original Xen headers have xen_ulong_t as unsigned long type, however when they have been imported in Linux, xen_ulong_t has been replaced with unsigned long. That might work for x86 and ia64 but it does not for arm. Bring back xen_ulong_t and let each architecture define xen_ulong_t as they see fit. Also explicitly size pointers (__DEFINE_GUEST_HANDLE) to 64 bit. Changes in v3: - remove the incorrect changes to multicall_entry; - remove the change to apic_physbase. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-09-14 13:34:43 +00:00
Takuya Yoshikawa	d89cc617b9	KVM: Push rmap into kvm_arch_memory_slot Two reasons: - x86 can integrate rmap and rmap_pde and remove heuristics in __gfn_to_rmap(). - Some architectures do not need rmap. Since rmap is one of the most memory consuming stuff in KVM, ppc'd better restrict the allocation to Book3S HV. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-08-06 12:47:30 +03:00
Avi Kivity	fe56097b23	Merge remote-tracking branch 'upstream' into next - bring back critical fixes (esp. `aa67f6096c`) - provide an updated base for development * upstream: (4334 commits) missed mnt_drop_write() in do_dentry_open() UBIFS: nuke pdflush from comments gfs2: nuke pdflush from comments drbd: nuke pdflush from comments nilfs2: nuke write_super from comments hfs: nuke write_super from comments vfs: nuke pdflush from comments jbd/jbd2: nuke write_super from comments btrfs: nuke pdflush from comments btrfs: nuke write_super from comments ext4: nuke pdflush from comments ext4: nuke write_super from comments ext3: nuke write_super from comments Documentation: fix the VM knobs descritpion WRT pdflush Documentation: get rid of write_super vfs: kill write_super and sync_supers ACPI processor: Fix tick_broadcast_mask online/offline regression ACPI: Only count valid srat memory structures ACPI: Untangle a return statement for better readability Linux 3.6-rc1 ... Signed-off-by: Avi Kivity <avi@redhat.com>	2012-08-05 13:25:10 +03:00
Linus Torvalds	1ca0049f2c	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Various fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, kcmp: The kcmp system call can be common arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case x86/mce: Add quirk for instruction recovery on Sandy Bridge processors x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h> x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs x86, nops: Missing break resulting in incorrect selection on Intel x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimental	2012-08-03 10:59:36 -07:00
Linus Torvalds	bd463a0606	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Fix merge window fallout and fix sleep profiling (this was always broken, so it's not a fix for the merge window - we can skip this one from the head of the tree)." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/trace: Add ability to set a target task for events perf/x86: Fix USER/KERNEL tagging of samples properly perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit	2012-08-03 10:57:20 -07:00
Linus Torvalds	fc6bdb59a5	Merge branch 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc Pull OLPC platform updates from Andres Salomon: "These move the OLPC Embedded Controller driver out of arch/x86/platform and into drivers/platform/olpc. OLPC machines are now ARM-based (which means lots of x86 and ARM changes), but are typically pretty self-contained.. so it makes more sense to go through a separate OLPC tree after getting the appropriate review/ACKs." * 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc: x86: OLPC: move s/r-related EC cmds to EC driver Platform: OLPC: move global variables into priv struct Platform: OLPC: move debugfs support from x86 EC driver x86: OLPC: switch over to using new EC driver on x86 Platform: OLPC: add a suspended flag to the EC driver Platform: OLPC: turn EC driver into a platform_driver Platform: OLPC: allow EC cmd to be overridden, and create a workqueue to call it drivers: OLPC: update various drivers to include olpc-ec.h Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver	2012-08-02 11:52:39 -07:00
Andres Salomon	85f90cf6ca	x86: OLPC: switch over to using new EC driver on x86 This uses the new EC driver framework in drivers/platform/olpc. The XO-1 and XO-1.5-specific code is still in arch/x86, but the generic stuff (including a new workqueue; no more running EC commands with IRQs disabled!) can be shared with other architectures. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2012-07-31 23:27:30 -04:00
Andres Salomon	3bf9428f22	drivers: OLPC: update various drivers to include olpc-ec.h Switch over to using olpc-ec.h in multiple steps, so as not to break builds. This covers every driver that calls olpc_ec_cmd(). Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2012-07-31 23:27:29 -04:00
Andres Salomon	392a325c43	Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver The OLPC EC driver has outgrown arch/x86/platform/. It's time to both share common code amongst different architectures, as well as move it out of arch/x86/. The XO-1.75 is ARM-based, and the EC driver shares a lot of code with the x86 code. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2012-07-31 23:27:29 -04:00
Linus Torvalds	bca1a5c0ea	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The biggest changes are Intel Nehalem-EX PMU uncore support, uprobes updates/cleanups/fixes from Oleg and diverse tooling updates (mostly fixes) now that Arnaldo is back from vacation." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) uprobes: __replace_page() needs munlock_vma_page() uprobes: Rename vma_address() and make it return "unsigned long" uprobes: Fix register_for_each_vma()->vma_address() check uprobes: Introduce vaddr_to_offset(vma, vaddr) uprobes: Teach build_probe_list() to consider the range uprobes: Remove insert_vm_struct()->uprobe_mmap() uprobes: Remove copy_vma()->uprobe_mmap() uprobes: Fix overflow in vma_address()/find_active_uprobe() uprobes: Suppress uprobe_munmap() from mmput() uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe() uprobes: Clean up and document write_opcode()->lock_page(old_page) uprobes: Kill write_opcode()->lock_page(new_page) uprobes: __replace_page() should not use page_address_in_vma() uprobes: Don't recheck vma/f_mapping in write_opcode() perf/x86: Fix missing struct before structure name perf/x86: Fix format definition of SNB-EP uncore QPI box perf/x86: Make bitfield unsigned perf/x86: Fix LLC-* and node-* events on Intel SandyBridge perf/x86: Add Intel Nehalem-EX uncore support perf/x86: Fix typo in format definition of uncore PCU filter ...	2012-07-31 15:34:13 -07:00
Peter Zijlstra	d07bdfd322	perf/x86: Fix USER/KERNEL tagging of samples properly Some PMUs don't provide a full register set for their sample, specifically 'advanced' PMUs like AMD IBS and Intel PEBS which provide 'better' than regular interrupt accuracy. In this case we use the interrupt regs as basis and over-write some fields (typically IP) with different information. The perf core however uses user_mode() to distinguish user/kernel samples, user_mode() relies on regs->cs. If the interrupt skid pushed us over a boundary the new IP might not be in the same domain as the interrupt. Commit `ce5c1fe9a9` ("perf/x86: Fix USER/KERNEL tagging of samples") tried to fix this by making the perf core use kernel_ip(). This however is wrong (TM), as pointed out by Linus, since it doesn't allow for VM86 and non-zero based segments in IA32 mode. Therefore, provide a new helper to set the regs->ip field, set_linear_ip(), which massages the regs into a suitable state assuming the provided IP is in fact a linear address. Also modify perf_instruction_pointer() and perf_callchain_user() to deal with segments base offsets. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1341910954.3462.102.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-31 17:02:04 +02:00
Masami Hiramatsu	e525389651	kprobes/x86: ftrace based optimization for x86 Add function tracer based kprobe optimization support handlers on x86. This allows kprobes to use function tracer for probing on mcount call. Link: http://lkml.kernel.org/r/20120605102838.27845.26317.stgit@localhost.localdomain Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> [ Updated to new port of ftrace save regs functions ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-07-31 10:29:59 -04:00
Will Deacon	c1d7e01d78	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION Rather than #define the options manually in the architecture code, add Kconfig options for them and select them there instead. This also allows us to select the compat IPC version parsing automatically for platforms using the old compat IPC interface. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:21 -07:00
Linus Torvalds	4cb38750d4	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mm changes from Peter Anvin: "The big change here is the patchset by Alex Shi to use INVLPG to flush only the affected pages when we only need to flush a small page range. It also removes the special INVALIDATE_TLB_VECTOR interrupts (32 vectors!) and replace it with an ordinary IPI function call." Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next to changed line) * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix build warning and crash when building for !SMP x86/tlb: do flush_tlb_kernel_range by 'invlpg' x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR x86/tlb: enable tlb flush range support for x86 mm/mmu_gather: enable tlb flush range in generic mmu_gather x86/tlb: add tlb_flushall_shift knob into debugfs x86/tlb: add tlb_flushall_shift for specific CPU x86/tlb: fall back to flush all when meet a THP large page x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range x86/tlb_info: get last level TLB entry number of CPU x86: Add read_mostly declaration/definition to variables from smp.h x86: Define early read-mostly per-cpu macros	2012-07-26 13:17:17 -07:00
Linus Torvalds	0a2fe19ccc	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pul x86/efi changes from Ingo Molnar: "This tree adds an EFI bootloader handover protocol, which, once supported on the bootloader side, will make bootup faster and might result in simpler bootloaders. The other change activates the EFI wall clock time accessors on x86-64 as well, instead of the legacy RTC readout." * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: Handover Protocol x86-64/efi: Use EFI to deal with platform wall clock	2012-07-26 13:13:25 -07:00
Linus Torvalds	c1b669b72a	Merge branches 'x86-cleanups-for-linus' and 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanup and cpufeature from Ingo Molnar: "Just a single cleanup and and a commit that adds new CPU feature names" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, boot: Remove ancient, unconditionally #ifdef'd out dead code * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: Add the RDSEED and ADX features	2012-07-26 13:12:09 -07:00
Linus Torvalds	44a6b84421	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: - Fixed algorithm construction hang when self-test fails. - Added SHA variants to talitos AEAD list. - New driver for Exynos random number generator. - Performance enhancements for arc4. - Added hwrng support to caam. - Added ahash support to caam. - Fixed bad kfree in aesni-intel. - Allow aesni-intel in FIPS mode. - Added atmel driver with support for AES/3DES/SHA. - Bug fixes for mv_cesa. - CRC hardware driver for BF60x family processors. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (66 commits) crypto: twofish-avx - remove useless instruction crypto: testmgr - add aead cbc aes hmac sha1,256,512 test vectors crypto: talitos - add sha224, sha384 and sha512 to existing AEAD algorithms crypto: talitos - export the talitos_submit function crypto: talitos - move talitos structures to header file crypto: atmel - add new tests to tcrypt crypto: atmel - add Atmel SHA1/SHA256 driver crypto: atmel - add Atmel DES/TDES driver crypto: atmel - add Atmel AES driver ARM: AT91SAM9G45: add crypto peripherals crypto: testmgr - allow aesni-intel and ghash_clmulni-intel in fips mode hwrng: exynos - Add support for Exynos random number generator crypto: aesni-intel - fix wrong kfree pointer crypto: caam - ERA retrieval and printing for SEC device crypto: caam - Using alloc_coherent for caam job rings crypto: algapi - Fix hang on crypto allocation crypto: arc4 - now arc needs blockcipher support crypto: caam - one tasklet per job ring crypto: caam - consolidate memory barriers from job ring en/dequeue crypto: caam - only query h/w in job ring dequeue path ...	2012-07-26 13:00:59 -07:00
Tony Luck	736edce5f3	x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h> We will need some of these values in mce.c. Move them to the appropriate header file so they are available. Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Chen Gong <gong.chen@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Link: http://lkml.kernel.org/r/0ccfb1af5fe35e537b7cd8e4d448bf7d851dbfb9.1343078495.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-26 15:05:47 +02:00
Jovi Zhang	35d56ca9d4	perf/x86: Fix missing struct before structure name When CONFIG_PERF_EVENTS disabled, there will have a compiliation error, because missing struct before structure name. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Kosina <trivial@kernel.org> Link: http://lkml.kernel.org/r/CACV3sbKF%3DCX%2B2jWEWesfCA6rBoQ3wDM4-5ac9MuBtVbCtMRHdQ@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-26 15:04:34 +02:00
Avi Kivity	e9bda6f6f9	Merge branch 'queue' into next Merge patches queued during the run-up to the merge window. * queue: (25 commits) KVM: Choose better candidate for directed yield KVM: Note down when cpu relax intercepted or pause loop exited KVM: Add config to support ple or cpu relax optimzation KVM: switch to symbolic name for irq_states size KVM: x86: Fix typos in pmu.c KVM: x86: Fix typos in lapic.c KVM: x86: Fix typos in cpuid.c KVM: x86: Fix typos in emulate.c KVM: x86: Fix typos in x86.c KVM: SVM: Fix typos KVM: VMX: Fix typos KVM: remove the unused parameter of gfn_to_pfn_memslot KVM: remove is_error_hpa KVM: make bad_pfn static to kvm_main.c KVM: using get_fault_pfn to get the fault pfn KVM: MMU: track the refcount when unmap the page KVM: x86: remove unnecessary mark_page_dirty KVM: MMU: Avoid handling same rmap_pde in kvm_handle_hva_range() KVM: MMU: Push trace_kvm_age_page() into kvm_age_rmapp() KVM: MMU: Add memslot parameter to hva handlers ... Signed-off-by: Avi Kivity <avi@redhat.com>	2012-07-26 11:54:21 +03:00
Linus Torvalds	97027da6ad	IOMMU Updates for Linux v3.6-rc1 The most important part of these updates is the IOMMU groups code enhancement written by Alex Williamson. It abstracts the problem that a given hardware IOMMU can't isolate any given device from any other device (e.g. 32 bit PCI devices can't usually be isolated). Devices that can't be isolated are grouped together. This code is required for the upcoming VFIO framework. Another IOMMU-API change written by be is the introduction of domain attributes. This makes it easier to handle GART-like IOMMUs with the IOMMU-API because now the start-address and the size of the domain address space can be queried. Besides that there are a few cleanups and fixes for the NVidia Tegra IOMMU drivers and the reworked init-code for the AMD IOMMU. The later is from my patch-set to support interrupt remapping. The rest of this patch-set requires x86 changes which are not mergabe yet. So full support for interrupt remapping with AMD IOMMUs will come in a future merge window. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQDV/MAAoJECvwRC2XARrjSDcP+gJbtSHDMyZ71zyfQfAZcxJt rTqLbdZRtIjrjgtKSEDp8u5Bo5TK9dAYoZVuJMOZewFzwI/fSfbRsWp1PU0I88Fr ZzM+/o1N9MLvf1e3kRVOzNzUfku+jTQgUBD4txsbtQzc/IeGHe9qP1Bqzs/xg4Pk SjWu7pLNYxaER10z76nRodNn6zGjsc7GFdOW8cJu2HOAHhisIAR291jSQgd6Rz9r zWqSTsXIEzYt2CtU3G2/tFJ554Mp8v5F80gHo+0Ldw8aNxlD6nGtbqGNt+KI8qTv MUL8KJ0TNms9CZdti1CSlSNp51VgJi2GaWKCaDAkYuuER2IbC/8Yp/p2DIIA0GNp HpziIs+dauZPWfZHc6oU7lJAClGAG4MUx7CysVIOzl7ML/Bf4mjYv0faGf5YQfyE weOR+OPPIWDUwgjzHKMAboA4ijkE/v+EKjOaN/S9rEqFEMKC99fwGkf9wUcpZTne 8lzdI2JrgYNDWMVNYlomeLD4lBAbxb/QsnRUa33igjr0MclvMDkp5HaO631Z1+Zx be2z8Rl1CtMwS4qeaOXoeaoNWHU26+oJRZNtCGi/Fw4aKqYXP1dnE/m0GtqEP9Yi +CU2rKbZn3j0+ZcQjCQop8FREPrZ2/Uaji70b6G7WZ2ApcqBxzBffpbMKOmd6T1D HIzGh0fpdYNDuwn6Txit =MbAC -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "The most important part of these updates is the IOMMU groups code enhancement written by Alex Williamson. It abstracts the problem that a given hardware IOMMU can't isolate any given device from any other device (e.g. 32 bit PCI devices can't usually be isolated). Devices that can't be isolated are grouped together. This code is required for the upcoming VFIO framework. Another IOMMU-API change written by me is the introduction of domain attributes. This makes it easier to handle GART-like IOMMUs with the IOMMU-API because now the start-address and the size of the domain address space can be queried. Besides that there are a few cleanups and fixes for the NVidia Tegra IOMMU drivers and the reworked init-code for the AMD IOMMU. The latter is from my patch-set to support interrupt remapping. The rest of this patch-set requires x86 changes which are not mergabe yet. So full support for interrupt remapping with AMD IOMMUs will come in a future merge window." * tag 'iommu-updates-v3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits) iommu/amd: Fix hotplug with iommu=pt iommu/amd: Add missing spin_lock initialization iommu/amd: Convert iommu initialization to state machine iommu/amd: Introduce amd_iommu_init_dma routine iommu/amd: Move unmap_flush message to amd_iommu_init_dma_ops() iommu/amd: Split enable_iommus() routine iommu/amd: Introduce early_amd_iommu_init routine iommu/amd: Move informational prinks out of iommu_enable iommu/amd: Split out PCI related parts of IOMMU initialization iommu/amd: Use acpi_get_table instead of acpi_table_parse iommu/amd: Fix sparse warnings iommu/tegra: Don't call alloc_pdir with as->lock iommu/tegra: smmu: Fix unsleepable memory allocation at alloc_pdir() iommu/tegra: smmu: Remove unnecessary sanity check at alloc_pdir() iommu/exynos: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/tegra: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/msm: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/omap: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/vt-d: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/amd: Implement DOMAIN_ATTR_GEOMETRY attribute ...	2012-07-24 16:24:11 -07:00
Linus Torvalds	6dd53aa456	PCI changes for the 3.6 merge window: Host bridge hotplug - Add MMCONFIG support for hot-added host bridges (Jiang Liu) Device hotplug - Move fixups from __init to __devinit (Sebastian Andrzej Siewior) - Call FINAL fixups for hot-added devices, too (Myron Stowe) - Factor out generic code for P2P bridge hot-add (Yinghai Lu) - Remove all functions in a slot, not just those with _EJx (Amos Kong) Dynamic resource management - Track bus number allocation (struct resource tree per domain) (Yinghai Lu) - Make P2P bridge 1K I/O windows work with resource reassignment (Bjorn Helgaas, Yinghai Lu) - Disable decoding while updating 64-bit BARs (Bjorn Helgaas) Power management - Add PCIe runtime D3cold support (Huang Ying) Virtualization - Add VFIO infrastructure (ACS, DMA source ID quirks) (Alex Williamson) - Add quirks for devices with broken INTx masking (Jan Kiszka) Miscellaneous - Fix some PCI Express capability version issues (Myron Stowe) - Factor out some arch code with a weak, generic, pcibios_setup() (Myron Stowe) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABAgAGBQJQBy+9AAoJEPGMOI97Hn6zOpQP+wVFvA7pcteFj6HPs5nTq2Hc 55oeRqCO0wBHoFMCKB0AjeTATjqxi9OhcjaiVrZejxNyWKC9MnrXuunpQ0l/hCbR M/TK+BCelfX2FU4eXNf+TBCCcOhOVWqQft9Gm6nYKwX8Y0msRVCceI4WwhZgSwtI vdtmnqlwolscdnq+8ThsnvUMtwkN0gExmn2FJRl6EoEgG0DTqhMkZ83uA+NPBhvv I+g0XbA6haaZph2nnSYR0hIW4Q7JkT/LgA6uVAQxamctwxLol7xxsjCRnfqrulkf kaRr2fAgBXfmaOIltro4UkXrCM52ZSyggCDfExHp6mWGPKMjE5ZcyK1YbGfmmumk DS3t1S0eBdDJXrnf9l/Yb8e95dQxRCYKelKzr1rTD9QAXsInE8rC40hvhfFaTa4s nZYRTz0SKv6coQihqaOR7shx1DNomLFk7jndaWEElfl9/cT/nQnZ8XLfVMzkJNNB Y4SM6zkiIaCL0aiSEE16MqVjmODYRjbURLYzQIrqr2KJQg8X6XjIRojQLjL6xEgA 22ry2ZRPhqO68g7aLqvixiSDaTp0Z0Vw+JmgjtBqvkokwZcGQtm4umkpAdOi+Es8 3bJaMY7ZUpDX53FE8iyP6AnmR/1k19rC1gNnNq/syWyjtYOYJ9i3QCTafFgvE1VC 5coQ1L5tByHvpzK5PHwf =oo/A -----END PGP SIGNATURE----- Merge tag 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug: - Add MMCONFIG support for hot-added host bridges (Jiang Liu) Device hotplug: - Move fixups from __init to __devinit (Sebastian Andrzej Siewior) - Call FINAL fixups for hot-added devices, too (Myron Stowe) - Factor out generic code for P2P bridge hot-add (Yinghai Lu) - Remove all functions in a slot, not just those with _EJx (Amos Kong) Dynamic resource management: - Track bus number allocation (struct resource tree per domain) (Yinghai Lu) - Make P2P bridge 1K I/O windows work with resource reassignment (Bjorn Helgaas, Yinghai Lu) - Disable decoding while updating 64-bit BARs (Bjorn Helgaas) Power management: - Add PCIe runtime D3cold support (Huang Ying) Virtualization: - Add VFIO infrastructure (ACS, DMA source ID quirks) (Alex Williamson) - Add quirks for devices with broken INTx masking (Jan Kiszka) Miscellaneous: - Fix some PCI Express capability version issues (Myron Stowe) - Factor out some arch code with a weak, generic, pcibios_setup() (Myron Stowe)" * tag 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (122 commits) PCI: hotplug: ensure a consistent return value in error case PCI: fix undefined reference to 'pci_fixup_final_inited' PCI: build resource code for M68K architecture PCI: pciehp: remove unused pciehp_get_max_lnk_width(), pciehp_get_cur_lnk_width() PCI: reorder __pci_assign_resource() (no change) PCI: fix truncation of resource size to 32 bits PCI: acpiphp: merge acpiphp_debug and debug PCI: acpiphp: remove unused res_lock sparc/PCI: replace pci_cfg_fake_ranges() with pci_read_bridge_bases() PCI: call final fixups hot-added devices PCI: move final fixups from __init to __devinit x86/PCI: move final fixups from __init to __devinit MIPS/PCI: move final fixups from __init to __devinit PCI: support sizing P2P bridge I/O windows with 1K granularity PCI: reimplement P2P bridge 1K I/O windows (Intel P64H2) PCI: disable MEM decoding while updating 64-bit MEM BARs PCI: leave MEM and IO decoding disabled during 64-bit BAR sizing, too PCI: never discard enable/suspend/resume_early/resume fixups PCI: release temporary reference in __nv_msi_ht_cap_quirk() PCI: restructure 'pci_do_fixups()' ...	2012-07-24 16:17:07 -07:00
Linus Torvalds	62c4d9afa4	Features: * Performance improvement to lower the amount of traps the hypervisor has to do 32-bit guests. Mainly for setting PTE entries and updating TLS descriptors. * MCE polling driver to collect hypervisor MCE buffer and present them to /dev/mcelog. * Physical CPU online/offline support. When an privileged guest is booted it is present with virtual CPUs, which might have an 1:1 to physical CPUs but usually don't. This provides mechanism to offline/online physical CPUs. Bug-fixes for: * Coverity found fixes in the console and ACPI processor driver. * PVonHVM kexec fixes along with some cleanups. * Pages that fall within E820 gaps and non-RAM regions (and had been released to hypervisor) would be populated back, but potentially in non-RAM regions. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQDWcvAAoJEFjIrFwIi8fJ6GAH/iFIkOC5wseD8qZ9nV4VI46t 0GYvBFC4F91NvC7CNfoAySr84v+ZORIZzMcdyDF8H/tLO9MaOY/Mwn0S5ZSqmYMi rhskvK3InBaVkYtceOHugNGM7mB0c3STIm7OsjW6gbVzohmTN25rbQR+X5iWAtVA cTUtDyH3AU15mwuVT3U+VC4IulHpnNJz4pHoq3Sn61/UK1LYmhLXYd5fveA0D0B8 lRZTAvNMsYDJDDmkWNrs8RczKkQ86DTSjfGawm0YG+Gf94GgD5yMHWbiHh2Gy93e u7sHK0RrKbP5BY/MV6vVJxkoV5NoWgCc0tcjBcYwdyvwzxDS75UhV6uoVHC3Ao8= =drt2 -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "Features: * Performance improvement to lower the amount of traps the hypervisor has to do 32-bit guests. Mainly for setting PTE entries and updating TLS descriptors. * MCE polling driver to collect hypervisor MCE buffer and present them to /dev/mcelog. * Physical CPU online/offline support. When an privileged guest is booted it is present with virtual CPUs, which might have an 1:1 to physical CPUs but usually don't. This provides mechanism to offline/online physical CPUs. Bug-fixes for: * Coverity found fixes in the console and ACPI processor driver. * PVonHVM kexec fixes along with some cleanups. * Pages that fall within E820 gaps and non-RAM regions (and had been released to hypervisor) would be populated back, but potentially in non-RAM regions." * tag 'stable/for-linus-3.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: populate correct number of pages when across mem boundary (v2) xen PVonHVM: move shared_info to MMIO before kexec xen: simplify init_hvm_pv_info xen: remove cast from HYPERVISOR_shared_info assignment xen: enable platform-pci only in a Xen guest xen/pv-on-hvm kexec: shutdown watches from old kernel xen/x86: avoid updating TLS descriptors if they haven't changed xen/x86: add desc_equal() to compare GDT descriptors xen/mm: zero PTEs for non-present MFNs in the initial page table xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable xen/hvc: Fix up checks when the info is allocated. xen/acpi: Fix potential memory leak. xen/mce: add .poll method for mcelog device driver xen/mce: schedule a workqueue to avoid sleep in atomic context xen/pcpu: Xen physical cpus online/offline sys interface xen/mce: Register native mce handler as vMCE bounce back point x86, MCE, AMD: Adjust initcall sequence for xen xen/mce: Add mcelog support for Xen platform	2012-07-24 13:14:03 -07:00
Linus Torvalds	5fecc9d8f5	KVM updates for the 3.6 merge window -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQDRDNAAoJEI7yEDeUysxlkl8P/3C2AHx2webOU8sVzhfU6ONZ ZoGevwBjyZIeJEmiWVpFTTEew1l0PXtpyOocXGNUXIddVnhXTQOKr/Scj4uFbmx8 ROqgK8NSX9+xOGrBPCoN7SlJkmp+m6uYtwYkl2SGnsEVLWMKkc7J7oqmszCcTQvN UXMf7G47/Ul2NUSBdv4Yvizhl4kpvWxluiweDw3E/hIQKN0uyP7CY58qcAztw8nG csZBAnnuPFwIAWxHXW3eBBv4UP138HbNDqJ/dujjocM6GnOxmXJmcZ6b57gh+Y64 3+w9IR4qrRWnsErb/I8inKLJ1Jdcf7yV2FmxYqR4pIXay2Yzo1BsvFd6EB+JavUv pJpixrFiDDFoQyXlh4tGpsjpqdXNMLqyG4YpqzSZ46C8naVv9gKE7SXqlXnjyDlb Llx3hb9Fop8O5ykYEGHi+gIISAK5eETiQl4yw9RUBDpxydH4qJtqGIbLiDy8y9wi Xyi8PBlNl+biJFsK805lxURqTp/SJTC3+Zb7A7CzYEQm5xZw3W/CKZx1ZYBfpaa/ pWaP6tB7JwgLIVXi4HQayLWqMVwH0soZIn9yazpOEFv6qO8d5QH5RAxAW2VXE3n5 JDlrajar/lGIdiBVWfwTJLb86gv3QDZtIWoR9mZuLKeKWE/6PRLe7HQpG1pJovsm 2AsN5bS0BWq+aqPpZHa5 =pECD -----END PGP SIGNATURE----- Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Avi Kivity: "Highlights include - full big real mode emulation on pre-Westmere Intel hosts (can be disabled with emulate_invalid_guest_state=0) - relatively small ppc and s390 updates - PCID/INVPCID support in guests - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on interrupt intensive workloads) - Lockless write faults during live migration - EPT accessed/dirty bits support for new Intel processors" Fix up conflicts in: - Documentation/virtual/kvm/api.txt: Stupid subchapter numbering, added next to each other. - arch/powerpc/kvm/booke_interrupts.S: PPC asm changes clashing with the KVM fixes - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c: Duplicated commits through the kvm tree and the s390 tree, with subsequent edits in the KVM tree. * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) KVM: fix race with level interrupts x86, hyper: fix build with !CONFIG_KVM_GUEST Revert "apic: fix kvm build on UP without IOAPIC" KVM guest: switch to apic_set_eoi_write, apic_write apic: add apic_set_eoi_write for PV use KVM: VMX: Implement PCID/INVPCID for guests with EPT KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check KVM: PPC: Critical interrupt emulation support KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests KVM: PPC64: booke: Set interrupt computation mode for 64-bit host KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt KVM: PPC: bookehv64: Add support for std/ld emulation. booke: Added crit/mc exception handler for e500v2 booke/bookehv: Add host crit-watchdog exception support KVM: MMU: document mmu-lock and fast page fault KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint KVM: MMU: trace fast page fault KVM: MMU: fast path of handling guest page fault KVM: MMU: introduce SPTE_MMU_WRITEABLE bit KVM: MMU: fold tlb flush judgement into mmu_spte_update ...	2012-07-24 12:01:20 -07:00
Joerg Roedel	395e51f18d	Merge branches 'iommu/fixes', 'x86/amd', 'groups', 'arm/tegra' and 'api/domain-attr' into next Conflicts: drivers/iommu/iommu.c include/linux/iommu.h	2012-07-23 12:17:00 +02:00
Linus Torvalds	5b160bd426	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mce changes from Ingo Molnar: "This tree improves the AMD thresholding bank code and includes a memory fault signal handling fixlet." * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults x86, MCE, AMD: Update copyrights and boilerplate x86, MCE, AMD: Give proper names to the thresholding banks x86, MCE, AMD: Make error_count read only x86, MCE, AMD: Cleanup reading of error_count x86, MCE, AMD: Print decimal thresholding values x86, MCE, AMD: Move shared bank to node descriptor x86, MCE, AMD: Remove local_allocate_... wrapper x86, MCE, AMD: Remove shared banks sysfs linking x86, amd_nb: Export model 0x10 and later PCI id	2012-07-22 16:07:45 -07:00
Linus Torvalds	2bd3488fcf	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/uv changes from Ingo Molnar: "UV2 BAU productization fixes. The BAU (Broadcast Assist Unit) is SGI's fancy out of line way on UV hardware to do TLB flushes, instead of the normal APIC IPI methods. The commits here fix / work around hangs in their latest hardware iteration (UV2). My understanding is that the main purpose of the out of line signalling channel is to improve scalability: the UV APIC hardware glue does not handle broadcasting to many CPUs very well, and this matters most for TLB shootdowns. [ I don't agree with all aspects of the current approach: in hindsight it would have been better to link the BAU at the IPI/APIC driver level instead of the TLB shootdown level, where TLB flushes are really just one of the uses of broadcast SMP messages. Doing that would improve scalability in some other ways and it would also remove a few uglies from the TLB path. It would also be nice to push more is_uv_system() tests into proper x86_init or x86_platform callbacks. Cliff? ]" * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uv: Work around UV2 BAU hangs x86/uv: Implement UV BAU runtime enable and disable control via /proc/sgi_uv/ x86/uv: Fix the UV BAU destination timeout period	2012-07-22 12:37:15 -07:00
Linus Torvalds	d5d96ed2d8	Merge branch 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/reboot changes from Ingo Molnar: "Now that the revampted x86 real-mode trampoline code is upstream and seems to be working well, we can extend the 64-bit reboot code to be as capable as the 32-bit one." * 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, reboot: Be more paranoid in 64-bit reboot=bios x86, reboot: Drop redundant write of reboot_mode x86-64, reboot: Allow reboot=bios and reboot-cpu override on x86-64	2012-07-22 12:25:47 -07:00
Linus Torvalds	bd3e57f913	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: "This tree mostly involves various APIC driver cleanups/robustization, and vSMP motivated platform callback improvements/cleanups" Fix up trivial conflict due to printk cleanup right next to return value change. * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) Revert "x86/early_printk: Replace obsolete simple_strtoul() usage with kstrtoint()" x86/apic/x2apic: Use multiple cluster members for the irq destination only with the explicit affinity x86/apic/x2apic: Limit the vector reservation to the user specified mask x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain membership x86/vsmp: Fix vector_allocation_domain's return value irq/apic: Use config_enabled(CONFIG_SMP) checks to clean up irq_set_affinity() for UP x86/vsmp: Fix linker error when CONFIG_PROC_FS is not set x86/apic/es7000: Make apicid of a cluster (not CPU) from a cpumask x86/apic/es7000+summit: Always make valid apicid from a cpumask x86/apic/es7000+summit: Fix compile warning in cpu_mask_to_apicid() x86/apic: Fix ugly casting and branching in cpu_mask_to_apicid_and() x86/apic: Eliminate cpu_mask_to_apicid() operation x86/x2apic/cluster: Vector_allocation_domain() should return a value x86/apic/irq_remap: Silence a bogus pr_err() x86/vsmp: Ignore IOAPIC IRQ affinity if possible x86/apic: Make cpu_mask_to_apicid() operations check cpu_online_mask x86/apic: Make cpu_mask_to_apicid() operations return error code x86/apic: Avoid useless scanning thru a cpumask in assign_irq_vector() x86/apic: Try to spread IRQ vectors to different priority levels x86/apic: Factor out default vector_allocation_domain() operation ...	2012-07-22 12:19:36 -07:00
Linus Torvalds	3fad0953a1	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debug-for-linus git tree from Ingo Molnar. Fix up trivial conflict in arch/x86/kernel/cpu/perf_event_intel.c due to a printk() having changed to a pr_info() differently in the two branches. * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Move call to print_modules() out of show_regs() x86/mm: Mark free_initrd_mem() as __init x86/microcode: Mark microcode_id[] as __initconst x86/nmi: Clean up register_nmi_handler() usage x86: Save cr2 in NMI in case NMIs take a page fault (for i386) x86: Remove cmpxchg from i386 NMI nesting code x86: Save cr2 in NMI in case NMIs take a page fault x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>	2012-07-22 12:04:44 -07:00
Linus Torvalds	a065de0d25	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar: "Assorted single-commit improvements, as usual" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/mtrr: Slightly simplify print_mtrr_state() x86/mm/mtrr: Fix alignment determination in range_to_mtrr() x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature x86/alternatives: Use atomic_xchg() instead atomic_dec_and_test() for stop_machine_text_poke()	2012-07-22 11:42:28 -07:00
Linus Torvalds	55acdddbac	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp/hotplug changes from Ingo Molnar: "Various cleanups to the SMP hotplug code - a continuing effort of Thomas et al" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smpboot: Remove leftover declaration smp: Remove num_booting_cpus() smp: Remove ipi_call_lock[_irq]()/ipi_call_unlock[_irq]() POWERPC: Smp: remove call to ipi_call_lock()/ipi_call_unlock() SPARC: SMP: Remove call to ipi_call_lock_irq()/ipi_call_unlock_irq() ia64: SMP: Remove call to ipi_call_lock_irq()/ipi_call_unlock_irq() x86-smp-remove-call-to-ipi_call_lock-ipi_call_unlock tile: SMP: Remove call to ipi_call_lock()/ipi_call_unlock() S390: Smp: remove call to ipi_call_lock()/ipi_call_unlock() parisc: Smp: remove call to ipi_call_lock()/ipi_call_unlock() mn10300: SMP: Remove call to ipi_call_lock()/ipi_call_unlock() hexagon: SMP: Remove call to ipi_call_lock()/ipi_call_unlock()	2012-07-22 11:22:15 -07:00
Matt Fleming	9ca8f72a92	x86, efi: Handover Protocol As things currently stand, traditional EFI boot loaders and the EFI boot stub are carrying essentially the same initialisation code required to setup an EFI machine for booting a kernel. There's really no need to have this code in two places and the hope is that, with this new protocol, initialisation and booting of the kernel can be left solely to the kernel's EFI boot stub. The responsibilities of the boot loader then become, o Loading the kernel image from boot media File system code still needs to be carried by boot loaders for the scenario where the kernel and initrd files reside on a file system that the EFI firmware doesn't natively understand, such as ext4, etc. o Providing a user interface Boot loaders still need to display any menus/interfaces, for example to allow the user to select from a list of kernels. Bump the boot protocol number because we added the 'handover_offset' field to indicate the location of the handover protocol entry point. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Jones <pjones@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Acked-and-Tested-by: Matthew Garrett <mjg@redhat.com> Link: http://lkml.kernel.org/r/1342689828-16815-1-git-send-email-matt@console-pimps.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-07-20 16:18:58 -07:00
Alex Shi	7efa1c8796	x86/tlb: Fix build warning and crash when building for !SMP The incompatible parameter of flush_tlb_mm_range cause build warning. Fix it by correct parameter. Ingo Molnar found that this could also cause a user space crash. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1342747103-19765-1-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-07-20 15:01:48 -07:00
H. Peter Anvin	30d5c4546a	x86, cpufeature: Add the RDSEED and ADX features Add the RDSEED and ADX features documented in section 9.1 of the Intel Architecture Instruction Set Extensions Programming Reference, document 319433, version 013b, available from http://software.intel.com/en-us/avx/ The PREFETCHW bit is already supported in Linux under the name 3DNOWPREFETCH. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-lgr6482ufk1bvxzvc2hr8qbp@git.kernel.org	2012-07-20 13:36:41 -07:00
Michael S. Tsirkin	1a577b7247	KVM: fix race with level interrupts When more than 1 source id is in use for the same GSI, we have the following race related to handling irq_states race: CPU 0 clears bit 0. CPU 0 read irq_state as 0. CPU 1 sets level to 1. CPU 1 calls kvm_ioapic_set_irq(1). CPU 0 calls kvm_ioapic_set_irq(0). Now ioapic thinks the level is 0 but irq_state is not 0. Fix by performing all irq_states bitmap handling under pic/ioapic lock. This also removes the need for atomics with irq_states handling. Reported-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-07-20 16:12:00 -03:00
Liu, Jinsong	cef12ee52b	xen/mce: Add mcelog support for Xen platform When MCA error occurs, it would be handled by Xen hypervisor first, and then the error information would be sent to initial domain for logging. This patch gets error information from Xen hypervisor and convert Xen format error into Linux format mcelog. This logic is basically self-contained, not touching other kernel components. By using tools like mcelog tool users could read specific error information, like what they did under native Linux. To test follow directions outlined in Documentation/acpi/apei/einj.txt Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Ke, Liping <liping.ke@intel.com> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-07-19 15:51:36 -04:00
Steven Rostedt	4de72395ff	ftrace/x86: Add save_regs for i386 function calls Add saving full regs for function tracing on i386. The saving of regs was influenced by patches sent out by Masami Hiramatsu. Link: Link: http://lkml.kernel.org/r/20120711195745.379060003@goodmis.org Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-07-19 13:20:37 -04:00
Steven Rostedt	08f6fba503	ftrace/x86: Add separate function to save regs Add a way to have different functions calling different trampolines. If a ftrace_ops wants regs saved on the return, then have only the functions with ops registered to save regs. Functions registered by other ops would not be affected, unless the functions overlap. If one ftrace_ops registered functions A, B and C and another ops registered fucntions to save regs on A, and D, then only functions A and D would be saving regs. Function B and C would work as normal. Although A is registered by both ops: normal and saves regs; this is fine as saving the regs is needed to satisfy one of the ops that calls it but the regs are ignored by the other ops function. x86_64 implements the full regs saving, and i386 just passes a NULL for regs to satisfy the ftrace_ops passing. Where an arch must supply both regs and ftrace_ops parameters, even if regs is just NULL. It is OK for an arch to pass NULL regs. All function trace users that require regs passing must add the flag FTRACE_OPS_FL_SAVE_REGS when registering the ftrace_ops. If the arch does not support saving regs then the ftrace_ops will fail to register. The flag FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED may be set that will prevent the ftrace_ops from failing to register. In this case, the handler may either check if regs is not NULL or check if ARCH_SUPPORTS_FTRACE_SAVE_REGS. If the arch supports passing regs it will set this macro and pass regs for ops that request them. All other archs will just pass NULL. Link: Link: http://lkml.kernel.org/r/20120711195745.107705970@goodmis.org Cc: Alexander van Heukelum <heukelum@fastmail.fm> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-07-19 13:20:03 -04:00
Steven Rostedt	28fb5dfa78	ftrace/x86_32: Push ftrace_ops in as 3rd parameter to function tracer Add support of passing the current ftrace_ops into the 3rd parameter of the callback to the function tracer. Link: http://lkml.kernel.org/r/20120612225424.942411318@goodmis.org Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-07-19 13:19:27 -04:00
Steven Rostedt	2f5f6ad939	ftrace: Pass ftrace_ops as third parameter to function trace callback Currently the function trace callback receives only the ip and parent_ip of the function that it traced. It would be more powerful to also return the ops that registered the function as well. This allows the same function to act differently depending on what ftrace_ops registered it. Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-07-19 13:17:35 -04:00
Takuya Yoshikawa	77d11309b3	KVM: Separate rmap_pde from kvm_lpage_info->write_count This makes it possible to loop over rmap_pde arrays in the same way as we do over rmap so that we can optimize kvm_handle_hva_range() easily in the following patch. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-07-18 16:55:04 -03:00
Takuya Yoshikawa	b3ae209697	KVM: Introduce kvm_unmap_hva_range() for kvm_mmu_notifier_invalidate_range_start() When we tested KVM under memory pressure, with THP enabled on the host, we noticed that MMU notifier took a long time to invalidate huge pages. Since the invalidation was done with mmu_lock held, it not only wasted the CPU but also made the host harder to respond. This patch mitigates this by using kvm_handle_hva_range(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Alexander Graf <agraf@suse.de> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-07-18 16:55:04 -03:00
Michael S. Tsirkin	ebf7d2e993	Revert "apic: fix kvm build on UP without IOAPIC" This reverts commit `f9808b7fd4`. After commit 'kvm: switch to apic_set_eoi_write, apic_write' the stubs are no longer needed as kvm does not look at apicdrivers anymore. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-07-16 12:51:56 +03:00
Michael S. Tsirkin	1551df646d	apic: add apic_set_eoi_write for PV use KVM PV EOI optimization overrides eoi_write apic op with its own version. Add an API for this to avoid meddling with core x86 apic driver data structures directly. For KVM use, we don't need any guarantees about when the switch to the new op will take place, so it could in theory use this API after SMP init, but it currently doesn't, and restricting callers to early init makes it clear that it's safe as it won't race with actual APIC driver use. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-07-16 12:51:23 +03:00
Mao, Junjie	ad756a1603	KVM: VMX: Implement PCID/INVPCID for guests with EPT This patch handles PCID/INVPCID for guests. Process-context identifiers (PCIDs) are a facility by which a logical processor may cache information for multiple linear-address spaces so that the processor may retain cached information when software switches to a different linear address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual Volume 3A for details. For guests with EPT, the PCID feature is enabled and INVPCID behaves as running natively. For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD. Signed-off-by: Junjie Mao <junjie.mao@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-07-12 13:07:34 +03:00
Prarit Bhargava	fc73373b33	KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check While debugging I noticed that unlike all the other hypervisor code in the kernel, kvm does not have an entry for x86_hyper which is used in detect_hypervisor_platform() which results in a nice printk in the syslog. This is only really a stub function but it does make kvm more consistent with the other hypervisors. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marcelo Tostatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>	2012-07-11 19:33:32 +03:00
Ingo Molnar	92254d3144	Linux 3.5-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJP+NMmAAoJEHm+PkMAQRiGPxEH/18YQN8FAzEIjcC10ytA3RC3 KzPv31jXgJGZDy1UqmpKtJ7GDwb92AhqZxVnJimMa+6d1uA8NsZQq5EMOPPiX8Qi 8P4AEaw5kSMmR/6zxsxguCGdbDLU3xZ1nJZkHyMgjo2UJbMU0jBPneb/79heWPhe 0HOkLzN5VA6Yx3Nt70sWQ1zsuj0Ji5jCGO0iNTCBmTiv4J9ZlOx3xJQn4aK6JscO /3QRTM43GG0j6zToEOCTHrn8ajOq6rHQQkG0bPVR723nFrSGLoaCT6QVBXYug+AZ 9Xay7zVNvrq2oH5x5jADG2t2vyaG+nEJpSrVjXznzxgDnK7tWjYqiuG5zqKhAq8= =IMfr -----END PGP SIGNATURE----- Merge tag 'v3.5-rc6' into x86/mce Merge Linux 3.5-rc6 before merging more code. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-11 09:41:37 +02:00
Avi Kivity	cbd27ee783	KVM: x86 emulator: initialize memop memop is not initialized; this can lead to a two-byte operation following a 4-byte operation to see garbage values. Usually truncation fixes things fot us later on, but at least in one case (call abs) it doesn't. Fix by moving memop to the auto-initialized field area. Signed-off-by: Avi Kivity <avi@redhat.com>	2012-07-09 14:19:02 +03:00
Avi Kivity	0017f93a27	KVM: x86 emulator: change ->get_cpuid() accessor to use the x86 semantics Instead of getting an exact leaf, follow the spec and fall back to the last main leaf instead. This lets us easily emulate the cpuid instruction in the emulator. Signed-off-by: Avi Kivity <avi@redhat.com>	2012-07-09 14:19:00 +03:00
Suresh Siddha	1ac322d0b1	x86/apic/x2apic: Limit the vector reservation to the user specified mask For the x2apic cluster mode, vector for an interrupt is currently reserved on all the cpu's that are part of the x2apic cluster. But the interrupts will be routed only to the cluster (derived from the first cpu in the mask) members specified in the mask. So there is no need to reserve the vector in the unused cluster members. Modify __assign_irq_vector() to reserve the vectors based on the user specified irq destination mask. If the new mask is a proper subset of the currently used mask, cleanup the vector allocation on the unused cpu members. Also, allow the apic driver to tune the vector domain based on the affinity mask (which in most cases is the user-specified mask). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-3-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-06 11:00:22 +02:00
Suresh Siddha	b39f25a849	x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain membership Currently __assign_irq_vector() goes through each cpu in the specified mask until it finds a free vector in all the cpu's that are part of the same interrupt domain. We visit all the interrupt domain sibling cpus to reserve the free vector. So, when we fail to find a free vector in an interrupt domain, it is safe to continue our search with a cpu belonging to a new interrupt domain. No need to go through each cpu, if the domain containing that cpu is already visited. Use the irq_cfg's old_domain to track the visited domains and optimize the cpu traversal while finding a free vector in the given cpumask. NOTE: We can also optimize the search by using for_each_cpu() and skip the current cpu, if it is not the first cpu in the mask returned by the vector_allocation_domain(). But re-using the cfg->old_domain to track the visited domains will be slightly faster. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-2-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-06 11:00:21 +02:00
Peter Zijlstra	c93dc84cbe	perf/x86: Add a microcode revision check for SNB-PEBS Recent Intel microcode resolved the SNB-PEBS issues, so conditionally enable PEBS on SNB hardware depending on the microcode revision. Thanks to Stephane for figuring out the various microcode revisions. Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-v3672ziwh9damwqwh1uz3krm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-05 21:55:57 +02:00
Robert Richter	b1dc3c4820	perf/x86/amd: Unify AMD's generic and family 15h pmus There is no need for keeping separate pmu structs. We can enable amd_{get,put}_event_constraints() functions also for family 15h event. The advantage is that there is only a single pmu struct for all AMD cpus. This patch introduces functions to setup the pmu to enabe core performance counters or counter constraints. Also, cpuid checks are used instead of family checks where possible. Thus, it enables the code independently of cpu families if the feature flag is set. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-4-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-05 21:19:41 +02:00
Robert Richter	15c7ad51ad	perf/x86: Rename Intel specific macros There are macros that are Intel specific and not x86 generic. Rename them into INTEL_*. This patch removes X86_PMC_IDX_GENERIC and does: $ sed -i -e 's/X86_PMC_MAX_/INTEL_PMC_MAX_/g' \ arch/x86/include/asm/kvm_host.h \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_IDX_FIXED/INTEL_PMC_IDX_FIXED/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_intel_ds.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_MSK_/INTEL_PMC_MSK_/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-05 21:19:39 +02:00
Ingo Molnar	b0338e99b2	Merge branch 'x86/cpu' into perf/core Merge this branch because we changed the wrmsr*_safe() API and there's a conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-05 21:12:11 +02:00
Ingo Molnar	90574ebb7e	Merge branch 'perf/urgent' into perf/core Merge this branch to pick up a fixlet and to update to a more recent base. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-05 21:10:23 +02:00
Michael S. Tsirkin	f9808b7fd4	apic: fix kvm build on UP without IOAPIC On UP i386, when APIC is disabled # CONFIG_X86_UP_APIC is not set # CONFIG_PCI_IOAPIC is not set code looking at apicdrivers never has any effect but it still gets compiled in. In particular, this causes build failures with kvm, but it generally bloats the kernel unnecessarily. Fix by defining both __apicdrivers and __apicdrivers_end to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified that as the result any loop scanning __apicdrivers gets optimized out by the compiler. Warning: a .config with apic disabled doesn't seem to boot for me (even without this patch). Still verifying why, meanwhile this patch is compile-tested only. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-07-03 14:55:29 -03:00
Fenghua Yu	954e482bde	x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature According to Intel 64 and IA-32 SDM and Optimization Reference Manual, beginning with Ivybridge, REG string operation using MOVSB and STOSB can provide both flexible and high-performance REG string operations in cases like memory copy. Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/ STOSB). If CPU erms feature is detected, patch copy_user_generic with enhanced fast string version of copy_user_generic. A few new macros are defined to reduce duplicate code in ALTERNATIVE and ALTERNATIVE_2. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1337908785-14015-1-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-06-29 15:33:34 -07:00
Linus Torvalds	15b77435ed	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: Remove stray %s, add -w to mkcapflags.pl x86, cpufeature: Catch duplicate CPU feature strings x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERM x86: Fix kernel-doc warnings x86, compat: Use test_thread_flag(TIF_IA32) in compat signal delivery	2012-06-29 10:29:54 -07:00
Alex Shi	effee4b9b3	x86/tlb: do flush_tlb_kernel_range by 'invlpg' This patch do flush_tlb_kernel_range by 'invlpg'. The performance pay and gain was analyzed in previous patch (x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range). In the testing: http://lkml.org/lkml/2012/6/21/10 The pay is mostly covered by long kernel path, but the gain is still quite clear, memory access in user APP can increase 30+% when kernel execute this funtion. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-10-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:14 -07:00
Alex Shi	52aec3308d	x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big amount of vector in IDT. But it is still not enough, since modern x86 sever has more cpu number. That still causes heavy lock contention in TLB flushing. The patch using generic smp call function to replace it. That saved 32 vector number in IDT, and resolved the lock contention in TLB flushing on large system. In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread has 3% performance increase. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:13 -07:00
Alex Shi	611ae8e3f5	x86/tlb: enable tlb flush range support for x86 Not every tlb_flush execution moment is really need to evacuate all TLB entries, like in munmap, just few 'invlpg' is better for whole process performance, since it leaves most of TLB entries for later accessing. This patch also rewrite flush_tlb_range for 2 purposes: 1, split it out to get flush_blt_mm_range function. 2, clean up to reduce line breaking, thanks for Borislav's input. My micro benchmark 'mummap' http://lkml.org/lkml/2012/5/17/59 show that the random memory access on other CPU has 0~50% speed up on a 2P * 4cores * HT NHM EP while do 'munmap'. Thanks Yongjie's testing on this patch: ------------- I used Linux 3.4-RC6 w/ and w/o his patches as Xen dom0 and guest kernel. After running two benchmarks in Xen HVM guest, I found his patches brought about 1%~3% performance gain in 'kernel build' and 'netperf' testing, though the performance gain was not very stable in 'kernel build' testing. Some detailed testing results are below. Testing Environment: Hardware: Romley-EP platform Xen version: latest upstream Linux kernel: 3.4-RC6 Guest vCPU number: 8 NIC: Intel 82599 (10GB bandwidth) In 'kernel build' testing in guest: Command line \| performance gain make -j 4 \| 3.81% make -j 8 \| 0.37% make -j 16 \| -0.52% In 'netperf' testing, we tested TCP_STREAM with default socket size 16384 byte as large packet and 64 byte as small packet. I used several clients to add networking pressure, then 'netperf' server automatically generated several threads to response them. I also used large-size packet and small-size packet in the testing. Packet size \| Thread number \| performance gain 16384 bytes \| 4 \| 0.02% 16384 bytes \| 8 \| 2.21% 16384 bytes \| 16 \| 2.04% 64 bytes \| 4 \| 1.07% 64 bytes \| 8 \| 3.31% 64 bytes \| 16 \| 0.71% Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-8-git-send-email-alex.shi@intel.com Tested-by: Ren, Yongjie <yongjie.ren@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:11 -07:00
Alex Shi	c4211f42d3	x86/tlb: add tlb_flushall_shift for specific CPU Testing show different CPU type(micro architectures and NUMA mode) has different balance points between the TLB flush all and multiple invlpg. And there also has cases the tlb flush change has no any help. This patch give a interface to let x86 vendor developers have a chance to set different shift for different CPU type. like some machine in my hands, balance points is 16 entries on Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing help. For untested machine, do a conservative optimization, same as NHM CPU. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:10 -07:00
Alex Shi	e7b52ffd45	x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range x86 has no flush_tlb_range support in instruction level. Currently the flush_tlb_range just implemented by flushing all page table. That is not the best solution for all scenarios. In fact, if we just use 'invlpg' to flush few lines from TLB, we can get the performance gain from later remain TLB lines accessing. But the 'invlpg' instruction costs much of time. Its execution time can compete with cr3 rewriting, and even a bit more on SNB CPU. So, on a 512 4KB TLB entries CPU, the balance points is at: (512 - X) * 100ns(assumed TLB refill cost) = X(TLB flush entries) * 100ns(assumed invlpg cost) Here, X is 256, that is 1/2 of 512 entries. But with the mysterious CPU pre-fetcher and page miss handler Unit, the assumed TLB refill cost is far lower then 100ns in sequential access. And 2 HT siblings in one core makes the memory access more faster if they are accessing the same memory. So, in the patch, I just do the change when the target entries is less than 1/16 of whole active tlb entries. Actually, I have no data support for the percentage '1/16', so any suggestions are welcomed. As to hugetlb, guess due to smaller page table, and smaller active TLB entries, I didn't see benefit via my benchmark, so no optimizing now. My micro benchmark show in ideal scenarios, the performance improves 70 percent in reading. And in worst scenario, the reading/writing performance is similar with unpatched 3.4-rc4 kernel. Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP 'always': multi thread testing, '-t' paramter is thread number: with patch unpatched 3.4-rc4 ./mprotect -t 1 14ns 24ns ./mprotect -t 2 13ns 22ns ./mprotect -t 4 12ns 19ns ./mprotect -t 8 14ns 16ns ./mprotect -t 16 28ns 26ns ./mprotect -t 32 54ns 51ns ./mprotect -t 128 200ns 199ns Single process with sequencial flushing and memory accessing: with patch unpatched 3.4-rc4 ./mprotect 7ns 11ns ./mprotect -p 4096 -l 8 -n 10240 21ns 21ns [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com has additional performance numbers. ] Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:07 -07:00
Alex Shi	e0ba94f14f	x86/tlb_info: get last level TLB entry number of CPU For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and instruction TLB, second level is shared TLB for both data and instructions. For hupe page TLB, usually there is just one level and seperated by 2MB/4MB and 1GB. Although each levels TLB size is important for performance tuning, but for genernal and rude optimizing, last level TLB entry number is suitable. And in fact, last level TLB always has the biggest entry number. This patch will get the biggest TLB entry number and use it in furture TLB optimizing. Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other function and data will be released after system boot up. For all kinds of x86 vendor friendly, vendor specific code was moved to its specific files. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:28:24 -07:00
Jussi Kivilinna	70ef2601fe	crypto: move arch/x86/include/asm/aes.h to arch/x86/include/asm/crypto/ Move AES header to the new asm/crypto directory. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:03 +08:00
Jussi Kivilinna	d4af0e9d6e	crypto: move arch/x86/include/asm/serpent-{sse2\|avx}.h to arch/x86/include/asm/crypto/ Move serpent crypto headers to the new asm/crypto/ directory. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:02 +08:00

... 49 50 51 52 53 ...

8502 Commits