linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-16 09:02:00 +00:00

Author	SHA1	Message	Date
Linus Torvalds	80f33a5fdf	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/rtc: Remove duplicate const specifier x86, early_serial_console: Remove unnecessary check x86, early_serial_console: Remove unused macro XMTRDY x86, setup: Rename BOOT_ISDIGIT_H to BOOT_CTYPE_H x86, CPU: Fix trivial printk formatting issues with dmesg	2015-02-09 17:50:09 -08:00
Linus Torvalds	7453311d68	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "The main changes in this cycle were the x86/entry and sysret enhancements from Andy Lutomirski, see merge commits `772a9aca12` and `b57c0b5175` for details" [ Exectutive summary: IST exceptions that interrupt user space will run on the regular kernel stack instead of the IST stack. Which simplifies things particularly on return to user space. The sysret cleanup ends up simplifying the logic on when we can use sysret vs when we have to use iret. - Linus ] * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64, entry: Remove the syscall exit audit and schedule optimizations x86_64, entry: Use sysret to return to userspace when possible x86, traps: Fix ist_enter from userspace x86, vdso: teach 'make clean' remove vdso64 binaries x86_64 entry: Fix RCX for ptraced syscalls x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET x86: entry_64.S: delete unused code x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic x86: Clean up current_stack_pointer x86, traps: Track entry into and exit from IST context x86, entry: Switch stacks on a paranoid entry from userspace	2015-02-09 17:16:44 -08:00
Linus Torvalds	9d43bade34	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Ingo Molnar: "Continued fallout of the conversion of the x86 IRQ code to the hierarchical irqdomain framework: more cleanups, simplifications, memory allocation behavior enhancements, mainly in the interrupt remapping and APIC code" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) x86, init: Fix UP boot regression on x86_64 iommu/amd: Fix irq remapping detection logic x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC x86: Consolidate boot cpu timer setup x86/apic: Reuse apic_bsp_setup() for UP APIC setup x86/smpboot: Sanitize uniprocessor init x86/smpboot: Move apic init code to apic.c init: Get rid of x86isms x86/apic: Move apic_init_uniprocessor code x86/smpboot: Cleanup ioapic handling x86/apic: Sanitize ioapic handling x86/ioapic: Add proper checks to setp/enable_IO_APIC() x86/ioapic: Provide stub functions for IOAPIC%3Dn x86/smpboot: Move smpboot inlines to code x86/x2apic: Use state information for disable x86/x2apic: Split enable and setup function x86/x2apic: Disable x2apic from nox2apic setup x86/x2apic: Add proper state tracking x86/x2apic: Clarify remapping mode for x2apic enablement x86/x2apic: Move code in conditional region ...	2015-02-09 16:57:56 -08:00
Linus Torvalds	a4cbbf549a	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - AMD range breakpoints support: Extend breakpoint tools and core to support address range through perf event with initial backend support for AMD extended breakpoints. The syntax is: perf record -e mem:addr/len:type For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512) perf record -e mem:0x1000/512:w - event throttling/rotating fixes - various event group handling fixes, cleanups and general paranoia code to be more robust against bugs in the future. - kernel stack overhead fixes User-visible tooling side changes: - Show precise number of samples in at the end of a 'record' session, if processing build ids, since we will then traverse the whole perf.data file and see all the PERF_RECORD_SAMPLE records, otherwise stop showing the previous off-base heuristicly counted number of "samples" (Namhyung Kim). - Support to read compressed module from build-id cache (Namhyung Kim) - Enable sampling loads and stores simultaneously in 'perf mem' (Stephane Eranian) - 'perf diff' output improvements (Namhyung Kim) - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho de Melo) Tooling side infrastructure changes: - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim) - Support parsing parameterized events (Cody P Schafer) - Add support for IP address formats in libtraceevent (David Ahern) Plus other misc fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) perf: Decouple unthrottling and rotating perf: Drop module reference on event init failure perf: Use POLLIN instead of POLL_IN for perf poll data in flag perf: Fix put_event() ctx lock perf: Fix move_group() order perf: Fix event->ctx locking perf: Add a bit of paranoia perf symbols: Convert lseek + read to pread perf tools: Use perf_data_file__fd() consistently perf symbols: Support to read compressed module from build-id cache perf evsel: Set attr.task bit for a tracking event perf header: Set header version correctly perf record: Show precise number of samples perf tools: Do not use __perf_session__process_events() directly perf callchain: Cache eh/debug frame offset for dwarf unwind perf tools: Provide stub for missing pthread_attr_setaffinity_np perf evsel: Don't rely on malloc working for sz 0 tools lib traceevent: Add support for IP address formats perf ui/tui: Show fatal error message only if exists perf tests: Fix typo in sample-parsing.c ...	2015-02-09 15:43:55 -08:00
Rafael J. Wysocki	c488ea4613	Merge branch 'sfi' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-cpufreq Pull SFI-based cpufreq driver for v3.20 from Len Brown. * 'sfi' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: cpufreq: Add SFI based cpufreq driver support SFI: fix compiler warnings	2015-02-09 23:43:53 +01:00
Linus Torvalds	23e8fe2e16	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU changes in this cycle are: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) rcu: Initialize tiny RCU stall-warning timeouts at boot rcu: Fix RCU CPU stall detection in tiny implementation rcu: Add GP-kthread-starvation checks to CPU stall warnings rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors rcu: Optionally run grace-period kthreads at real-time priority ksoftirqd: Use new cond_resched_rcu_qs() function ksoftirqd: Enable IRQs and call cond_resched() before poking RCU rcutorture: Add more diagnostics in rcu_barrier() test failure case torture: Flag console.log file to prevent holdovers from earlier runs torture: Add "-enable-kvm -soundhw pcspk" to qemu command line rcutorture: Handle different mpstat versions rcutorture: Check from beginning to end of grace period rcu: Remove redundant rcu_batches_completed() declaration rcutorture: Drop rcu_torture_completed() and friends rcu: Provide rcu_batches_completed_sched() for TINY_RCU rcutorture: Use unsigned for Reader Batch computations rcutorture: Make build-output parsing correctly flag RCU's warnings rcu: Make _batches_completed() functions return unsigned long rcutorture: Issue warnings on close calls due to Reader Batch blows documentation: Fix smp typo in memory-barriers.txt ...	2015-02-09 14:28:42 -08:00
Len Brown	3a9a941d0b	tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS The Processor generation code-named Haswell added MSR_{CORE \| GFX \| RING}_PERF_LIMIT_REASONS to explain when and how the processor limits frequency. turbostat -v will now decode these bits. Each MSR has an "Active" set of bits which describe current conditions, and a "Logged" set of bits, which describe what has happened since last cleared. Turbostat currently doesn't clear the log bits. Signed-off-by: Len Brown <len.brown@intel.com>	2015-02-09 16:44:24 -05:00
Linus Torvalds	b0c1936c44	spi: Updates for v3.20 The major highlight this release is a refactoring of the core to allow us to run synchronous transfers in the context of the caller when there is no contention for the bus. This improves performance in the very common case by eliminating context switches and reducing the number of hardware setup and teardown operations we need to perform. Other changes: - New drivers for DLN-2 USB-SPI adapter and ST SPI controllers. - A big round of cleanups, performance and feature improvements for the xilinx driver from Ricardo Ribalda Delgado. - A wide range of smaller cleanups, fixes and feature improvements throughout the subsystem. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU2GNgAAoJECTWi3JdVIfQLiYH/0uLN43CunPp0gSWllQ2PY1O R1QiqXg1fr1uZKRuGy59QF0TkU/JlWPY+tpGiOH1jrnDsoecnWsxDx3YEeuYdV6U c//UrlK2uvESivbc48zVUTwCsgxsE8apG0JgqLjsfUpqZTEFxFpeSskepSJ2kIUz bsXHU8Xi0WkLalsk/8Ik8aUvOwVi5EtRE9OMvnU6QPqQMCszgv1TH4UbwbhqwwzZ U23WbNHQ262XDRwY2LKl/QROULeU5pd9F19wrveKMa42fkbu/e+kk6E3n7/Hd4mV CUjv1wTCpPZvzh3bTk50uXwA9XQOzv6ddw6jqsgLcV6jS8Ju3Z3Beya3fmdhOl0= =3ZQr -----END PGP SIGNATURE----- Merge tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "The major highlight this release is a refactoring of the core to allow us to run synchronous transfers in the context of the caller when there is no contention for the bus. This improves performance in the very common case by eliminating context switches and reducing the number of hardware setup and teardown operations we need to perform. Other changes: - New drivers for DLN-2 USB-SPI adapter and ST SPI controllers. - A big round of cleanups, performance and feature improvements for the xilinx driver from Ricardo Ribalda Delgado. - A wide range of smaller cleanups, fixes and feature improvements throughout the subsystem" * tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (68 commits) spi: mxs: cleanup wait_for_completion return handling spi: ti-qspi: cleanup wait_for_completion return handling spi: spi-imx: cleanup wait_for_completion handling spi: sh-msiof: cleanup wait_for_completion return handling spi: match var type to return type of wait_for_completion spi: spi-pxa2xx: only include mach/dma.h for legacy DMA spi: atmel: cleanup wait_for_completion return handling spi: fsl-dspi: Remove possible memory leak of 'chip' spi: sh-msiof: Update calculation of frequency dividing spi: spidev: Convert buf pointers for 32-bit compat SPI_IOC_MESSAGE(n) spi/xilinx: Fix access invalid memory on xilinx_spi_tx spi: Revert "spi/xilinx: Remove iowrite/ioread wrappers" spi/xilinx: Check number of slaves range spi/xilinx: Use polling mode on small transfers spi/xilinx: Remove remaining_words driver data variable spi/xilinx: Remove iowrite/ioread wrappers spi/xilinx: Convert bits_per_word in bytes_per_word spi/xilinx: Convert remainding_bytes in remaining words spi/xilinx: Make spi_tx and spi_rx simmetric spi/xilinx: Remove rx_fn and tx_fn pointer ...	2015-02-09 13:36:20 -08:00
Tony Luck	a2413d8b29	x86/mce: Fix regression. All error records should report via /dev/mcelog I'm getting complaints from validation teams that have updated their Linux kernels from ancient versions to current. They don't see the error logs they expect. I tell the to unload any EDAC drivers[1], and things start working again. The problem is that we short-circuit the logging process if any function on the decoder chain claims to have dealt with the problem: ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m); if (ret == NOTIFY_STOP) return; The logic we used when we added this code was that we did not want to confuse users with double reports of the same error. But it turns out users are not confused - they are upset that they don't see a log where their tools used to find a log. I could also get into a long description of how the consumer of this log does more than just decode model specific details of the error. It keeps counts, tracks thresholds, takes actions and runs scripts that can alert administrators to problems. [1] We've recently compounded the problem because the acpi_extlog driver also registers for this notifier and also returns NOTIFY_STOP. Signed-off-by: Tony Luck <tony.luck@intel.com>	2015-02-09 09:36:53 -08:00
Paolo Bonzini	d44e121223	KVM: x86: emulate: correct page fault error code for NoWrite instructions NoWrite instructions (e.g. cmp or test) never set the "write access" bit in the error code, even if one of the operands is treated as a destination. Fixes: `c205fb7d7d` Cc: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-09 13:36:01 +01:00
Mark Brown	fab4b42a9a	Merge remote-tracking branches 'spi/topic/atmel', 'spi/topic/config', 'spi/topic/dln2' and 'spi/topic/dw' into spi-next	2015-02-08 11:16:43 +08:00
Linus Torvalds	26cdd1f76a	Merge branches 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and x86 fix from Ingo Molnar: "A CLOCK_TAI early expiry fix and an x86 microcode driver oops fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Fix incorrect tai offset calculation for non high-res timer systems * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Return error from driver init code when loader is disabled	2015-02-06 13:56:02 -08:00
Ken Xue	92082a8886	ACPI: add AMD ACPI2Platform device support for x86 system This new feature is to interpret AMD specific ACPI device to platform device such as I2C, UART, GPIO found on AMD CZ and later chipsets. It based on example intel LPSS. Now, it can support AMD I2C, UART and GPIO. Signed-off-by: Ken Xue <Ken.Xue@amd.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-06 15:42:16 +01:00
Paolo Bonzini	f781951299	kvm: add halt_poll_ns module parameter This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-06 13:08:37 +01:00
Hanjun Guo	2fad93083e	ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse() In acpi_table_parse(), pointer of the table to pass to handler() is checked before handler() called, so remove all the duplicate NULL check in the handler function. CC: Tony Luck <tony.luck@intel.com> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-06 01:34:47 +01:00
Linus Torvalds	f3c2352df1	PCI updates for v3.19: Enumeration - Scan all device numbers on NEC as well as Stratus (Charlotte Richardson) Resource management - Handle read-only BARs on AMD CS553x devices (Myron Stowe) Synopsys DesignWare - Reject MSI-X IRQs (Lucas Stach) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU0m7MAAoJEFmIoMA60/r8zQoP/iZpoDfkIrvJWhLjRO1agVCd 7Y5Hrzr+AmESq4n3D7IQt7c+wzaR+05gOG/orDvaFfpbUMamTHlFxUjsk7ib3lLv Qg+WUX/I444pt/jDxRlqZbqWC8AkvYrTWPZoR+fagdWMuIGnKYNvjIAdmzRgMwB0 C3AFGa1S61tqD0+b2Dc177vPZowOUgPBq+ta0Ld8RJ7zBfNnL/GdgCqWxo0OyPNv uSeQuCmKEL5J1Sn1D+j0EKL5FdSDN+LJeU4k5hdF3I+GaervFkOYVqstAqv1kqML 2I9LBjMndRueseArqq7oCAeMn+lCEqleiYr+Y0DtlLOo68eZzh2YWNRV/PcRGMnj xOJo59rDSw5amg4baL7DVdPFIW3NGZiVThNiM3ye5CuhfAl3U/PY1xiDxQ85KY2F htc5rhnm1b9dE5LrJN0iChX4fT60sYyc4IMHuu3wuS5fRNDgZWscs0TT6GSIXSCH ua1xiaq0hLc8XoWdV0A/7R9kX37KJ0kyjCmSFJV/uVWKgKpS3eGi6Ex67qCU1w0C Vw2JT/H8V7vbNCpF/+8fYhXYzsTx6Oa0qLfcBEgT7xxjPTMTsH0LUW0wAivL60rV J9PPcIBrDui6fBVLDkOlgMBjokkShkSay0MUuaUaZib3BdcDY14RVfm8K8MhYS/I ViaWaM1bQ4DGzjnz5dQg =hsLb -----END PGP SIGNATURE----- Merge tag 'pci-v3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Enumeration - Scan all device numbers on NEC as well as Stratus (Charlotte Richardson) Resource management - Handle read-only BARs on AMD CS553x devices (Myron Stowe) Synopsys DesignWare - Reject MSI-X IRQs (Lucas Stach)" * tag 'pci-v3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Handle read-only BARs on AMD CS553x devices PCI: Add NEC variants to Stratus ftServer PCIe DMI check PCI: designware: Reject MSI-X IRQs	2015-02-05 10:23:12 -08:00
Jiang Liu	b4b55cda58	x86/PCI: Refine the way to release PCI IRQ resources Some PCI device drivers assume that pci_dev->irq won't change after calling pci_disable_device() and pci_enable_device() during suspend and resume. Commit `c03b3b0738` ("x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled") frees PCI IRQ resources when pci_disable_device() is called and reallocate IRQ resources when pci_enable_device() is called again. This breaks above assumption. So commit `3eec595235` ("x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation") and `9eabc99a63` ("x86, irq, PCI: Keep IRQ assignment for runtime power management") fix the issue by avoiding freeing/reallocating IRQ resources during PCI device suspend/resume. They achieve this by checking dev.power.is_prepared and dev.power.runtime_status. PM maintainer, Rafael, then pointed out that it's really an ugly fix which leaking PM internal state information to IRQ subsystem. Recently David Vrabel <david.vrabel@citrix.com> also reports an regression in pciback driver caused by commit `cffe0a2b5a` ("x86, irq: Keep balance of IOAPIC pin reference count"). Please refer to: http://lkml.org/lkml/2015/1/14/546 So this patch refine the way to release PCI IRQ resources. Instead of releasing PCI IRQ resources in pci_disable_device()/ pcibios_disable_device(), we now release it at driver unbinding notification BUS_NOTIFY_UNBOUND_DRIVER. In other word, we only release PCI IRQ resources when there's no driver bound to the PCI device, and it keeps the assumption that pci_dev->irq won't through multiple invocation of pci_enable_device()/pci_disable_device(). Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-05 15:09:26 +01:00
Jiang Liu	593669c2ac	x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation Use common ACPI resource discovery interfaces to simplify PCI host bridge resource enumeration. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-05 15:09:26 +01:00
Jiang Liu	812dbd9994	x86/PCI: Fix the range check for IO resources The range check in setup_res() checks the IO range against iomem_resource. That's just wrong. Reworked based on Thomas original patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-05 15:09:25 +01:00
Jiang Liu	14d76b68f2	PCI: Use common resource list management code instead of private implementation Use common resource list management data structure and interfaces instead of private implementation. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-05 15:09:25 +01:00
Tiejun Chen	1c2b364b22	kvm: remove KVM_MMIO_SIZE After `f78146b0f9`, "KVM: Fix page-crossing MMIO", and `87da7e66a4`, "KVM: x86: fix vcpu->mmio_fragments overflow", actually KVM_MMIO_SIZE is gone. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-05 12:26:14 +01:00
Andy Lutomirski	a66734297f	perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks While perfmon2 is a sufficiently evil library (it pokes MSRs directly) that breaking it is fair game, it's still useful, so we might as well try to support it. This allows users to write 2 to /sys/devices/cpu/rdpmc to disable all rdpmc protection so that hack like perfmon2 can continue to work. At some point, if perf_event becomes fast enough to replace perfmon2, then this can go. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/caac3c1c707dcca48ecbc35f4def21495856f479.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:49 +01:00
Andy Lutomirski	7911d3f7af	perf/x86: Only allow rdpmc if a perf_event is mapped We currently allow any process to use rdpmc. This significantly weakens the protection offered by PR_TSC_DISABLED, and it could be helpful to users attempting to exploit timing attacks. Since we can't enable access to individual counters, use a very coarse heuristic to limit access to rdpmc: allow access only when a perf_event is mmapped. This protects seccomp sandboxes. There is plenty of room to further tighen these restrictions. For example, this allows rdpmc for any x86_pmu event, but it's only useful for self-monitoring tasks. As a side effect, cap_user_rdpmc will now be false for AMD uncore events. This isn't a real regression, since .event_idx is disabled for these events anyway for the time being. Whenever that gets re-added, the cap_user_rdpmc code can be adjusted or refactored accordingly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/a2bdb3cf3a1d70c26980d7c6dddfbaa69f3182bf.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:47 +01:00
Andy Lutomirski	c1317ec2b9	perf: Pass the event to arch_perf_update_userpage() Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/0fea9a7fac3c1eea86cb0a5954184e74f4213666.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:46 +01:00
Andy Lutomirski	22c4bd9fa9	x86: Add a comment clarifying LDT context switching The code is correct, but only for a rather subtle reason. This confused me for quite a while when I read switch_mm, so clarify the code to avoid confusing other people, too. TBH, I wouldn't be surprised if this code was only correct by accident. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/0db86397f968996fb772c443c251415b0b430ddd.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:43 +01:00
Andy Lutomirski	1e02ce4ccc	x86: Store a per-cpu shadow copy of CR4 Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:42 +01:00
Andy Lutomirski	375074cc73	x86: Clean up cr4 manipulation CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:41 +01:00
Josh Poimboeuf	12cf89b550	livepatch: rename config to CONFIG_LIVEPATCH Rename CONFIG_LIVE_PATCHING to CONFIG_LIVEPATCH to make the naming of the config and the code more consistent. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-02-04 11:25:51 +01:00
Ingo Molnar	0967160ad6	Merge branch 'x86/asm' into perf/x86, to avoid conflicts with upcoming patches Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 09:01:12 +01:00
Ingo Molnar	8f4bf4bcc4	Linux 3.19-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUzvgKAAoJEHm+PkMAQRiG8XQH/1qVbHI4pP0KcnzfZUHq/mXq RuS4aJMwLm/Y6cXFraXBDaPde1A3CPtwtpob2C6giKcfu2zXGunY65haOEeJWNpX lCbBsLkNC3oDNkygBpVr5Zd6yibaw63WBjjLnpAi7pn2G2Zm2zB8DfILWWWMb7yz MH8ZXV+/xIYCTkjNWGWA1iMjmdYqu0PQHPeOgLsYQ+u7rxfM1zb/wHEkjqUZS6iu IaaZv7PV2PnFYnqib/iIPYjAEDvSQ4vN/7b82zlFd2Culm9j/568KCCWUPhJTb2l X0u4QYs49GnMTWVRa3bgYxS/nTUaE/6DeWs2y2WzqTt0/XDntVUnok0blUeDxGk= =o2kS -----END PGP SIGNATURE----- Merge tag 'v3.19-rc7' into perf/core, to merge fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 07:58:29 +01:00
Jan Beulich	75aaf4c3e6	x86/raid6: correctly check for assembler capabilities Just like for AVX2 (which simply needs an #if -> #ifdef conversion), SSSE3 assembler support should be checked for before using it. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-04 08:35:51 +11:00
Rafael J. Wysocki	b2cd5dd71a	Merge branch 'acpica' into acpi-resources	2015-02-03 22:27:01 +01:00
Wincy Van	705699a139	KVM: nVMX: Enable nested posted interrupt processing If vcpu has a interrupt in vmx non-root mode, injecting that interrupt requires a vmexit. With posted interrupt processing, the vmexit is not needed, and interrupts are fully taken care of by hardware. In nested vmx, this feature avoids much more vmexits than non-nested vmx. When L1 asks L0 to deliver L1's posted interrupt vector, and the target VCPU is in non-root mode, we use a physical ipi to deliver POSTED_INTR_NV to the target vCPU. Using POSTED_INTR_NV avoids unexpected interrupts if a concurrent vmexit happens and L1's vector is different with L0's. The IPI triggers posted interrupt processing in the target physical CPU. In case the target vCPU was not in guest mode, complete the posted interrupt delivery on the next entry to L2. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:15:08 +01:00
Wincy Van	608406e290	KVM: nVMX: Enable nested virtual interrupt delivery With virtual interrupt delivery, the hardware lets KVM use a more efficient mechanism for interrupt injection. This is an important feature for nested VMX, because it reduces vmexits substantially and they are much more expensive with nested virtualization. This is especially important for throughput-bound scenarios. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:07:38 +01:00
Wincy Van	82f0dd4b27	KVM: nVMX: Enable nested apic register virtualization We can reduce apic register virtualization cost with this feature, it is also a requirement for virtual interrupt delivery and posted interrupt processing. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:07:03 +01:00
Wincy Van	b9c237bb1d	KVM: nVMX: Make nested control MSRs per-cpu To enable nested apicv support, we need per-cpu vmx control MSRs: 1. If in-kernel irqchip is enabled, we can enable nested posted interrupt, we should set posted intr bit in the nested_vmx_pinbased_ctls_high. 2. If in-kernel irqchip is disabled, we can not enable nested posted interrupt, the posted intr bit in the nested_vmx_pinbased_ctls_high will be cleared. Since there would be different settings about in-kernel irqchip between VMs, different nested control MSRs are needed. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:06:51 +01:00
Wincy Van	f2b93280ed	KVM: nVMX: Enable nested virtualize x2apic mode When L2 is using x2apic, we can use virtualize x2apic mode to gain higher performance, especially in apicv case. This patch also introduces nested_vmx_check_apicv_controls for the nested apicv patches. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:06:17 +01:00
Wincy Van	3af18d9c5f	KVM: nVMX: Prepare for using hardware MSR bitmap Currently, if L1 enables MSR_BITMAP, we will emulate this feature, all of L2's msr access is intercepted by L0. Features like "virtualize x2apic mode" require that the MSR bitmap is enabled, or the hardware will exit and for example not virtualize the x2apic MSRs. In order to let L1 use these features, we need to build a merged bitmap that only not cause a VMEXIT if 1) L1 requires that 2) the bit is not required by the processor for APIC virtualization. For now the guests are still run with MSR bitmap disabled, but this patch already introduces nested_vmx_merge_msr_bitmap for future use. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:02:32 +01:00
Ingo Molnar	b57c0b5175	x86: Entry cleanups and a bugfix for 3.20 This fixes a bug in the RCU code I added in ist_enter. It also includes the sysret stuff discussed here: http://lkml.kernel.org/g/cover.1421453410.git.luto%40amacapital.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUzhZ0AAoJEK9N98ZeDfrksUEH/j7wkUlMGan5h1AQIZQW6gKk OjlE1a4rfcgKocgkc0ix6UMc8Ks/NAUWKpeHR08eqR+Xi6Yk29cqLkboTEmAdYJ3 jQvKjGu51kiprNjAGqF5wdqxvCT3oBSdm7CWdtY4zHkEr+2W93Ht9PM7xZhj4r+P ekUC8mIKQrhyhlC7g7VpXLAi3Bk4mO+f499T7XBVsVoywWpgVpOMYMhtUobV1reW V7/zul/dMerzNLB0t3amvdgCLphHBQTQ0fHBAN62RY78UvSDt36EZFyS65isirsR LhO4FpWzF5YNMRk8Dep/fB8jYlhsCi40ZIlOtGSE6kNJyLhPt+oLnkpgOwWAMQc= =uiRw -----END PGP SIGNATURE----- Merge tag 'pr-20150201-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull "x86: Entry cleanups and a bugfix for 3.20" from Andy Lutomirski: " This fixes a bug in the RCU code I added in ist_enter. It also includes the sysret stuff discussed here: http://lkml.kernel.org/g/cover.1421453410.git.luto%40amacapital.net " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-03 12:24:08 +01:00
Ingo Molnar	ad6e46869a	x86, vdso: One trivial last-minute VDSO build improvement Andrey noticed that the VDSO build wasn't cleaning itself up. This one-liner fixes it. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUzhiRAAoJEK9N98ZeDfrkpoIH/iS3dJ8IkWMhAcgYro87Hy+0 8YVJjHj+5DgSb4+SSB7B6DXCrmzvVFEXGxz9RraPxshDKLHWNh2qiuy0XqsGwdf8 Z7qkrbfcF1Dkp5RQG9vdkkHwwsh29Ky6R+1ewJVnPz/41Iy7G3uSI6QXLseytI2J myWSI/4yVEbL0CmKU+XXtV/oy1D6816vJSS9HG/Zm6WsO3FzR3NjPct1Wm6Br1/4 hDEGWfKEUg5l0fztCfbCQJO450mNfp/bXEdsy5wPuJTahj3p4MRz/ZoDpLcvFhyL Zldofc2XmHlBKwzbPs/5fO0l3pxko6y0kFWcCWa7wMy7TLhwoXsuRqiSowCy4AQ= =cf7M -----END PGP SIGNATURE----- Merge tag 'pr-20150201-x86-vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull VDSO fix fro Andy Lutomirski: "x86, vdso: One trivial last-minute VDSO build improvement Andrey noticed that the VDSO build wasn't cleaning itself up. This one-liner fixes it." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-03 12:22:47 +01:00
Ingo Molnar	8dbcb8737c	Linux 3.19-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUzvgKAAoJEHm+PkMAQRiG8XQH/1qVbHI4pP0KcnzfZUHq/mXq RuS4aJMwLm/Y6cXFraXBDaPde1A3CPtwtpob2C6giKcfu2zXGunY65haOEeJWNpX lCbBsLkNC3oDNkygBpVr5Zd6yibaw63WBjjLnpAi7pn2G2Zm2zB8DfILWWWMb7yz MH8ZXV+/xIYCTkjNWGWA1iMjmdYqu0PQHPeOgLsYQ+u7rxfM1zb/wHEkjqUZS6iu IaaZv7PV2PnFYnqib/iIPYjAEDvSQ4vN/7b82zlFd2Culm9j/568KCCWUPhJTb2l X0u4QYs49GnMTWVRa3bgYxS/nTUaE/6DeWs2y2WzqTt0/XDntVUnok0blUeDxGk= =o2kS -----END PGP SIGNATURE----- Merge tag 'v3.19-rc7' into x86/asm, to refresh the branch before pulling in new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-03 12:22:18 +01:00
Stuart R. Anderson	ea9e9d8029	Specify PCI based UART for earlyprintk Add support for specifying PCI based UARTs for earlyprintk using a syntax like "earlyprintk=pciserial,00:18.1,115200", where 00:18.1 is the BDF of a UART device. [Slightly tidied from Stuart's original patch] Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-02-02 10:11:27 -08:00
Andy Shevchenko	874e52086f	x86, mrst: remove Moorestown specific serial drivers Intel Moorestown platform support was removed few years ago. This is a follow up which removes Moorestown specific code for the serial devices. It includes mrst_max3110 and earlyprintk bits. This was used on SFI (Medfield, Clovertrail) based platforms as well, though new ones use normal serial interface for the console service. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-02-02 10:11:24 -08:00
Marcelo Tosatti	2e6d015799	KVM: x86: revert "add method to test PIR bitmap vector" Revert `7c6a98dfa1`, given that testing PIR is not necessary anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-02 18:36:34 +01:00
Marcelo Tosatti	f933986038	KVM: x86: fix lapic_timer_int_injected with APIC-v With APICv, LAPIC timer interrupt is always delivered via IRR: apic_find_highest_irr syncs PIR to IRR. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-02 18:36:25 +01:00
Charlotte Richardson	51ac3d2f0c	PCI: Add NEC variants to Stratus ftServer PCIe DMI check NEC OEMs the same platforms as Stratus does, which have multiple devices on some PCIe buses under downstream ports. Link: https://bugzilla.kernel.org/show_bug.cgi?id=51331 Fixes: `1278998f8f` ("PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)") Signed-off-by: Charlotte Richardson <charlotte.richardson@stratus.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.5+ CC: Myron Stowe <myron.stowe@redhat.com>	2015-02-02 09:36:23 -06:00
Andy Lutomirski	96b6352c12	x86_64, entry: Remove the syscall exit audit and schedule optimizations We used to optimize rescheduling and audit on syscall exit. Now that the full slow path is reasonably fast, remove these optimizations. Syscall exit auditing is now handled exclusively by syscall_trace_leave. This adds something like 10ns to the previously optimized paths on my computer, presumably due mostly to SAVE_REST / RESTORE_REST. I think that we should eventually replace both the syscall and non-paranoid interrupt exit slow paths with a pair of C functions along the lines of the syscall entry hooks. Link: http://lkml.kernel.org/r/22f2aa4a0361707a5cfb1de9d45260b39965dead.1421453410.git.luto@amacapital.net Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-02-01 04:03:02 -08:00
Andy Lutomirski	2a23c6b8a9	x86_64, entry: Use sysret to return to userspace when possible The x86_64 entry code currently jumps through complex and inconsistent hoops to try to minimize the impact of syscall exit work. For a true fast-path syscall, almost nothing needs to be done, so returning is just a check for exit work and sysret. For a full slow-path return from a syscall, the C exit hook is invoked if needed and we join the iret path. Using iret to return to userspace is very slow, so the entry code has accumulated various special cases to try to do certain forms of exit work without invoking iret. This is error-prone, since it duplicates assembly code paths, and it's dangerous, since sysret can malfunction in interesting ways if used carelessly. It's also inefficient, since a lot of useful cases aren't optimized and therefore force an iret out of a combination of paranoia and the fact that no one has bothered to write even more asm code to avoid it. I would argue that this approach is backwards. Rather than trying to avoid the iret path, we should instead try to make the iret path fast. Under a specific set of conditions, iret is unnecessary. In particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not set, and both SS and CS are as expected, then movq 32(%rsp),%rsp;sysret does the same thing as iret. This set of conditions is nearly always satisfied on return from syscalls, and it can even occasionally be satisfied on return from an irq. Even with the careful checks for sysret applicability, this cuts nearly 80ns off of the overhead from syscalls with unoptimized exit work. This includes tracing and context tracking, and any return that invokes KVM's user return notifier. For example, the cost of getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to ~280ns on my computer. This may allow the removal and even eventual conversion to C of a respectable amount of exit asm. This may require further tweaking to give the full benefit on Xen. It may be worthwhile to adjust signal delivery and exec to try hit the sysret path. This does not optimize returns to 32-bit userspace. Making the same optimization for CS == __USER32_CS is conceptually straightforward, but it will require some tedious code to handle the differences between sysretl and sysexitl. Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.net Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-02-01 04:03:01 -08:00
Andy Lutomirski	b926e6f61a	x86, traps: Fix ist_enter from userspace context_tracking_user_exit() has no effect if in_interrupt() returns true, so ist_enter() didn't work. Fix it by calling exception_enter(), and thus context_tracking_user_exit(), before incrementing the preempt count. This also adds an assertion that will catch the problem reliably if CONFIG_PROVE_RCU=y to help prevent the bug from being reintroduced. Link: http://lkml.kernel.org/r/261ebee6aee55a4724746d0d7024697013c40a08.1422709102.git.luto@amacapital.net Fixes: `9592747538` x86, traps: Track entry into and exit from IST context Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-02-01 04:02:53 -08:00
Linus Torvalds	6155bc1431	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also an event groups fix, two PMU driver fixes and a CPU model variant addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Tighten (and fix) the grouping condition perf/x86/intel: Add model number for Airmont perf/rapl: Fix crash in rapl_scale() perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization perf probe: Fix probing kretprobes perf symbols: Introduce 'for' method to iterate over the symbols with a given name perf probe: Do not rely on map__load() filter to find symbols perf symbols: Introduce method to iterate symbols ordered by name perf symbols: Return the first entry with a given name in find_by_name method perf annotate: Fix memory leaks in LOCK handling perf annotate: Handle ins parsing failures perf scripting perl: Force to use stdbool perf evlist: Remove extraneous 'was' on error message	2015-01-30 14:34:55 -08:00
Linus Torvalds	1f59fe7667	The ARM changes are largish, but not too scary. And a simple fix for x86 (bug introduced in 3.19). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUy2ulAAoJEL/70l94x66D18kIAJhuh2k5Mt3TfP/zfhi2Y6ER IAZqyFODs8txZ3v432PB8yWWvr2XfJ3gwfjvurLygQJ3jCGZqDrmucbUUXzEaPUk mPnLpxV0ZEmNweS2HLGPX9HJ6zfsZ1dHRk55Tko9ynAO731q7yPjj6HC0th8wzvE BRv5y/18rY2zyar+5Azpj5wpOSllq0ynMgjWXGSlaTLbQoyvgZtzbqNY6nsAGrKw e8hSUPogfGUmZkBHHHVDYKpgHvWS1hARyuGFo8LeKXKPo7qhYxZHCDpch8TXnq2y 21IvQfYddGpcMsaTroA5qyXFigxCX+1j3po6MS3ZH9GGXS5fC3sI8t0EDxKiO6Q= =O4X0 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "The ARM changes are largish, but not too scary. And a simple fix for x86 (bug introduced in 3.19)" (Paolo sayus these are the "Final" fixes. We'll see). * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: check LAPIC presence when building apic_map arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault arm/arm64: KVM: Invalidate data cache on unmap arm/arm64: KVM: Use set/way op trapping to track the state of the caches	2015-01-30 10:45:24 -08:00
Paolo Bonzini	ad15a29647	kvm: vmx: fix oops with explicit flexpriority=0 option A function pointer was not NULLed, causing kvm_vcpu_reload_apic_access_page to go down the wrong path and OOPS when doing put_page(NULL). This did not happen on old processors, only when setting the module option explicitly. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 16:18:49 +01:00
Radim Krčmář	df04d1d191	KVM: x86: check LAPIC presence when building apic_map We forgot to re-check LAPIC after splitting the loop in commit `173beedc16` (KVM: x86: Software disabled APIC should still deliver NMIs, 2014-11-02). Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: `173beedc16` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:28:31 +01:00
Radim Krčmář	8a395363e2	KVM: x86: fix x2apic logical address matching We cannot hit the bug now, but future patches will expose this path. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:46 +01:00
Radim Krčmář	3697f302ab	KVM: x86: replace 0 with APIC_DEST_PHYSICAL To make the code self-documenting. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:46 +01:00
Radim Krčmář	9368b56762	KVM: x86: cleanup kvm_apic_match_*() The majority of this patch turns result = 0; if (CODE) result = 1; return result; into return CODE; because we return bool now. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:45 +01:00
Radim Krčmář	52c233a440	KVM: x86: return bool from kvm_apic_match*() And don't export the internal ones while at it. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:45 +01:00
Kai Huang	843e433057	KVM: VMX: Add PML support in VMX This patch adds PML support in VMX. A new module parameter 'enable_pml' is added to allow user to enable/disable it manually. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 09:39:54 +01:00
Linus Torvalds	33692f2759	vm: add VM_FAULT_SIGSEGV handling support The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit `fee7e49d45` ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-01-29 10:51:32 -08:00
Kai Huang	88178fd4f7	KVM: x86: Add new dirty logging kvm_x86_ops for PML This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty logging for particular memory slot, and to flush potentially logged dirty GPAs before reporting slot->dirty_bitmap to userspace. kvm x86 common code calls these hooks when they are available so PML logic can be hidden to VMX specific. SVM won't be impacted as these hooks remain NULL there. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:41 +01:00
Kai Huang	1c91cad423	KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access This patch changes the second parameter of kvm_mmu_slot_remove_write_access from 'slot id' to 'struct kvm_memory_slot ' to align with kvm_x86_ops dirty logging hooks, which will be introduced in further patch. Better way is to change second parameter of kvm_arch_commit_memory_region from 'struct kvm_userspace_memory_region ' to 'struct kvm_memory_slot * new', but it requires changes on other non-x86 ARCH too, so avoid it now. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:37 +01:00
Kai Huang	9b51a63024	KVM: MMU: Explicitly set D-bit for writable spte. This patch avoids unnecessary dirty GPA logging to PML buffer in EPT violation path by setting D-bit manually prior to the occurrence of the write from guest. We only set D-bit manually in set_spte, and leave fast_page_fault path unchanged, as fast_page_fault is very unlikely to happen in case of PML. For the hva <-> pa change case, the spte is updated to either read-only (host pte is read-only) or be dropped (host pte is writeable), and both cases will be handled by above changes, therefore no change is necessary. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:33 +01:00
Kai Huang	f4b4b18086	KVM: MMU: Add mmu help functions to support PML This patch adds new mmu layer functions to clear/set D-bit for memory slot, and to write protect superpages for memory slot. In case of PML, CPU logs the dirty GPA automatically to PML buffer when CPU updates D-bit from 0 to 1, therefore we don't have to write protect 4K pages, instead, we only need to clear D-bit in order to log that GPA. For superpages, we still write protect it and let page fault code to handle dirty page logging, as we still need to split superpage to 4K pages in PML. As PML is always enabled during guest's lifetime, to eliminate unnecessary PML GPA logging, we set D-bit manually for the slot with dirty logging disabled. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:29 +01:00
Kai Huang	3b0f1d01e5	KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty We don't have to write protect guest memory for dirty logging if architecture supports hardware dirty logging, such as PML on VMX, so rename it to be more generic. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:30:38 +01:00
Ingo Molnar	6d84d1d130	One final fix for 3.19 to address a wrongful deregistering of the microcode loader module. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUyMQuAAoJEBLB8Bhh3lVKoUcP/2NW+guS0v8/X7vBtO0R1TR6 zhjXKOcTIHlM/IBES/vszxKnq/jiiq1X+ZFwucx644RxmPo5fS0+zGIoNNchQgb2 g4Y9RChANXUG66CkMW5zhjOdu6XTuLuJnoY9B3cw8Ro7lTIj6XfYq60Is/U1avJX l7d99uTEt2mUGjlvYXnpSyFwOWiMOFDruQ9wtXFJ4BfJXKsObe6NHcQYzaY2Tu93 xYMbakh3i+EmVhA4gmhtYlpm6LAXZ21vdSEOselfulgoyQm/SaU2/BGJ384RNNmC AdfLTM9qRfxUwUbA/jXak6YUDca3RznPPcBSyYhssLJkUvx8q4D6/CXIlh4ygPWr j2fc3gJt2KXzZzUvMx5MYMSyCtGm7Whx4XMLXkZBrRQK0TKwpTFqHReL/bY7nQHC iq22AloRA49rPo7GFYYm6xPOTCZUVo9VlVRIcAVqcIgjtkutwmwFyoVmuSrFlnpg tDQcG8pexxtmbbRHdlIYpN+BeKNikA0y+aiyoP8SSn0D3dduAnQ4lKZazE+i+fnT /hMz9eJVjk0ccCaCHC/gyLOgBWJlLUyfYz7nfCvQE4dKMTmyDJZZE1hH9Jr1OPQW zmTge8KqRtXbFNqnfNEE3UK/oBSuD45kx/oSa7BLlzZCjyVsfa1xjhv3rJFw0gHc TeMp8vkcTVgdX4EONupN =vHs7 -----END PGP SIGNATURE----- Merge tag 'microcode_fix_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Pull microcode fix from Borislav Petkov: "One final fix for 3.19 to address a wrongful deregistering of the microcode loader module." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-29 07:51:20 +01:00
Andrey Skvortsov	050835e9d3	x86, vdso: teach 'make clean' remove vdso64 binaries After 'make clean' vdso64.so and vdso64.dbg.so were left in arch/x86/vdso/. Link: http://lkml.kernel.org/r/1422453867-17326-1-git-send-email-andrej.skvortzov@gmail.com Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-28 18:44:18 -08:00
Yijing Wang	6a878e5085	PCI: Fail MSI-X mappings if there's no space assigned to MSI-X BAR Unlike MSI, which is configured via registers in the MSI capability in Configuration Space, MSI-X is configured via tables in Memory Space. These MSI-X tables are mapped by a device BAR, and if no Memory Space has been assigned to the BAR, MSI-X cannot be used. Fail MSI-X setup if no space has been assigned for the BAR. Previously, we ioremapped the MSI-X table even if the resource hadn't been assigned. In this case, the resource address is undefined (and is often zero), which may lead to warnings or oopses in this path: pci_enable_msix msix_capability_init msix_map_region ioremap_nocache The PCI core sets resource flags to zero when it can't assign space for the resource (see reset_resource()). There are also some cases where it sets the IORESOURCE_UNSET flag, e.g., pci_reassigndev_resource_alignment(), pci_assign_resource(), etc. So we must check for both cases. [bhelgaas: changelog] Reported-by: Zhang Jukuo <zhangjukuo@huawei.com> Tested-by: Zhang Jukuo <zhangjukuo@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2015-01-28 09:25:57 -06:00
Ingo Molnar	b3890e4704	Merge branch 'perf/hw_breakpoints' into perf/core The new hw_breakpoint bits are now ready for v3.20, merge them into the main branch, to avoid conflicts. Conflicts: tools/perf/Documentation/perf-record.txt Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 15:48:59 +01:00
Ingo Molnar	772a9aca12	This is my accumulated x86 entry work, part 1, for 3.20. The meat of this is an IST rework. When an IST exception interrupts user space, we will handle it on the per-thread kernel stack instead of on the IST stack. This sounds messy, but it actually simplifies the IST entry/exit code, because it eliminates some ugly games we used to play in order to handle rescheduling, signal delivery, etc on the way out of an IST exception. The IST rework introduces proper context tracking to IST exception handlers. I haven't seen any bug reports, but the old code could have incorrectly treated an IST exception handler as an RCU extended quiescent state. The memory failure change (included in this pull request with Borislav and Tony's permission) eliminates a bunch of code that is no longer needed now that user memory failure handlers are called in process context. Finally, this includes a few on Denys' uncontroversial and Obviously Correct (tm) cleanups. The IST and memory failure changes have been in -next for a while. LKML references: IST rework: http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net Memory failure change: http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com Denys' cleanups: http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUtvkFAAoJEK9N98ZeDfrkcfsIAJxZ0UBUCEDvulbqgk/iPGOa fIpKLMowS7CpKtw6Wdc/YvAIkeHXWm1vU44Hj0TrjSrXCgVF8yCngs/xlXtOjoa1 dosXQqgqVJJ+hyui7chAEWyalLW7bEO8raq/6snhiMrhiuEkVKpEr7Fer4FVVCZL 4VALmNQQsbV+Qq4pXIhuagZC0Nt/XKi/+/cKvhS4p//q1F/TbHTz0FpDUrh0jPMh 18WFy0jWgxdkMRnSp/wJhekvdXX6PwUy5BdES9fjw8LQJZxxFpqN3Fe1kgfyzV0k yuvEHw1hPt2aBGj3q69wQvDVyyn4OqMpRDBhk4S+GJYmVh7mFyFMN4BDMEy/EY8= =LXVl -----END PGP SIGNATURE----- Merge tag 'pr-20150114-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull x86/entry enhancements from Andy Lutomirski: " This is my accumulated x86 entry work, part 1, for 3.20. The meat of this is an IST rework. When an IST exception interrupts user space, we will handle it on the per-thread kernel stack instead of on the IST stack. This sounds messy, but it actually simplifies the IST entry/exit code, because it eliminates some ugly games we used to play in order to handle rescheduling, signal delivery, etc on the way out of an IST exception. The IST rework introduces proper context tracking to IST exception handlers. I haven't seen any bug reports, but the old code could have incorrectly treated an IST exception handler as an RCU extended quiescent state. The memory failure change (included in this pull request with Borislav and Tony's permission) eliminates a bunch of code that is no longer needed now that user memory failure handlers are called in process context. Finally, this includes a few on Denys' uncontroversial and Obviously Correct (tm) cleanups. The IST and memory failure changes have been in -next for a while. LKML references: IST rework: http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net Memory failure change: http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com Denys' cleanups: http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com " This tree semantically depends on and is based on the following RCU commit: `734d168013` ("rcu: Make rcu_nmi_enter() handle nesting") ... and for that reason won't be pushed upstream before the RCU bits hit Linus's tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 15:33:26 +01:00
Ingo Molnar	41ca5d4e9b	Merge commit `3669ef9fa7` ("x86, tls: Interpret an all-zero struct user_desc as 'no segment'") into x86/asm Pick up the latestest asm fixes before advancing it any further. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 15:30:32 +01:00
Jennifer Herbert	8da7633f16	xen: mark grant mapped pages as foreign Use the "foreign" page flag to mark pages that have a grant map. Use page->private to store information of the grant (the granting domain and the grant reference). Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 14:03:12 +00:00
Jennifer Herbert	0ae65f49af	x86/xen: require ballooned pages for grant maps Ballooned pages are always used for grant maps which means the original frame does not need to be saved in page->index nor restored after the grant unmap. This allows the workaround in netback for the conflicting use of the (unionized) page->index and page->pfmemalloc to be removed. Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 14:03:11 +00:00
David Vrabel	0bb599fd30	xen: remove scratch frames for ballooned pages and m2p override The scratch frame mappings for ballooned pages and the m2p override are broken. Remove them in preparation for replacing them with simpler mechanisms that works. The scratch pages did not ensure that the page was not in use. In particular, the foreign page could still be in use by hardware. If the guest reused the frame the hardware could read or write that frame. The m2p override did not handle the same frame being granted by two different grant references. Trying an M2P override lookup in this case is impossible. With the m2p override removed, the grant map/unmap for the kernel mappings (for x86 PV) can be easily batched in set_foreign_p2m_mapping() and clear_foreign_p2m_mapping(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2015-01-28 14:03:10 +00:00
David Vrabel	853d028934	xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() When unmapping grants, instead of converting the kernel map ops to unmap ops on the fly, pre-populate the set of unmap ops. This allows the grant unmap for the kernel mappings to be trivially batched in the future. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2015-01-28 14:03:10 +00:00
Kan Liang	ef454caeb7	perf/x86/intel: Add model number for Airmont Intel Airmont supports the same architectural and non-architectural performance monitoring events as Silvermont. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421913053-99803-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 13:17:32 +01:00
Stephane Eranian	98b008dff8	perf/rapl: Fix crash in rapl_scale() This patch fixes a systematic crash in rapl_scale() due to an invalid pointer. The bug was introduced by commit: `89cbc76768` ("x86: Replace __get_cpu_var uses") The fix is simple. Just put the parenthesis where it needs to be, i.e., around rapl_pmu. To my surprise, the compiler was not complaining about passing an integer instead of a pointer. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Fixes: `89cbc76768` ("x86: Replace __get_cpu_var uses") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: cl@linux.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150122203834.GA10228@thinkpad Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 13:04:35 +01:00
Kan Liang	c05199e5a5	perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization There were some issues about the uncore driver tried to access non-existing boxes, which caused boot crashes. These issues have been all fixed. But we should avoid boot failures if that ever happens again. This patch intends to prevent this kind of potential issues. It moves uncore_box_init out of driver initialization. The box will be initialized when it's first enabled. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421729665-5912-1-git-send-email-kan.liang@intel.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 13:04:34 +01:00
Juergen Gross	270b79338e	x86/xen: cleanup arch/x86/xen/mmu.c Remove a nested ifdef. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 10:01:11 +00:00
Juergen Gross	bf9d834a9b	x86/xen: add some __init annotations in arch/x86/xen/mmu.c The file arch/x86/xen/mmu.c has some functions that can be annotated with "__init". Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 10:00:51 +00:00
Juergen Gross	a3f5239650	x86/xen: add some __init and static annotations in arch/x86/xen/setup.c Some more functions in arch/x86/xen/setup.c can be made "__init". xen_ignore_unusable() can be made "static". Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 10:00:36 +00:00
Juergen Gross	3ba5c867ca	x86/xen: use correct types for addresses in arch/x86/xen/setup.c In many places in arch/x86/xen/setup.c wrong types are used for physical addresses (u64 or unsigned long long). Use phys_addr_t instead. Use macros already defined instead of open coding them. Correct some other type mismatches. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 10:00:10 +00:00
Juergen Gross	f0feed10aa	x86/xen: cleanup arch/x86/xen/setup.c Remove extern declarations in arch/x86/xen/setup.c which are either not used or redundant. Move needed other extern declarations to xen-ops.h Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 09:59:46 +00:00
Boris Ostrovsky	da63865a01	x86, microcode: Return error from driver init code when loader is disabled Commits `65cef1311d` ("x86, microcode: Add a disable chicken bit") and `a18a0f6850` ("x86, microcode: Don't initialize microcode code on paravirt") allow microcode driver skip initialization when microcode loading is not permitted. However, they don't prevent the driver from being loaded since the init code returns 0. If at some point later the driver gets unloaded this will result in an oops while trying to deregister the (never registered) device. To avoid this, make init code return an error on paravirt or when microcode loading is disabled. The driver will then never be loaded. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1422411669-25147-1-git-send-email-boris.ostrovsky@oracle.com Reported-by: James Digwall <james@dingwall.me.uk> Cc: stable@vger.kernel.org # 3.18 Signed-off-by: Borislav Petkov <bp@suse.de>	2015-01-28 09:23:40 +01:00
Joerg Roedel	128ca093cc	kvm: iommu: Add cond_resched to legacy device assignment code When assigning devices to large memory guests (>=128GB guest memory in the failure case) the functions to create the IOMMU page-tables for the whole guest might run for a very long time. On non-preemptible kernels this might cause Soft-Lockup warnings. Fix these by adding a cond_resched() to the mapping and unmapping loops. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-27 21:31:12 +01:00
Kees Cook	d69911a68c	x86, build: replace Perl script with Shell script Commit `e6023367d7` ("x86, kaslr: Prevent .bss from overlaping initrd") added Perl to the required build environment. This reimplements in shell the Perl script used to find the size of the kernel with bss and brk added. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Rob Landley <rob@landley.net> Acked-by: Rob Landley <rob@landley.net> Cc: Anca Emanuel <anca.emanuel@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-01-26 13:37:18 -08:00
Lv Zheng	a45de93eb1	ACPICA: Resources: Provide common part for struct acpi_resource_address structures. struct acpi_resource_address and struct acpi_resource_extended_address64 share substracts just at different offsets. To unify the parsing functions, OSPMs like Linux need a new ACPI_ADDRESS64_ATTRIBUTE as their substructs, so they can extract the shared data. This patch also synchronizes the structure changes to the Linux kernel. The usages are searched by matching the following keywords: 1. acpi_resource_address 2. acpi_resource_extended_address 3. ACPI_RESOURCE_TYPE_ADDRESS 4. ACPI_RESOURCE_TYPE_EXTENDED_ADDRESS And we found and fixed the usages in the following files: arch/ia64/kernel/acpi-ext.c arch/ia64/pci/pci.c arch/x86/pci/acpi.c arch/x86/pci/mmconfig-shared.c drivers/xen/xen-acpi-memhotplug.c drivers/acpi/acpi_memhotplug.c drivers/acpi/pci_root.c drivers/acpi/resource.c drivers/char/hpet.c drivers/pnp/pnpacpi/rsparser.c drivers/hv/vmbus_drv.c Build tests are passed with defconfig/allnoconfig/allyesconfig and defconfig+CONFIG_ACPI=n. Original-by: Thomas Gleixner <tglx@linutronix.de> Original-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-01-26 16:09:56 +01:00
Nadav Amit	82268083fa	KVM: x86: Emulation of call may use incorrect stack size On long-mode, when far call that changes cs.l takes place, the stack size is determined by the new mode. For instance, if we go from 32-bit mode to 64-bit mode, the stack-size if 64. KVM uses the old stack size. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:17:34 +01:00
Nadav Amit	bac155310b	KVM: x86: 32-bit wraparound read/write not emulated correctly If we got a wraparound of 32-bit operand, and the limit is 0xffffffff, read and writes should be successful. It just needs to be done in two segments. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:15:18 +01:00
Nadav Amit	2b42fce695	KVM: x86: Fix defines in emulator.c Unnecassary define was left after commit `7d882ffa81` ("KVM: x86: Revert NoBigReal patch in the emulator"). Commit `39f062ff51` ("KVM: x86: Generate #UD when memory operand is required") was missing undef. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:15:03 +01:00
Nadav Amit	2276b5116e	KVM: x86: ARPL emulation can cause spurious exceptions ARPL and MOVSXD are encoded the same and their execution depends on the execution mode. The operand sizes of each instruction are different. Currently, ARPL is detected too late, after the decoding was already done, and therefore may result in spurious exception (instead of failed emulation). Introduce a group to the emulator to handle instructions according to execution mode (32/64 bits). Note: in order not to make changes that may affect performance, the new ModeDual can only be applied to instructions with ModRM. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:49 +01:00
Nadav Amit	801806d956	KVM: x86: IRET emulation does not clear NMI masking The IRET instruction should clear NMI masking, but the current implementation does not do so. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:42 +01:00
Nadav Amit	16794aaaab	KVM: x86: Wrong operand size for far ret Indeed, Intel SDM specifically states that for the RET instruction "In 64-bit mode, the default operation size of this instruction is the stack-address size, i.e. 64 bits." However, experiments show this is not the case. Here is for example objdump of small 64-bit asm: 4004f1: ca 14 00 lret $0x14 4004f4: 48 cb lretq 4004f6: 48 ca 14 00 lretq $0x14 Therefore, remove the Stack flag from far-ret instructions. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:25 +01:00
Nadav Amit	2fcf5c8ae2	KVM: x86: Dirty the dest op page on cmpxchg emulation Intel SDM says for CMPXCHG: "To simplify the interface to the processorâ€™s bus, the destination operand receives a write cycle without regard to the result of the comparison.". This means the destination page should be dirtied. Fix it to by writing back the original value if cmpxchg failed. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:18 +01:00
Davidlohr Bueso	57b6b99bac	x86,xen: use current->state helpers Call __set_current_state() instead of assigning the new state directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping track of who changed the state. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-26 10:21:26 +00:00
Linus Torvalds	14746306af	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Hopefully the last round of fixes for 3.19 - regression fix for the LDT changes - regression fix for XEN interrupt handling caused by the APIC changes - regression fixes for the PAT changes - last minute fixes for new the MPX support - regression fix for 32bit UP - fix for a long standing relocation issue on 64bit tagged for stable - functional fix for the Hyper-V clocksource tagged for stable - downgrade of a pr_err which tends to confuse users Looks a bit on the large side, but almost half of it are valuable comments" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Change Fast TSC calibration failed from error to info x86/apic: Re-enable PCI_MSI support for non-SMP X86_32 x86, mm: Change cachemode exports to non-gpl x86, tls: Interpret an all-zero struct user_desc as "no segment" x86, tls, ldt: Stop checking lm in LDT_empty x86, mpx: Strictly enforce empty prctl() args x86, mpx: Fix potential performance issue on unmaps x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels x86, hyperv: Mark the Hyper-V clocksource as being continuous x86: Don't rely on VMWare emulating PAT MSR correctly x86, irq: Properly tag virtualization entry in /proc/interrupts x86, boot: Skip relocs when load address unchanged x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi ACPI: pci: Do not clear pci_dev->irq in acpi_pci_irq_disable() x86/xen: Treat SCI interrupt as normal GSI interrupt	2015-01-25 18:11:17 -08:00
K. Y. Srinivasan	4061ed9e2a	Drivers: hv: vmbus: Implement a clockevent device Implement a clockevent device based on the timer support available on Hyper-V. In this version of the patch I have addressed Jason's review comments. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-25 09:17:57 -08:00
Thomas Gleixner	ba360f887a	x86, init: Fix UP boot regression on x86_64 Commit `30b8b0066c` "init: Get rid of x86isms" broke the UP boot on x86_64. That happens because CONFIG_UP_LATE_INIT depends on CONFIG_X86_UP_APIC. X86_UP_APIC is a 32bit only config switch and therefor not set on 64bit UP builds. As a consequence the UP init of the local APIC and the IOAPIC is not called, which results in a boot failure. Make it depend on !SMP && X86_LOCAL_APIC instead. Fixes: `30b8b0066c` init: Get rid of x86isms Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-24 10:34:46 +01:00
Linus Torvalds	550695925d	PCI updates for v3.19: Resource management - Clip bridge windows to fit in upstream windows (Yinghai Lu) Virtualization - Mark Atheros AR93xx to avoid using bus reset (Alex Williamson) Miscellaneous - Update Richard Zhu's email address (Lucas Stach) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUwqpDAAoJEFmIoMA60/r8ykIQAINgkP/iPaFMTPkTSfzTJCMY oQVNGha4FDt6Ic1UWGyS/sYUpywSnALxlYWxVZTm5r+sGQ2yJBo6veuxvCI09YFw lWqf6lfvkFSthWCo7pHLoNaIjKJUNCy4a2han31aAIScMCNX4YF60YorMSjQBST8 smLMG75U3U9VWaXYsV1e5gTvLa5IQh4lgaTgMAOXqd+6WcAR4WwOgD2sR06o2X43 63JF2U+ieuA789Xu2IS92TmMMESD5haEZATqdGPtxpnqyHxmBNu0Y4JkkBWD2S92 HvveOoLBT2TBfICkftvCJscBLHh7PZMIx9nLx58SnijVzX+hzVr4Zfc96MZU50MK DuNbbZn3sO902ukOEpfih7Mg0tDxCxNytleEdAnXmZuqf+odbd/Y4AA0Hg6w7GEY OsVGbQAT/knlTfsSZsivtmUl7l1SXzrozv+q4f4szY95v34S9pm0sWzz0IBn7oKj h7N9Vslr3lyEudOUo1OrFq+0arDw53kwOOkIavMUH0nvTqKs4cmXBcGMfo1EfMa+ 3YhjwbgpvtZ3AXi2NSBk4gIGZEmQslvgRStLhgXVDl+9DieK+sw1Vx4cKe8gu9mD c7zPStEsJBJgd3v+8s8avwo8R0oPZb6MsCKFjjaYojTvpfFmfX0YyWE/TzYoUm6Z +BTyA8t0+3jTArTs/Zid =HRy7 -----END PGP SIGNATURE----- Merge tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "These are fixes for: - a resource management problem that causes a Radeon "Fatal error during GPU init" on machines where the BIOS programmed an invalid Root Port window. This was a regression in v3.16. - an Atheros AR93xx device that doesn't handle PCI bus resets correctly. This was a regression in v3.14. - an out-of-date email address" * tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: MAINTAINERS: Update Richard Zhu's email address sparc/PCI: Clip bridge windows to fit in upstream windows powerpc/PCI: Clip bridge windows to fit in upstream windows parisc/PCI: Clip bridge windows to fit in upstream windows mn10300/PCI: Clip bridge windows to fit in upstream windows microblaze/PCI: Clip bridge windows to fit in upstream windows ia64/PCI: Clip bridge windows to fit in upstream windows frv/PCI: Clip bridge windows to fit in upstream windows alpha/PCI: Clip bridge windows to fit in upstream windows x86/PCI: Clip bridge windows to fit in upstream windows PCI: Add pci_claim_bridge_resource() to clip window if necessary PCI: Add pci_bus_clip_resource() to clip to fit upstream window PCI: Pass bridge device, not bus, when updating bridge windows PCI: Mark Atheros AR93xx to avoid bus reset PCI: Add flag for devices where we can't use bus reset	2015-01-24 10:58:47 +12:00
Linus Torvalds	2e3810da41	Three small fixes. Two for x86 and one avoids that sparse bails out. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUwkVXAAoJEL/70l94x66DwPEH/RPBmxJ+lD0nRyXVECSWxjN6 DYJvp4HsLV8BhBx/ATjAkjiVPKTUk9vQBjfgl72YatjASP9aNIkBqnN0AOVdVQ2i 04ZvYaSw3jY0A5PSecdFQZ4u8MAvaRS4AYNOYM3Kpf0EOrIwanXFpEfVRGT8ichT uBK/mbN7vDO1SsgAnB00fCew4wFrHIa7fJ8eLNnebDOuC72oUZA+2nKx8ApWq4ca ZaziqkI2CFaV2rqJokKDun2arxI2Q6/L87g7qyo+HMd1b+aepLTWYNOs1vH0YoSc 73aHg+3crIqx75XmnaxKP5SPOr6vpmnloux9yre8u1tvejBIbCMz1g9Mdl0YOmA= =YRTn -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Three small fixes. Two for x86 and one avoids that sparse bails out" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: SYSENTER emulation is broken KVM: x86: Fix of previously incomplete fix for CVE-2014-8480 KVM: fix sparse warning in include/trace/events/kvm.h	2015-01-24 09:58:17 +12:00
WANG Chao	d574ffa106	x86, e820: Clean up sanitize_e820_map() users The argument 3 of sanitize_e820_map() will only be updated upon a successful sanitization. Some of the callers have extra conditionals for the same purpose. Clean them up. default_machine_specific_memory_setup() must keep the extra conditional because boot_params.e820_entries is an u8 and not an u32, so the direct update would overwrite other fields in boot_params. [ tglx: Massaged changelog ] Signed-off-by: WANG Chao <chaowang@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Lee Chun-Yi <joeyli.kernel@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Link: http://lkml.kernel.org/r/1420601859-18439-1-git-send-email-chaowang@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 16:14:27 +01:00
WANG Chao	7389882c81	x86, setup: Let early_memremap() handle page alignment early_memremap() takes care of page alignment and map size, so we can just remap the required data size and get rid of the adjustments in the setup code. [tglx: Massaged changelog ] Signed-off-by: WANG Chao <chaowang@redhat.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Link: http://lkml.kernel.org/r/1420628150-16872-1-git-send-email-chaowang@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 16:14:26 +01:00
Paolo Bonzini	8fff5e374a	KVM: s390: fixes and features for kvm/next (3.20) 1. Generic - sparse warning (make function static) - optimize locking - bugfixes for interrupt injection - fix MVPG addressing modes 2. hrtimer/wakeup fun A recent change can cause KVM hangs if adjtime is used in the host. The hrtimer might wake up too early or too late. Too early is fatal as vcpu_block will see that the wakeup condition is not met and sleep again. This CPU might never wake up again. This series addresses this problem. adjclock slowing down the host clock will result in too late wakeups. This will require more work. In addition to that we also change the hrtimer from REALTIME to MONOTONIC to avoid similar problems with timedatectl set-time. 3. sigp rework We will move all "slow" sigps to QEMU (protected with a capability that can be enabled) to avoid several races between concurrent SIGP orders. 4. Optimize the shadow page table Provide an interface to announce the maximum guest size. The kernel will use that to make the pagetable 2,3,4 (or theoretically) 5 levels. 5. Provide an interface to set the guest TOD We now use two vm attributes instead of two oneregs, as oneregs are vcpu ioctl and we don't want to call them from other threads. 6. Protected key functions The real HMC allows to enable/disable protected key CPACF functions. Lets provide an implementation + an interface for QEMU to activate this the protected key instructions. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJUwj60AAoJEBF7vIC1phx8iV0QAKq1LZRTmgTLS2fd0oyWKZeN ShWUIUiB+7IUiuogYXZMfqOm61oogxwc95Ti+3tpSWYwkzUWagpS/RJQze7E1HOc 3pHpXwrR01ueUT6uVV4xc/vmVIlQAIl/ScRDDPahlAT2crCleWcKVC9l0zBs/Kut IrfzN9pJcrkmXD178CDP8/VwXsn02ptLQEpidGibGHCd03YVFjp3X0wfwNdQxMbU qOwNYCz3SLfDm5gsybO2DG+aVY3AbM2ZOJt/qLv2j4Phz4XB4t4W9iJnAefSz7JA W4677wbMQpfZlUQYhI78H/Cl9SfWAuLug1xk83O/+lbEiR5u+8zLxB69dkFTiBaH 442OY957T6TQZ/V9d0jDo2XxFrcaU9OONbVLsfBQ56Vwv5cAg9/7zqG8eqH7Nq9R gU3fQesgD4N0Kpa77T9k45TT/hBRnUEtsGixAPT6QYKyE6cK4AJATHKSjMSLbdfj ELbt0p2mVtKhuCcANfEx54U2CxOrg5ElBmPz8hRw0OkXdwpqh1sGKmt0govcHP1I BGSzE9G4mswwI1bQ7cqcyTk/lwL8g3+KQmRJoOcgCveQlnY12X5zGD5DhuPMPiIT VENqbcTzjlxdu+4t7Enml+rXl7ySsewT9L231SSrbLsTQVgCudD1B9m72WLu5ZUT 9/Z6znv6tkeKV5rM9DYE =zLjR -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-20150122' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next KVM: s390: fixes and features for kvm/next (3.20) 1. Generic - sparse warning (make function static) - optimize locking - bugfixes for interrupt injection - fix MVPG addressing modes 2. hrtimer/wakeup fun A recent change can cause KVM hangs if adjtime is used in the host. The hrtimer might wake up too early or too late. Too early is fatal as vcpu_block will see that the wakeup condition is not met and sleep again. This CPU might never wake up again. This series addresses this problem. adjclock slowing down the host clock will result in too late wakeups. This will require more work. In addition to that we also change the hrtimer from REALTIME to MONOTONIC to avoid similar problems with timedatectl set-time. 3. sigp rework We will move all "slow" sigps to QEMU (protected with a capability that can be enabled) to avoid several races between concurrent SIGP orders. 4. Optimize the shadow page table Provide an interface to announce the maximum guest size. The kernel will use that to make the pagetable 2,3,4 (or theoretically) 5 levels. 5. Provide an interface to set the guest TOD We now use two vm attributes instead of two oneregs, as oneregs are vcpu ioctl and we don't want to call them from other threads. 6. Protected key functions The real HMC allows to enable/disable protected key CPACF functions. Lets provide an implementation + an interface for QEMU to activate this the protected key instructions.	2015-01-23 14:33:36 +01:00
Nadav Amit	f3747379ac	KVM: x86: SYSENTER emulation is broken SYSENTER emulation is broken in several ways: 1. It misses the case of 16-bit code segments completely (CVE-2015-0239). 2. MSR_IA32_SYSENTER_CS is checked in 64-bit mode incorrectly (bits 0 and 1 can still be set without causing #GP). 3. MSR_IA32_SYSENTER_EIP and MSR_IA32_SYSENTER_ESP are not masked in legacy-mode. 4. There is some unneeded code. Fix it. Cc: stable@vger.linux.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-23 13:57:15 +01:00
Nadav Amit	63ea0a49ae	KVM: x86: Fix of previously incomplete fix for CVE-2014-8480 STR and SLDT with rip-relative operand can cause a host kernel oops. Mark them as DstMem as well. Cc: stable@vger.linux.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-23 13:56:56 +01:00
Paolo Bonzini	1c6007d59a	KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added trace symbols, and adding an explicit VGIC init device control IOCTL. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUwhsKAAoJEEtpOizt6ddyuSEH/ia2uf07N0i+C1dPKYiqhKEd nFqBvgrhAMVztWLmy1Wq4SnO9YNd+CrPYATrfCiYsYQ9aKc09+qDq+uo06bVpZXz KsHjVGUsdyJ4qRqjDixkPvZviGIXa6C//+hcwg1XH2nit1uHmXVupzB9dDz3ZM2l GCwApdRdaaUVDt5Ud2ljqIWZa18Qf/5/HD8MdPXpmotDOKucL6pBr/1R1XWueCU/ ejRs/qy3EFyMWdEdfGFAMCa0ZvHbPmsJmvB/EgkyUnuJj77ptA0jNo1jtzSfEyis 53x4ffWnIsPl9yqhk0oKerIALVUvV4A7/me2ya6tsQ5fiBX7lJ3+qwggvCkWQzw= =fMS2 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added trace symbols, and adding an explicit VGIC init device control IOCTL. Conflicts: arch/arm64/include/asm/kvm_arm.h arch/arm64/kvm/handle_exit.c	2015-01-23 13:39:51 +01:00
Dominik Dingel	31928aa586	KVM: remove unneeded return value of vcpu_postcreate The return value of kvm_arch_vcpu_postcreate is not checked in its caller. This is okay, because only x86 provides vcpu_postcreate right now and it could only fail if vcpu_load failed. But that is not possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so just get rid of the unchecked return value. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2015-01-23 13:24:52 +01:00
Alexandre Demers	520452172e	x86/tsc: Change Fast TSC calibration failed from error to info Many users see this message when booting without knowning that it is of no importance and that TSC calibration may have succeeded by another way. As explained by Paul Bolle in http://lkml.kernel.org/r/1348488259.1436.22.camel@x61.thuisdomein "Fast TSC calibration failed" should not be considered as an error since other calibration methods are being tried afterward. At most, those send a warning if they fail (not an error). So let's change the message from error to warning. [ tglx: Make if pr_info. It's really not important at all ] Fixes: `c767a54ba0` x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1418106470-6906-1-git-send-email-alexandre.f.demers@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 10:53:52 +01:00
Colin King	d505ad1d66	x86/rtc: Remove duplicate const specifier Building with clang: CC arch/x86/kernel/rtc.o arch/x86/kernel/rtc.c:173:29: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier] static const char * const const ids[] __initconst = Remove the duplicate const, it is not needed and causes a warning. Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: http://lkml.kernel.org/r/1421244475-313-1-git-send-email-colin.king@canonical.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 10:35:51 +01:00
Bryan O'Donoghue	38a1dfda8e	x86/apic: Re-enable PCI_MSI support for non-SMP X86_32 Commit `0dbc6078c0` ('x86, build, pci: Fix PCI_MSI build on !SMP') introduced the dependency that X86_UP_APIC is only available when PCI_MSI is false. This effectively prevents PCI_MSI support on 32bit UP systems because it disables both APIC and IO-APIC. But APIC support is architecturally required for PCI_MSI. The intention of the patch was to enforce APIC support when PCI_MSI is enabled, but failed to do so. Remove the !PCI_MSI dependency from X86_UP_APIC and enforce X86_UP_APIC when PCI_MSI support is enabled on 32bit UP systems. [ tglx: Massaged changelog ] Fixes `0dbc6078c0` 'x86, build, pci: Fix PCI_MSI build on !SMP' Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1421967529-9037-1-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 10:20:30 +01:00
Juergen Gross	31bb772370	x86, mm: Change cachemode exports to non-gpl Commit `281d4078be` ("x86: Make page cache mode a real type") introduced the symbols __cachemode2pte_tbl and __pte2cachemode_tbl and exported them via EXPORT_SYMBOL_GPL. The exports are part of a replacement of code which has been EXPORT_SYMBOL before these changes resulting in build breakage of out-of-tree non-gpl modules. Change EXPORT_SYMBOL_GPL to EXPORT-SYMBOL for these two symbols. Fixes: `281d4078be` "x86: Make page cache mode a real type" Reported-and-tested-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/1421926997-28615-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:50:14 +01:00
Andy Lutomirski	3669ef9fa7	x86, tls: Interpret an all-zero struct user_desc as "no segment" The Witcher 2 did something like this to allocate a TLS segment index: struct user_desc u_info; bzero(&u_info, sizeof(u_info)); u_info.entry_number = (uint32_t)-1; syscall(SYS_set_thread_area, &u_info); Strictly speaking, this code was never correct. It should have set read_exec_only and seg_not_present to 1 to indicate that it wanted to find a free slot without putting anything there, or it should have put something sensible in the TLS slot if it wanted to allocate a TLS entry for real. The actual effect of this code was to allocate a bogus segment that could be used to exploit espfix. The set_thread_area hardening patches changed the behavior, causing set_thread_area to return -EINVAL and crashing the game. This changes set_thread_area to interpret this as a request to find a free slot and to leave it empty, which isn't quite what the game expects but should be close enough to keep it working. In particular, using the code above to allocate two segments will allocate the same segment both times. According to FrostbittenKing on Github, this fixes The Witcher 2. If this somehow still causes problems, we could instead allocate a limit==0 32-bit data segment, but that seems rather ugly to me. Fixes: `41bdc78544` x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:45:07 +01:00
Andy Lutomirski	e30ab185c4	x86, tls, ldt: Stop checking lm in LDT_empty 32-bit programs don't have an lm bit in their ABI, so they can't reliably cause LDT_empty to return true without resorting to memset. They shouldn't need to do this. This should fix a longstanding, if minor, issue in all 64-bit kernels as well as a potential regression in the TLS hardening code. Fixes: `41bdc78544` x86/tls: Validate TLS entries to protect espfix Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:11:06 +01:00
Dave Hansen	c922228efe	x86, mpx: Fix potential performance issue on unmaps The 3.19 merge window saw some TLB modifications merged which caused a performance regression. They were fixed in commit 045bbb9fa. Once that fix was applied, I also noticed that there was a small but intermittent regression still present. It was not present consistently enough to bisect reliably, but I'm fairly confident that it came from (my own) MPX patches. The source was reading a relatively unused field in the mm_struct via arch_unmap. I also noted that this code was in the main instruction flow of do_munmap() and probably had more icache impact than we want. This patch does two things: 1. Adds a static (via Kconfig) and dynamic (via cpuid) check for MPX with cpu_feature_enabled(). This keeps us from reading that cacheline in the mm and trades it for a check of the global CPUID variables at least on CPUs without MPX. 2. Adds an unlikely() to ensure that the MPX call ends up out of the main instruction flow in do_munmap(). I've added a detailed comment about why this was done and why we want it even on systems where MPX is present. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: luto@amacapital.net Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:11:06 +01:00
Dave Hansen	814564a0a1	x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels We had originally planned on submitting MPX support in one patch set. We eventually broke it up in to two pieces for easier review. One of the features that didn't make the first round was supporting 32-bit binaries on 64-bit kernels. Once we split the set up, we never added code to restrict 32-bit binaries from _using_ MPX on 64-bit kernels. The 32-bit bounds tables are a different format than the 64-bit ones. Without this patch, the kernel will try to read a 32-bit binary's tables as if they were the 64-bit version. They will likely be noticed as being invalid rather quickly and the app will get killed, but that's kinda mean. This patch adds an explicit check, and will make a 64-bit kernel essentially behave as if it has no MPX support when called from a 32-bit binary. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20150108223020.9E9AA511@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:11:06 +01:00
Linus Torvalds	193934123c	Surprising number of fixes this merge window :( First two are minor fallout from the param rework which went in this merge window. Next three are a series which fixes a longstanding (but never previously reported and unlikely , so no CC stable) race between kallsyms and freeing the init section. Finally, a minor cleanup as our module refcount will now be -1 during unload. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUwEmwAAoJENkgDmzRrbjx77kP/1cNQR2eG2sBwokg3q0tvHnQ IKqEXErW7NvxRa+RAMEmy2uQoGt6+uNklAbtyJEYM9oR1NieFbPi2yrt9Xn5SAXS Brp1S8WYBMilA3W3o6I0trFDRWHdpdtkKIQwLWgJNSEWjbTXh8bSwp/2X1rlOPyI ZmphCMOQMU2/uFEyJhTz1WMEV8eVXiRLN8OxSkPxToxdZoGln2U8IBCCCJC9OG+f Cf3eMgEcNdEXNcPKqr11NIcHkAx6M6qI/eMDOqk151PslHa8lbis6di9Z87aE0ps i8PyrkJGTmgM9cCjXwE8deNseeCmuKYlbPIF+NoxcqtvZstfaMrISwTIEuzV4JHi p13YhDxy4XiC3H6pKHub/jo7UCl+wWtFh9SqpqGgduFX/p6FtUHQJm0S0X/DFFZt C+2MFVSe6HRHE8B7bFz86+619Qd/rU7+806CLCE+NbYlYAKIBYKzWt/bml6VH3RJ OjwXhQqmznWhJjsfD3BUUUpZpHijmylI9gAe2F1oErb8YjRU6gIm7P8hlkOzD7AS TfGHPFq2raQcfAiGdVmvkbvvhvYZXnB3WVsAexrYoqrT9I8eEfRI+7SkL75MLR2E ikzhJS3SHkAUAd7fUVMt7xMwh0jmhsPjWCCqc13m6UUFoXhTaDgKgPGftltN0bI2 g85+enZ3/eca6xh/KxvW =Kf9b -----END PGP SIGNATURE----- Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module and param fixes from Rusty Russell: "Surprising number of fixes this merge window :( The first two are minor fallout from the param rework which went in this merge window. The next three are a series which fixes a longstanding (but never previously reported and unlikely , so no CC stable) race between kallsyms and freeing the init section. Finally, a minor cleanup as our module refcount will now be -1 during unload" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: make module_refcount() a signed integer. module: fix race in kallsyms resolution during module load success. module: remove mod arg from module_free, rename module_memfree(). module_arch_freeing_init(): new hook for archs before module->module_init freed. param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC param: initialize store function to NULL if not available.	2015-01-23 06:40:36 +12:00
Thomas Gleixner	2f82c9dc60	x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC Get rid of the defined but not used warnings Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com>	2015-01-22 15:17:41 +01:00
Thomas Gleixner	9c4d9c73dd	x86: Consolidate boot cpu timer setup Now that the APIC bringup is consolidated we can move the setup call for the percpu clock event device to apic_bsp_setup(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211704.162567839@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	374aab339f	x86/apic: Reuse apic_bsp_setup() for UP APIC setup Extend apic_bsp_setup() so the same code flow can be used for APIC_init_uniprocessor(). Folded Jiangs fix to provide proper ordering of the UP setup. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211704.084765674@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	613c25efbd	x86/smpboot: Sanitize uniprocessor init The UP related setups for local apic are mangled into smp_sanity_check(). That results in duplicate calls to disable_smp() and makes the code hard to follow. Let smp_sanity_check() return dedicated values for the various exit reasons and handle them at the call site. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.987833932@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	05f7e46d2a	x86/smpboot: Move apic init code to apic.c We better provide proper functions which implement the required code flow in the apic code rather than letting the smpboot code open code it. That allows to make more functions static and confines the APIC functionality to apic.c where it belongs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.907616730@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	30b8b0066c	init: Get rid of x86isms The UP local API support can be set up from an early initcall. No need for horrible hackery in the init code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.827943883@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	e714a91f92	x86/apic: Move apic_init_uniprocessor code Move the code to a different place so we can make other functions inline. Preparatory patch for further cleanups. No change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.731329006@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	ef4c59a4b6	x86/smpboot: Cleanup ioapic handling smpboot is very creative with the ways to disable ioapic. smpboot_clear_io_apic() smpboot_clear_io_apic_irqs() and disable_ioapic_support() serve a similar purpose. smpboot_clear_io_apic_irqs() is the most useless of all functions as it clears a variable which has not been setup yet. Aside of that it has the same ifdef mess and conditionals around the ioapic related code, which can now be removed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.650280684@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	35e4c6d30e	x86/apic: Sanitize ioapic handling We have proper stubs for the IOAPIC=n case and the setup/enable function have the required checks inside now. Remove the ifdeffery and the copy&pasted conditionals. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>C Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.569830549@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	a46f5c8927	x86/ioapic: Add proper checks to setp/enable_IO_APIC() No point to have the same checks at every call site. Add them to the functions, so they can be called unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.490719938@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	8686608336	x86/ioapic: Provide stub functions for IOAPIC%3Dn To avoid lots of ifdeffery provide proper stubs for setup_IO_APIC(), enable_IO_APIC() and setup_ioapic_dest(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.397170414@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	f77aa308e5	x86/smpboot: Move smpboot inlines to code No point for a separate header file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.304126687@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	6d2d49d2cd	x86/x2apic: Use state information for disable Use the state information to simplify the disable logic further. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.209387598@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	659006bf3a	x86/x2apic: Split enable and setup function enable_x2apic() is a convoluted unreadable mess because it is used for both enablement in early boot and for setup in cpu_init(). Split the code into x2apic_enable() for enablement and x2apic_setup() for setup of (secondary cpus). Make use of the new state tracking to simplify the logic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.129287153@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	44e25ff9e6	x86/x2apic: Disable x2apic from nox2apic setup There is no point in postponing the hardware disablement of x2apic. It can be disabled right away in the nox2apic setup function. Disable it right away and set the state to DISABLED . This allows to remove all the nox2apic conditionals all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.051214090@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	12e189d3cf	x86/x2apic: Add proper state tracking Having 3 different variables to track the state is just silly and error prone. Add a proper state tracking variable which covers the three possible states: ON/OFF/DISABLED. We cannot use x2apic_mode for this as this would require to change all users of x2apic_mode with explicit comparisons for a state value instead of treating it as boolean. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.955392443@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	62e61633da	x86/x2apic: Clarify remapping mode for x2apic enablement Rename the argument of try_to_enable_x2apic() so the purpose becomes more clear. Make the pr_warning more consistent and avoid the double print of "disabling". Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.876012628@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	55eae7de72	x86/x2apic: Move code in conditional region No point in having try_to_enable_x2apic() outside of the CONFIG_X86_X2APIC section and having inline functions and more ifdefs to deal with it. Move the code into the existing ifdef section and remove the inline cruft. Fixup the printk about not enabling interrupt remapping as suggested by Boris. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.795388613@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	d524165cb8	x86/apic: Check x2apic early No point in delaying the x2apic detection for the CONFIG_X86_X2APIC=n case to enable_IR_x2apic(). We rather detect that before we try to setup anything there. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.702479404@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	9aa1636527	x86/apic: Make disable x2apic work really If x2apic_preenabled is not enabled, then disable_x2apic() is not called from various places which results in x2apic_disabled not being set. So other code pathes can happily reenable the x2apic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.621431109@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	2ca5b40479	x86/ioapic: Check x2apic really The x2apic_preenabled flag is just a horrible hack and if X2APIC support is disabled it does not reflect the actual hardware state. Check the hardware instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.541280622@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	bfb0507029	x86/apic: Move x2apic code to one place Having several disjunct pieces of code for x2apic support makes reading the code unnecessarily hard. Move it to one ifdeffed section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.445212133@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	81a46dd824	x86/apic: Make x2apic_mode depend on CONFIG_X86_X2APIC No point in having a static variable around which is always 0. Let the compiler optimize code out if disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.363274310@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	8d80696060	x86/apic: Avoid open coded x2apic detection enable_IR_x2apic() grew a open coded x2apic detection. Implement a proper helper function which shares the code with the already existing x2apic_enabled(). Made it use rdmsrl_safe as suggested by Boris. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.285038186@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Borislav Petkov	cfaa790a3f	kvm: Fix CR3_PCID_INVD type on 32-bit arch/x86/kvm/emulate.c: In function ‘check_cr_write’: arch/x86/kvm/emulate.c:3552:4: warning: left shift count >= width of type rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD; happens because sizeof(UL) on 32-bit is 4 bytes but we shift it 63 bits to the left. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-21 15:59:09 +01:00
Ingo Molnar	f49028292c	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-21 06:12:21 +01:00
Marcelo Tosatti	54750f2cf0	KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp and rdtsc is larger than a given threshold: * If we get more than the below threshold into the future, we rerequest * the real time from the host again which has only little offset then * that we need to adjust using the TSC. * * For now that threshold is 1/5th of a jiffie. That should be good * enough accuracy for completely broken systems, but also give us swing * to not call out to the host all the time. */ #define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5) Disable masterclock support (which increases said delta) in case the boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW. Upstreams kernels which support pvclock vsyscalls (and therefore make use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-20 20:38:39 +01:00
Fengguang Wu	69b0049a89	KVM: fix "Should it be static?" warnings from sparse arch/x86/kvm/x86.c:495:5: sparse: symbol 'kvm_read_nested_guest_page' was not declared. Should it be static? arch/x86/kvm/x86.c:646:5: sparse: symbol '__kvm_set_xcr' was not declared. Should it be static? arch/x86/kvm/x86.c:1183:15: sparse: symbol 'max_tsc_khz' was not declared. Should it be static? arch/x86/kvm/x86.c:1237:6: sparse: symbol 'kvm_track_tsc_matching' was not declared. Should it be static? Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-20 20:38:35 +01:00
Palik, Imre	94dd85f6a0	x86/xen: prefer TSC over xen clocksource for dom0 In Dom0's the use of the TSC clocksource (whenever it is stable enough to be used) instead of the Xen clocksource should not cause any issues, as Dom0 VMs never live-migrated. The TSC clocksource is somewhat more efficient than the Xen paravirtualised clocksource, thus it should have higher rating. This patch decreases the rating of the Xen clocksource in Dom0s to 275. Which is half-way between the rating of the TSC clocksource (300) and the hpet clocksource (250). Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: Imre Palik <imrep@amazon.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-20 18:44:24 +00:00
Miroslav Benes	32b7eb8771	livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING Change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING in Kconfigs. HAVE_ bools are prevalent there and we should go with the flow. Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-01-20 15:02:25 +01:00
K. Y. Srinivasan	32c6590d12	x86, hyperv: Mark the Hyper-V clocksource as being continuous The Hyper-V clocksource is continuous; mark it accordingly. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: jasowang@redhat.com Cc: gregkh@linuxfoundation.org Cc: devel@linuxdriverproject.org Cc: olaf@aepfle.de Cc: apw@canonical.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1421108762-3331-1-git-send-email-kys@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 14:36:25 +01:00
Juergen Gross	9d34cfdf47	x86: Don't rely on VMWare emulating PAT MSR correctly VMWare seems not to emulate the PAT MSR correctly: reaeding MSR_IA32_CR_PAT returns 0 even after writing another value to it. Commit `bd809af16e` triggers this VMWare bug when the kernel is booted as a VMWare guest. Detect this bug and don't use the read value if it is 0. Fixes: `bd809af16e` "x86: Enable PAT to use cache mode translation tables" Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com> Acked-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Juergen Gross <jgross@suse.com> Link: http://lkml.kernel.org/r/1421039745-14335-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 14:33:45 +01:00
Oleg Nesterov	7575637ab2	x86, fpu: Fix math_state_restore() race with kernel_fpu_begin() math_state_restore() can race with kernel_fpu_begin() if irq comes right after __thread_fpu_begin(), __save_init_fpu() will overwrite fpu->state we are going to restore. Add 2 simple helpers, kernel_fpu_disable() and kernel_fpu_enable() which simply set/clear in_kernel_fpu, and change math_state_restore() to exclude kernel_fpu_begin() in between. Alternatively we could use local_irq_save/restore, but probably these new helpers can have more users. Perhaps they should disable/enable preemption themselves, in this case we can remove preempt_disable() in __restore_xstate_sig(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: matt.fleming@intel.com Cc: bp@suse.de Cc: pbonzini@redhat.com Cc: luto@amacapital.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150115192028.GD27332@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 13:53:07 +01:00
Oleg Nesterov	33a3ebdc07	x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end() Now that we have in_kernel_fpu we can remove __thread_clear_has_fpu() in __kernel_fpu_begin(). And this allows to replace the asymmetrical and nontrivial use_eager_fpu + tsk_used_math check in kernel_fpu_end() with the same __thread_has_fpu() check. The logic becomes really simple; if _begin() does save() then _end() needs restore(), this is controlled by __thread_has_fpu(). Otherwise they do clts/stts unless use_eager_fpu(). Not only this makes begin/end symmetrical and imo more understandable, potentially this allows to change irq_fpu_usable() to avoid all other checks except "in_kernel_fpu". Also, with this patch __kernel_fpu_end() does restore_fpu_checking() and WARNs if it fails instead of math_state_restore(). I think this looks better because we no longer need __thread_fpu_begin(), and it would be better to report the failure in this case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: matt.fleming@intel.com Cc: bp@suse.de Cc: pbonzini@redhat.com Cc: luto@amacapital.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150115192005.GC27332@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 13:53:07 +01:00
Oleg Nesterov	14e153ef75	x86, fpu: Introduce per-cpu in_kernel_fpu state interrupted_kernel_fpu_idle() tries to detect if kernel_fpu_begin() is safe or not. In particular it should obviously deny the nested kernel_fpu_begin() and this logic looks very confusing. If use_eager_fpu() == T we rely on a) __thread_has_fpu() check in interrupted_kernel_fpu_idle(), and b) on the fact that _begin() does __thread_clear_has_fpu(). Otherwise we demand that the interrupted task has no FPU if it is in kernel mode, this works because __kernel_fpu_begin() does clts() and interrupted_kernel_fpu_idle() checks X86_CR0_TS. Add the per-cpu "bool in_kernel_fpu" variable, and change this code to check/set/clear it. This allows to do more cleanups and fixes, see the next changes. The patch also moves WARN_ON_ONCE() under preempt_disable() just to make this_cpu_read() look better, this is not really needed. And in fact I think we should move it into __kernel_fpu_begin(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: matt.fleming@intel.com Cc: bp@suse.de Cc: pbonzini@redhat.com Cc: luto@amacapital.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150115191943.GB27332@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 13:53:07 +01:00

1 2 3 4 5 ...

20895 Commits