linux

Author	SHA1	Message	Date
Linus Torvalds	b9b16a7922	Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature update from Ingo Molnar: "Two refinements to clflushopt support" * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: If we disable CLFLUSH, we should disable CLFLUSHOPT x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH	2014-04-01 10:11:21 -07:00
Ingo Molnar	b8c89c6a0d	Fix the code to tell when a CMCI storm ends by actually looking at the machine check banks when we poll while interrupts are disabled. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJTOZzGAAoJEKurIx+X31iBD6IP/2zd94hyif6gl8qGEqRUO0Ts 3Bo+P8DhFXH7VY5qvN9dP4raOfQjb8B2pFCShq6c4nr1lv6AKb4Gehf9EcQnXF20 z10vZuKGyJfyck0dKr3xE35V34DAARNoCJCWoSwutbqEaCQT/7J/cymkgdqUgHo9 f7XIeMy51dAVw7IeYleNLlVdDet1s5BuCPVdJOIqirGA9qfdDUUVuYmBB/mrbneI uftl27zd+KM4tcuNkVxrsRbZ8q/IOu1mhtb09aDpeKLLRfl282XsLr1j+52lbM8t RPieFO56jGlZn5015aWNTXARqhdKQ8wnUzttEJCGkl+p7bqNp3pQmpbNUD2ZpJF8 euR89OaxaOmU8Y1anfw/pA1G7WumkTG9WjOHRQFzSERPq1GF+YLlkaSwVTkgT9Ny 6ubvGr9WYPTxzNeDqUeu4HjCKOS1soWak7zIoXbu2Yu59scCcITE/cKw3y76BZ2O 2gKkBmBxlaZt8DBmPR2W4A+Uhb7lhJWvrYdCwpnrMA0nDCKOEDs5g5MVr5xMAEJ4 Zy1X6hwOwvmOmJp/DdHrsmfsRhCRV/itjb0vpfEF+MzWYbECD8Zyslb6YP5E5ZoV eWoKglC/W25907YJZiX2yzsCi+GNjLO/0V5ZJHwCxrWPSUnSJMeXjgbslIEQyOlC 0Gkj2DUIrgsGJDhwTmAJ =ZqzJ -----END PGP SIGNATURE----- Merge tag 'please-pull-cmci-storm' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/urgent Pull RAS/CMCI storm code fix from Tony Luck: "Fix the code to tell when a CMCI storm ends by actually looking at the machine check banks when we poll while interrupts are disabled." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-01 15:13:16 +02:00
Linus Torvalds	c0fc3cbac0	Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 hyperv change from Ingo Molnar: "Skip the timer_irq_works() check on hyperv systems" * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, hyperv: Bypass the timer_irq_works() check	2014-03-31 12:28:38 -07:00
Linus Torvalds	918d80a136	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu handling changes from Ingo Molnar: "Bigger changes: - Intel CPU hardware-enablement: new vector instructions support (AVX-512), by Fenghua Yu. - Support the clflushopt instruction and use it in appropriate places. clflushopt is similar to clflush but with more relaxed ordering, by Ross Zwisler. - MSR accessor cleanups, by Borislav Petkov. - 'forcepae' boot flag for those who have way too much time to spend on way too old Pentium-M systems and want to live way too dangerously, by Chris Bainbridge" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic x86, Intel: Convert to the new bit access MSR accessors x86, AMD: Convert to the new bit access MSR accessors x86: Add another set of MSR accessor functions x86: Use clflushopt in drm_clflush_virt_range x86: Use clflushopt in drm_clflush_page x86: Use clflushopt in clflush_cache_range x86: Add support for the clflushopt instruction x86, AVX-512: Enable AVX-512 States Context Switch x86, AVX-512: AVX-512 Feature Detection	2014-03-31 12:00:45 -07:00
Linus Torvalds	8c292f1174	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Main changes: Kernel side changes: - Add SNB/IVB/HSW client uncore memory controller support (Stephane Eranian) - Fix various x86/P4 PMU driver bugs (Don Zickus) Tooling, user visible changes: - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso) - Speed up thread map generation (Don Zickus) - Introduce 'perf kvm --list-cmds' command line option for use by scripts (Ramkumar Ramachandra) - Print the evsel name in the annotate stdio output, prep to fix support outputting annotation for multiple events, not just for the first one (Arnaldo Carvalho de Melo) - Allow setting preferred callchain method in .perfconfig (Jiri Olsa) - Show in what binaries/modules 'perf probe's are set (Masami Hiramatsu) - Support distro-style debuginfo for uprobe in 'perf probe' (Masami Hiramatsu) Tooling, internal changes and fixes: - Use tid in mmap/mmap2 events to find maps (Don Zickus) - Record the reason for filtering an address_location (Namhyung Kim) - Apply all filters to an addr_location (Namhyung Kim) - Merge al->filtered with hist_entry->filtered in report/hists (Namhyung Kim) - Fix memory leak when synthesizing thread records (Namhyung Kim) - Use ui__has_annotation() in 'report' (Namhyung Kim) - hists browser refactorings to reuse code accross UIs (Namhyung Kim) - Add support for the new DWARF unwinder library in elfutils (Jiri Olsa) - Fix build race in the generation of bison files (Jiri Olsa) - Further streamline the feature detection display, trimming it a bit to show just the libraries detected, using VF=1 gets a more verbose output, showing the less interesting feature checks as well (Jiri Olsa). - Check compatible symtab type before loading dso (Namhyung Kim) - Check return value of filename__read_debuglink() (Stephane Eranian) - Move some hashing and fs related code from tools/perf/util/ to tools/lib/ so that it can be used by more tools/ living utilities (Borislav Petkov) - Prepare DWARF unwinding code for using an elfutils alternative unwinding library (Jiri Olsa) - Fix DWARF unwind max_stack processing (Jiri Olsa) - Add dwarf unwind 'perf test' entry (Jiri Olsa) - 'perf probe' improvements including memory leak fixes, sharing the intlist class with other tools, uprobes/kprobes code sharing and use of ref_reloc_sym (Masami Hiramatsu) - Shorten sample symbol resolving by adding cpumode to struct addr_location (Arnaldo Carvalho de Melo) - Fix synthesizing mmaps for threads (Don Zickus) - Fix invalid output on event group stdio report (Namhyung Kim) - Fixup header alignment in 'perf sched latency' output (Ramkumar Ramachandra) - Fix off-by-one error in 'perf timechart record' argv handling (Ramkumar Ramachandra) Tooling, cleanups: - Remove unused thread__find_map function (Jiri Olsa) - Remove unused simple_strtoul() function (Ramkumar Ramachandra) Tooling, documentation updates: - Update function names in debug messages (Ramkumar Ramachandra) - Update some code references in design.txt (Ramkumar Ramachandra) - Clarify load-latency information in the 'perf mem' docs (Andi Kleen) - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits) perf tools: Remove unused simple_strtoul() function perf tools: Update some code references in design.txt perf evsel: Update function names in debug messages perf tools: Remove thread__find_map function perf annotate: Print the evsel name in the stdio output perf report: Use ui__has_annotation() perf tools: Fix memory leak when synthesizing thread records perf tools: Use tid in mmap/mmap2 events to find maps perf report: Merge al->filtered with hist_entry->filtered perf symbols: Apply all filters to an addr_location perf symbols: Record the reason for filtering an address_location perf sched: Fixup header alignment in 'latency' output perf timechart: Fix off-by-one error in 'record' argv handling perf machine: Factor machine__find_thread to take tid argument perf tools: Speed up thread map generation perf kvm: introduce --list-cmds for use by scripts perf ui hists: Pass evsel to hpp->header/width functions explicitly perf symbols: Introduce thread__find_cpumode_addr_location perf session: Change header.misc dump from decimal to hex perf ui/tui: Reuse generic __hpp__fmt() code ...	2014-03-31 11:13:25 -07:00
Chen, Gong	27f6c573e0	x86, CMCI: Add proper detection of end of CMCI storms When CMCI storm persists for a long time(at least beyond predefined threshold. It's 30 seconds for now), we can watch CMCI storm is detected immediately after it subsides. ... Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode ... The problem is that our logic that determines that the storm has ended is incorrect. We announce the end, re-enable interrupts and realize that the storm is still going on, so we switch back to polling mode. Rinse, repeat. When a storm happens we disable signaling of errors via CMCI and begin polling machine check banks instead. If we find any logged errors, then we need to set a per-cpu flag so that our per-cpu tests that check whether the storm is ongoing will see that errors are still being logged independently of whether mce_notify_irq() says that the error has been fully processed. cmci_clear() is not the right tool to disable a bank. It disables the interrupt for the bank as desired, but it also clears the bit for this bank in "mce_banks_owned" so we will skip the bank when polling (so we fail to see that the storm continues because we stop looking). New cmci_storm_disable_banks() just disables the interrupt while allowing polling to continue. Reported-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-03-28 13:40:16 -07:00
Jason Wang	ca3ba2a2f4	x86, hyperv: Bypass the timer_irq_works() check This patch bypass the timer_irq_works() check for hyperv guest since: - It was guaranteed to work. - timer_irq_works() may fail sometime due to the lpj calibration were inaccurate in a hyperv guest or a buggy host. In the future, we should get the tsc frequency from hypervisor and use preset lpj instead. [ hpa: I would prefer to not defer things to "the future" in the future... ] Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: <stable@vger.kernel.org> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1393558229-14755-1-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-27 11:02:45 -07:00
Chris Bainbridge	69f2366c94	x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M Many Pentium M systems disable PAE but may have a functionally usable PAE implementation. This adds the "forcepae" parameter which bypasses the boot check for PAE, and sets the CPU as being PAE capable. Using this parameter will taint the kernel with TAINT_CPU_OUT_OF_SPEC. Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Link: http://lkml.kernel.org/r/20140307114040.GA4997@localhost Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-03-20 16:31:54 -07:00
Dave Jones	8c90487cdc	Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose the flag to encompass a wider range of pushing the CPU beyond its warrany. Signed-off-by: Dave Jones <davej@fedoraproject.org> Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-03-20 16:28:09 -07:00
Srivatsa S. Bhat	a8c17c2951	x86, amd, uncore: Fix CPU hotplug callback registration Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the amd-uncore code in x86 by using this latter form of callback registration. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:43 +01:00
Srivatsa S. Bhat	fd537e56f6	x86, intel, rapl: Fix CPU hotplug callback registration Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the intel rapl code in x86 by using this latter form of callback registration. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:43 +01:00
Srivatsa S. Bhat	8c60ea1464	x86, intel, cacheinfo: Fix CPU hotplug callback registration Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the intel cacheinfo code in x86 by using this latter form of callback registration. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:43 +01:00
Srivatsa S. Bhat	047868ce29	x86, amd, ibs: Fix CPU hotplug callback registration Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the amd-ibs code in x86 by using this latter form of callback registration. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:43 +01:00
Srivatsa S. Bhat	7b7139d4ab	x86, therm_throt.c: Remove unused therm_cpu_lock After fixing the CPU hotplug callback registration code, the callbacks invoked for each online CPU, during the initialization phase in thermal_throttle_init_device(), can no longer race with the actual CPU hotplug notifier callbacks (in thermal_throttle_cpu_callback). Hence the therm_cpu_lock is unnecessary now. Remove it. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:43 +01:00
Srivatsa S. Bhat	4e6192bbec	x86, therm_throt.c: Fix CPU hotplug callback registration Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the thermal throttle code in x86 by using this latter form of callback registration. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:42 +01:00
Srivatsa S. Bhat	82a8f131aa	x86, mce: Fix CPU hotplug callback registration Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the mce code in x86 by using this latter form of callback registration. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:42 +01:00
Srivatsa S. Bhat	2c666adacc	x86, intel, uncore: Fix CPU hotplug callback registration Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the uncore code in intel-x86 by using this latter form of callback registration. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:42 +01:00
Linus Torvalds	b44eeb4d47	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc smaller fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix leak in uncore_type_init failure paths perf machine: Use map as success in ip__resolve_ams perf symbols: Fix crash in elf_section_by_name perf trace: Decode architecture-specific signal numbers	2014-03-16 10:41:21 -07:00
Stephane Eranian	81827ed8d8	perf/x86/uncore: Fix missing end markers for SNB/IVB/HSW IMC PMU This patch fixes a bug with the SNB/IVB/HSW uncore mmeory controller support. The PCI Ids tables for the memory controller were missing end markers. That could cause random crashes on boot during or after PCI device registration. Signed-off-by: Stephane Erainan <eranian@google.com> Cc: peterz@infradead.org Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: ak@linux.intel.com Link: http://lkml.kernel.org/r/20140313120436.GA14236@quad Signed-off-by: Ingo Molnar <mingo@kernel.org> --	2014-03-14 09:25:25 +01:00
H. Peter Anvin	0b131be8d4	x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic Replace somewhat arbitrary constants for bits in MSR_IA32_MISC_ENABLE with verbose but systematic ones. Add _BIT defines for all the rest of them, too. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-13 15:55:46 -07:00
Borislav Petkov	c0a639ad0b	x86, Intel: Convert to the new bit access MSR accessors ... and save some lines of code. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1394384725-10796-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-13 15:35:09 -07:00
Borislav Petkov	8f86a7373a	x86, AMD: Convert to the new bit access MSR accessors ... and save us a bunch of code. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1394384725-10796-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-13 15:35:03 -07:00
Thomas Gleixner	ffb12cf002	Merge branch 'irq/for-gpio' into irq/core Merge the request/release callbacks which are in a separate branch for consumption by the gpio folks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-12 16:01:07 +01:00
Stephane Eranian	4191c29f05	perf/x86/uncore: Fix compilation warning in snb_uncore_imc_init_box() This patch fixes a compilation problem (unused variable) with the new SNB/IVB/HSW uncore IMC code. [ In -v2 we simplify the fix as suggested by Peter Zjilstra. ] Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140311235329.GA28624@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-03-12 10:49:13 +01:00
Dave Jones	09df7c4c80	x86: Remove CONFIG_X86_OOSTORE This was an optimization that made memcpy type benchmarks a little faster on ancient (Circa 1998) IDT Winchip CPUs. In real-life workloads, it wasn't even noticable, and I doubt anyone is running benchmarks on 16 year old silicon any more. Given this code has likely seen very little use over the last decade, let's just remove it. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-11 10:16:18 -07:00
Dave Jones	b7b4839d93	perf/x86: Fix leak in uncore_type_init failure paths The error path of uncore_type_init() frees up any allocations that were made along the way, but it relies upon type->pmus being set, which only happens if the function succeeds. As type->pmus remains null in this case, the call to uncore_type_exit will do nothing. Moving the assignment earlier will allow us to actually free those allocations should something go awry. Signed-off-by: Dave Jones <davej@fedoraproject.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140306172028.GA552@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-03-11 11:59:34 +01:00
Dongsheng Yang	ef11dadb83	perf/x86/uncore: Add __init for uncore_cpumask_init() Commit: `411cf180fa` perf/x86/uncore: fix initialization of cpumask introduced the function uncore_cpumask_init(), which is only called in __init intel_uncore_init(). But it is not marked with __init, which produces the following warning: WARNING: vmlinux.o(.text+0x2464a): Section mismatch in reference from the function uncore_cpumask_init() to the function .init.text:uncore_cpu_setup() The function uncore_cpumask_init() references the function __init uncore_cpu_setup(). This is often because uncore_cpumask_init lacks a __init annotation or the annotation of uncore_cpu_setup is wrong. This patch marks uncore_cpumask_init() with __init. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Acked-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/1394013516-4964-1-git-send-email-yangds.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-03-11 11:57:56 +01:00
Steven Rostedt	198d208df4	x86: Keep thread_info on thread stack in x86_32 x86_64 uses a per_cpu variable kernel_stack to always point to the thread stack of current. This is where the thread_info is stored and is accessed from this location even when the irq or exception stack is in use. This removes the complexity of having to maintain the thread info on the stack when interrupts are running and having to copy the preempt_count and other fields to the interrupt stack. x86_32 uses the old method of copying the thread_info from the thread stack to the exception stack just before executing the exception. Having the two different requires #ifdefs and also the x86_32 way is a bit of a pain to maintain. By converting x86_32 to the same method of x86_64, we can remove #ifdefs, clean up the x86_32 code a little, and remove the overhead of the copy. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-03-06 16:56:55 -08:00
Thomas Gleixner	76d388cd72	x86: hyperv: Fixup the (brain) damage caused by the irq cleanup Compiling last minute changes without setting the proper config options is not really clever. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-05 13:42:14 +01:00
Thomas Gleixner	13b5be56d1	x86: hyperv: Fix brown paperbag typos reported by Fenguangs build robot Reported-by: fengguang.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linuxdrivers <devel@linuxdriverproject.org> Cc: x86 <x86@kernel.org>	2014-03-04 23:53:33 +01:00
Thomas Gleixner	3c433679ab	x86: hyperv: Make it build with CONFIG_HYPERV=m again Commit `1aec16967` (x86: Hyperv: Cleanup the irq mess) removed the ability to build the hyperv stuff as a module. Bring it back. Reported-by: fengguang.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linuxdrivers <devel@linuxdriverproject.org> Cc: x86 <x86@kernel.org>	2014-03-04 23:41:44 +01:00
Thomas Gleixner	1aec169673	x86: Hyperv: Cleanup the irq mess The vmbus/hyperv interrupt handling is another complete trainwreck and probably the worst of all currently in tree. If CONFIG_HYPERV=y then the interrupt delivery to the vmbus happens via the direct HYPERVISOR_CALLBACK_VECTOR. So far so good, but: The driver requests first a normal device interrupt. The only reason to do so is to increment the interrupt stats of that device interrupt. For no reason it also installs a private flow handler. We have proper accounting mechanisms for direct vectors, but of course it's too much effort to add that 5 lines of code. Aside of that the alloc_intr_gate() is not protected against reallocation which makes module reload impossible. Solution to the problem is simple to rip out the whole mess and implement it correctly. First of all move all that code to arch/x86/kernel/cpu/mshyperv.c and merily install the HYPERVISOR_CALLBACK_VECTOR with proper reallocation protection and use the proper direct vector accounting mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linuxdrivers <devel@linuxdriverproject.org> Cc: x86 <x86@kernel.org> Link: http://lkml.kernel.org/r/20140223212739.028307673@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-04 17:37:54 +01:00
Greg Kroah-Hartman	13df797743	Merge 3.14-rc5 into driver-core-next We want the fixes in here.	2014-03-02 20:09:08 -08:00
H. Peter Anvin	da4aaa7d86	x86, cpufeature: If we disable CLFLUSH, we should disable CLFLUSHOPT If we explicitly disable the use of CLFLUSH, we should disable the use of CLFLUSHOPT as well. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-jtdv7btppr4jgzxm3sxx1e74@git.kernel.org	2014-02-27 08:36:31 -08:00
H. Peter Anvin	840d2830e6	x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH We call this "clflush" in /proc/cpuinfo, and have cpu_has_clflush()... let's be consistent and just call it that. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alan Cox <alan@linux.intel.com> Link: http://lkml.kernel.org/n/tip-mlytfzjkvuf739okyn40p8a5@git.kernel.org	2014-02-27 08:31:30 -08:00
H. Peter Anvin	b5660ba76b	x86, platforms: Remove NUMAQ The NUMAQ support seems to be unmaintained, remove it. Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: David Rientjes <rientjes@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/n/530CFD6C.7040705@zytor.com	2014-02-27 08:07:39 -08:00
Peter Zijlstra	c347a2f179	perf/x86: Add a few more comments Add a few comments on the ->add(), ->del() and ->*_txn() implementation. Requested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-he3819318c245j7t5e1e22tr@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-27 12:43:25 +01:00
Ingo Molnar	ff5a7088f0	Merge branch 'perf/urgent' into perf/core Merge the latest fixes before queueing up new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-27 12:41:17 +01:00
Peter Zijlstra	26e61e8939	perf/x86: Fix event scheduling Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures, with perf WARN_ON()s triggering. He also provided traces of the failures. This is I think the relevant bit: > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1 > pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800) So we add the insn:p event (fd[23]). At this point we should have: n_events = 2, n_added = 1, n_txn = 1 > pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800) > pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00) We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]). These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so that's not visible. group_sched_in() pmu->start_txn() /* nop - BP pmu */ event_sched_in() event->pmu->add() So here we should end up with: 0: n_events = 3, n_added = 2, n_txn = 2 4: n_events = 4, n_added = 3, n_txn = 3 But seeing the below state on x86_pmu_enable(), the must have failed, because the 0 and 4 events aren't there anymore. Looking at group_sched_in(), since the BP is the leader, its event_sched_in() must have succeeded, for otherwise we would not have seen the sibling adds. But since neither 0 or 4 are in the below state; their event_sched_in() must have failed; but I don't see why, the complete state: 0,0,1:p,4 fits perfectly fine on a core2. However, since we try and schedule 4 it means the 0 event must have succeeded! Therefore the 4 event must have failed, its failure will have put group_sched_in() into the fail path, which will call: event_sched_out() event->pmu->del() on 0 and the BP event. Now x86_pmu_del() will reduce n_events; but it will not reduce n_added; giving what we see below: n_event = 2, n_added = 2, n_txn = 2 > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2 > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0 So the problem is that x86_pmu_del(), when called from a group_sched_in() that fails (for whatever reason), and without x86_pmu TXN support (because the leader is !x86_pmu), will corrupt the n_added state. Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Jones <davej@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-27 12:38:02 +01:00
Linus Torvalds	9b3e7c9b9a	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixlets from all around the place" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/uncore: Fix IVT/SNB-EP uncore CBOX NID filter table perf/x86: Correctly use FEATURE_PDCM perf, nmi: Fix unknown NMI warning perf trace: Fix ioctl 'request' beautifier build problems on !(i386 \|\| x86_64) arches perf trace: Add fallback definition of EFD_SEMAPHORE perf list: Fix checking for supported events on older kernels perf tools: Handle PERF_RECORD_HEADER_EVENT_TYPE properly perf probe: Do not add offset twice to uprobe address perf/x86: Fix Userspace RDPMC switch perf/x86/intel/p6: Add userspace RDPMC quirk for PPro	2014-02-22 12:11:54 -08:00
Stephane Eranian	337397f3af	perf/x86/uncore: Fix IVT/SNB-EP uncore CBOX NID filter table This patch updates the CBOX PMU filters mapping tables for SNB-EP and IVT (model 45 and 62 respectively). The NID umask always comes in addition to another umask. When set, the NID filter is applied. The current mapping tables were missing some code/umask combinations to account for the NID umask. This patch fixes that. Cc: mingo@elte.hu Cc: ak@linux.intel.com Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140219131018.GA24475@quad Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 22:09:01 +01:00
Peter Zijlstra	c9b08884c9	perf/x86: Correctly use FEATURE_PDCM The current code simply assumes Intel Arch PerfMon v2+ to have the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead. This was found by KVM which implements v2+ but didn't provide the capabilities MSR. Change the code to DTRT; KVM will also implement the MSR and return 0. Cc: pbonzini@redhat.com Reported-by: "Michael S. Tsirkin" <mst@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140203132903.GI8874@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 22:09:01 +01:00
Markus Metzger	a3ef2229c9	perf, nmi: Fix unknown NMI warning When using BTS on Core i7-4*, I get the below kernel warning. $ perf record -c 1 -e branches:u ls Message from syslogd@labpc1501 at Nov 11 15:49:25 ... kernel:[ 438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2. Message from syslogd@labpc1501 at Nov 11 15:49:25 ... kernel:[ 438.317920] Do you have a strange power saving mode enabled? Message from syslogd@labpc1501 at Nov 11 15:49:25 ... kernel:[ 438.317945] Dazed and confused, but trying to continue Make intel_pmu_handle_irq() take the full exit path when returning early. Cc: eranian@google.com Cc: peterz@infradead.org Cc: mingo@kernel.org Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392425048-5309-1-git-send-email-andi@firstfloor.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 22:09:01 +01:00
Stephane Eranian	e9d9768824	perf/x86/uncore: use MiB unit for events for SNB/IVB/HSW IMC This patch makes perf use Mebibytes to display the counts of uncore_imc/data_reads/ and uncore_imc/data_writes. 1MiB = 1024*1024 bytes. Cc: mingo@elte.hu Cc: acme@redhat.com Cc: ak@linux.intel.com Cc: zheng.z.yan@intel.com Cc: peterz@infradead.org Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392132015-14521-9-git-send-email-eranian@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 21:49:08 +01:00
Stephane Eranian	ced2efb099	perf/x86/uncore: add hrtimer to SNB uncore IMC PMU This patch is needed because that PMU uses 32-bit free running counters with no interrupt capabilities. On SNB/IVB/HSW, we used 20GB/s theoretical peak to calculate the hrtimer timeout necessary to avoid missing an overflow. That delay is set to 5s to be on the cautious side. The SNB IMC uses free running counters, which are handled via pseudo fixed counters. The SNB IMC PMU implementation supports an arbitrary number of events, because the counters are read-only. Therefore it is not possible to track active counters. Instead we put active events on a linked list which is then used by the hrtimer handler to update the SW counts. Cc: mingo@elte.hu Cc: acme@redhat.com Cc: ak@linux.intel.com Cc: zheng.z.yan@intel.com Cc: peterz@infradead.org Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392132015-14521-8-git-send-email-eranian@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 21:49:08 +01:00
Stephane Eranian	b9e1ab6d4c	perf/x86/uncore: add SNB/IVB/HSW client uncore memory controller support This patch adds a new uncore PMU for Intel SNB/IVB/HSW client CPUs. It adds the Integrated Memory Controller (IMC) PMU. This new PMU provides a set of events to measure memory bandwidth utilization. The IMC on those processor is PCI-space based. This patch exposes a new uncore PMU on those processor: uncore_imc Two new events are defined: - name: data_reads - code: 0x1 - unit: 64 bytes - number of full cacheline read requests to the IMC - name: data_writes - code: 0x2 - unit: 64 bytes - number of full cacheline write requests to the IMC Documentation available at: http://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2nd-3rd-and-4th-generation-intel Cc: mingo@elte.hu Cc: acme@redhat.com Cc: ak@linux.intel.com Cc: zheng.z.yan@intel.com Cc: peterz@infradead.org Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392132015-14521-7-git-send-email-eranian@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 21:49:08 +01:00
Stephane Eranian	001e413f7e	perf/x86/uncore: move uncore_event_to_box() and uncore_pmu_to_box() Move a couple of functions around to avoid forward declarations when we add code later on. Cc: mingo@elte.hu Cc: acme@redhat.com Cc: ak@linux.intel.com Cc: zheng.z.yan@intel.com Cc: peterz@infradead.org Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392132015-14521-6-git-send-email-eranian@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 21:49:07 +01:00
Stephane Eranian	79859cce5a	perf/x86/uncore: make hrtimer timeout configurable per box This patch makes the hrtimer timeout configurable per PMU box. Not all counters have necessarily the same width and rate, thus the default timeout of 60s may need to be adjusted. This patch adds box->hrtimer_duration. It is set to default when the box is allocated. It can be overriden when the box is initialized. Cc: mingo@elte.hu Cc: acme@redhat.com Cc: ak@linux.intel.com Cc: zheng.z.yan@intel.com Cc: peterz@infradead.org Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392132015-14521-5-git-send-email-eranian@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 21:49:07 +01:00
Stephane Eranian	d64b25b6a0	perf/x86/uncore: add ability to customize pmu callbacks This patch enables custom struct pmu callbacks per uncore PMU types. This feature may be used to simplify counter setup for certain uncore PMUs which have free running counters for instance. It becomes possible to bypass the event scheduling phase of the configuration. Cc: mingo@elte.hu Cc: acme@redhat.com Cc: ak@linux.intel.com Cc: zheng.z.yan@intel.com Cc: peterz@infradead.org Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392132015-14521-3-git-send-email-eranian@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 21:49:07 +01:00
Stephane Eranian	411cf180fa	perf/x86/uncore: fix initialization of cpumask On certain processors, the uncore PMU boxes may only be msr-bsed or PCI-based. But in both cases, the cpumask, suggesting on which CPUs to monitor to get full coverage of the particular PMU, must be created. However with the current code base, the cpumask was only created on processor which had at least one MSR-based uncore PMU. This patch removes that restriction and ensures the cpumask is created even when there is no msr-based PMU. For instance, on SNB client where only a PCI-based memory controller PMU is supported. Cc: mingo@elte.hu Cc: acme@redhat.com Cc: ak@linux.intel.com Cc: zheng.z.yan@intel.com Cc: peterz@infradead.org Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392132015-14521-2-git-send-email-eranian@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-21 21:49:07 +01:00
Ard Biesheuvel	2b9c1f0327	x86: align x86 arch with generic CPU modalias handling The x86 CPU feature modalias handling existed before it was reimplemented generically. This patch aligns the x86 handling so that it (a) reuses some more code that is now generic; (b) uses the generic format for the modalias module metadata entry, i.e., it now uses 'cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:,XXXX,YYYY' instead of the 'x86cpu:vendor:VVVV👪FFFF:model:MMMM:feature:,XXXX,YYYY' that was used before. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-02-18 12:45:38 -08:00
H. Peter Anvin	03bbd596ac	x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabled If SMAP support is not compiled into the kernel, don't enable SMAP in CR4 -- in fact, we should clear it, because the kernel doesn't contain the proper STAC/CLAC instructions for SMAP support. Found by Fengguang Wu's test system. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.7+	2014-02-13 07:50:25 -08:00
Don Zickus	90ed5b0fa5	perf/x86/p4: Block PMIs on init to prevent a stream of unkown NMIs A bunch of unknown NMIs have popped up on a Pentium4 recently when booting into a kdump kernel. This was exposed because the watchdog timer went from 60 seconds down to 10 seconds (increasing the ability to reproduce this problem). What is happening is on boot up of the second kernel (the kdump one), the previous nmi_watchdogs were enabled on thread 0 and thread 1. The second kernel only initializes one cpu but the perf counter on thread 1 still counts. Normally in a kdump scenario, the other cpus are blocking in an NMI loop, but more importantly their local apics have the performance counters disabled (iow LVTPC is masked). So any counters that fire are masked and never get through to the second kernel. However, on a P4 the local apic is shared by both threads and thread1's PMI (despite being configured to only interrupt thread1) will generate an NMI on thread0. Because thread0 knows nothing about this NMI, it is seen as an unknown NMI. This would be fine because it is a kdump kernel, strange things happen what is the big deal about a single unknown NMI. Unfortunately, the P4 comes with another quirk: clearing the overflow bit to prevent a stream of NMIs. This is the problem. The kdump kernel can not execute because of the endless NMIs that happen. To solve this, I instrumented the p4 perf init code, to walk all the counters and zero them out (just like a normal reset would). Now when the counters go off, they do not generate anything and no unknown NMIs are seen. I tested this on a P4 we have in our lab. After two or three crashes, I could normally reproduce the problem. Now after 10 crashes, everything continues to boot correctly. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140120154115.GZ25953@redhat.com [ Fixed a stylistic detail. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 13:20:35 +01:00
Don Zickus	13beacee81	perf/x86/p4: Fix counter corruption when using lots of perf groups On a P4 box stressing perf with: ./perf record -o perf.data ./perf stat -v ./perf bench all it was noticed that a slew of unknown NMIs would pop out rather quickly. Painfully debugging this ancient platform, led me to notice cross cpu counter corruption. The P4 machine is special in that it has 18 counters, half are used for cpu0 and the other half is for cpu1 (or all 18 if hyperthreading is disabled). But the splitting of the counters has to be actively managed by the software. In this particular bug, one of the cpu0 specific counters was being used by cpu1 and caused all sorts of random unknown nmis. I am not entirely sure on the corruption path, but what happens is: o perf schedules a group with p4_pmu_schedule_events() o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused but for a different cpu, so it 'swaps' the config bits and returns the updated 'assign' array with a _new_ index. o perf schedules another group with p4_pmu_schedule_events() o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused (the same one as above) but for the _same_ cpu [BUG!!], so it updates the 'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in an earlier part of the 'assign' array (and hasn't been committed yet). o perf commits the transaction using the wrong index and corrupts the other cpu The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'. So the check for 'p4_should_swap_ts()' is correct the first time around but incorrect the second time around (because hwc->config was updated in between). I think the spirit of perf was to not modify anything until all the transactions had a chance to 'test' if they would succeed, and if so, commit atomically. However, P4 breaks this spirit by touching the hwc->config element. So my fix is to continue the un-perf like breakage, by assigning hwc->idx to -1 on swap to tell follow up group scheduling to find a new index. Of course if the transaction fails rolling this back will be difficult, but that is not different than how the current code works. :-) And I wasn't sure how much effort to cleanup the code I should do for a platform that is almost 10 years old by now. Hence the lazy fix. Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 13:17:23 +01:00
Ingo Molnar	3c3d7cb1db	Merge branch 'linus' into perf/core Refresh the branch to a v3.14-rc base before queueing up new devel patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 13:13:45 +01:00
Peter Zijlstra	0e9f2204cf	perf/x86: Fix Userspace RDPMC switch The current code forgets to change the CR4 state on the current CPU. Use on_each_cpu() instead of smp_call_function(). Reported-by: Mark Davies <junk@eslaf.co.uk> Suggested-by: Mark Davies <junk@eslaf.co.uk> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: fweisbec@gmail.com Link: http://lkml.kernel.org/n/tip-69efsat90ibhnd577zy3z9gh@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 13:08:25 +01:00
Peter Zijlstra	e97df76377	perf/x86/intel/p6: Add userspace RDPMC quirk for PPro PPro machines can die hard when PCE gets enabled due to a CPU erratum. The safe way it so disable it by default and keep it disabled. See erratum 26 in: http://download.intel.com/design/archives/processors/pro/docs/24268935.pdf Reported-and-Tested-by: Mark Davies <junk@eslaf.co.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vince@deater.net> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140206170815.GW2936@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-09 13:08:24 +01:00
H. Peter Anvin	a3b072cd18	* Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJS9P2pAAoJEC84WcCNIz1VABsP/j0fwiWdaoEn/9p9EUQcBcMn CILuiKI8GoBR+0nH7vm/jSgUKfUxzM6T0+qjoZPVkO/u/qup+JCT5JxoywUyEbfb +wPNIhBgKSwJdWXPd4ZvObA9jknrP6r5sJTsixpESVpVWmcC8egPgkyBFILjokKS BCJqqruHZcXfWaDQTocYErl3T217J7RfnCG1o7yP96g4jteX8EdvIkPjEf2mM83I GXacHLy8IhGhdyPKSXcRqZ0ivhZ2WX1iFQYIY19FhmzBs364WulyXve+oV+l/4h4 Kx7ks4Ob5AlUIizchGNwVz7058F9o/v7m0CezTgi4Q/RrZi34samxbjk/95/Xc60 JSvWkekm3/jOODubad5zj+ZnJmG83/ZlCUuTaqsE47ftbaedSUHBN9QSpm8iHEex n3d4J3AdP/3amcP33kZ5MRALDYIFKb4ZxtDkADqDcXhS56COivGAdZe5hnyCpb/9 RPUDXTOlxfJQK/y2Atcdb400JjJ/Yr9Kew81LRIt0UMZU3dKSh05UZ+a7Ym0yCkt 3k0NNkgsFCZbYTO/Z3aPDcprwU5Lq9UrwjB17U2ev/qK+qRYDzCzSR0XGPrLMRv7 C5Bnov6uCn/0ZG/NlAx8UXK9wdWDsLhp1QkBz+daX3sGwRAS+OiKBv6+l8dqsdOc 1L8PMkTX2rgtELiv4PJ/ =hLCC -----END PGP SIGNATURE----- Merge tag 'efi-urgent' into x86/urgent * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit). Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-02-07 11:27:30 -08:00
Borislav Petkov	75a1ba5b2c	x86, microcode, AMD: Unify valid container checks For additional coverage, BorisO and friends unknowlingly did swap AMD microcode with Intel microcode blobs in order to see what happens. What did happen on 32-bit was [ 5.722656] BUG: unable to handle kernel paging request at be3a6008 [ 5.722693] IP: [<c106d6b4>] load_microcode_amd+0x24/0x3f0 [ 5.722716] pdpt = 0000000000000000 pde = 0000000000000000 because there was a valid initrd there but without valid microcode in it and the container check happened after the relocated ramdisk handling on 32-bit, which was clearly wrong. While at it, take care of the ramdisk relocation on both 32- and 64-bit as it is done on both. Also, comment what we're doing because this code is a bit tricky. Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391460104-7261-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-02-06 11:11:19 -08:00
Ingo Molnar	2b45e0f9f3	Merge branch 'linus' into x86/urgent Merge in the x86 changes to apply a fix. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-25 09:16:14 +01:00
Mel Gorman	b9a3b4c976	mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge There was a large ebizzy performance regression that was bisected to commit `611ae8e3` (x86/tlb: enable tlb flush range support for x86). The problem was related to the tlb_flushall_shift tuning for IvyBridge which was altered. The problem is that it is not clear if the tuning values for each CPU family is correct as the methodology used to tune the values is unclear. This patch uses a conservative tlb_flushall_shift value for all CPU families except IvyBridge so the decision can be revisited if any regression is found as a result of this change. IvyBridge is an exception as testing with one methodology determined that the value of 2 is acceptable. Details are in the changelog for the patch "x86: mm: Change tlb_flushall_shift for IvyBridge". One important aspect of this to watch out for is Xen. The original commit log mentioned large performance gains on Xen. It's possible Xen is more sensitive to this value if it flushes small ranges of pages more frequently than workloads on bare metal typically do. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-dyzMww3fqugnhbhgo6Gxmtkw@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-25 09:10:44 +01:00
Mel Gorman	f98b7a772a	x86: mm: change tlb_flushall_shift for IvyBridge There was a large performance regression that was bisected to commit `611ae8e3` ("x86/tlb: enable tlb flush range support for x86"). This patch simply changes the default balance point between a local and global flush for IvyBridge. In the interest of allowing the tests to be reproduced, this patch was tested using mmtests 0.15 with the following configurations configs/config-global-dhp__tlbflush-performance configs/config-global-dhp__scheduler-performance configs/config-global-dhp__network-performance Results are from two machines Ivybridge 4 threads: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz Ivybridge 8 threads: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz Page fault microbenchmark showed nothing interesting. Ebizzy was configured to run multiple iterations and threads. Thread counts ranged from 1 to NR_CPUS2. For each thread count, it ran 100 iterations and each iteration lasted 10 seconds. Ivybridge 4 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 6395.44 ( 0.00%) 6789.09 ( 6.16%) Mean 2 7012.85 ( 0.00%) 8052.16 ( 14.82%) Mean 3 6403.04 ( 0.00%) 6973.74 ( 8.91%) Mean 4 6135.32 ( 0.00%) 6582.33 ( 7.29%) Mean 5 6095.69 ( 0.00%) 6526.68 ( 7.07%) Mean 6 6114.33 ( 0.00%) 6416.64 ( 4.94%) Mean 7 6085.10 ( 0.00%) 6448.51 ( 5.97%) Mean 8 6120.62 ( 0.00%) 6462.97 ( 5.59%) Ivybridge 8 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 7336.65 ( 0.00%) 7787.02 ( 6.14%) Mean 2 8218.41 ( 0.00%) 9484.13 ( 15.40%) Mean 3 7973.62 ( 0.00%) 8922.01 ( 11.89%) Mean 4 7798.33 ( 0.00%) 8567.03 ( 9.86%) Mean 5 7158.72 ( 0.00%) 8214.23 ( 14.74%) Mean 6 6852.27 ( 0.00%) 7952.45 ( 16.06%) Mean 7 6774.65 ( 0.00%) 7536.35 ( 11.24%) Mean 8 6510.50 ( 0.00%) 6894.05 ( 5.89%) Mean 12 6182.90 ( 0.00%) 6661.29 ( 7.74%) Mean 16 6100.09 ( 0.00%) 6608.69 ( 8.34%) Ebizzy hits the worst case scenario for TLB range flushing every time and it shows for these Ivybridge CPUs at least that the default choice is a poor on. The patch addresses the problem. Next was a tlbflush microbenchmark written by Alex Shi at http://marc.info/?l=linux-kernel&m=133727348217113 . It measures access costs while the TLB is being flushed. The expectation is that if there are always full TLB flushes that the benchmark would suffer and it benefits from range flushing There are 320 iterations of the test per thread count. The number of entries is randomly selected with a min of 1 and max of 512. To ensure a reasonably even spread of entries, the full range is broken up into 8 sections and a random number selected within that section. iteration 1, random number between 0-64 iteration 2, random number between 64-128 etc This is still a very weak methodology. When you do not know what are typical ranges, random is a reasonable choice but it can be easily argued that the opimisation was for smaller ranges and an even spread is not representative of any workload that matters. To improve this, we'd need to know the probability distribution of TLB flush range sizes for a set of workloads that are considered "common", build a synthetic trace and feed that into this benchmark. Even that is not perfect because it would not account for the time between flushes but there are limits of what can be reasonably done and still be doing something useful. If a representative synthetic trace is provided then this benchmark could be revisited and the shift values retuned. Ivybridge 4 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 10.50 ( 0.00%) 10.50 ( 0.03%) Mean 2 17.59 ( 0.00%) 17.18 ( 2.34%) Mean 3 22.98 ( 0.00%) 21.74 ( 5.41%) Mean 5 47.13 ( 0.00%) 46.23 ( 1.92%) Mean 8 43.30 ( 0.00%) 42.56 ( 1.72%) Ivybridge 8 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 9.45 ( 0.00%) 9.36 ( 0.93%) Mean 2 9.37 ( 0.00%) 9.70 ( -3.54%) Mean 3 9.36 ( 0.00%) 9.29 ( 0.70%) Mean 5 14.49 ( 0.00%) 15.04 ( -3.75%) Mean 8 41.08 ( 0.00%) 38.73 ( 5.71%) Mean 13 32.04 ( 0.00%) 31.24 ( 2.49%) Mean 16 40.05 ( 0.00%) 39.04 ( 2.51%) For both CPUs, average access time is reduced which is good as this is the benchmark that was used to tune the shift values in the first place albeit it is now known how* the benchmark was used. The scheduler benchmarks were somewhat inconclusive. They showed gains and losses and makes me reconsider how stable those benchmarks really are or if something else might be interfering with the test results recently. Network benchmarks were inconclusive. Almost all results were flat except for netperf-udp tests on the 4 thread machine. These results were unstable and showed large variations between reboots. It is unknown if this is a recent problems but I've noticed before that netperf-udp results tend to vary. Based on these results, changing the default for Ivybridge seems like a logical choice. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Alex Shi <alex.shi@linaro.org> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-25 09:10:43 +01:00
Mel Gorman	ec65993443	mm, x86: Account for TLB flushes only when debugging Bisection between 3.11 and 3.12 fingered commit `9824cf97` ("mm: vmstats: tlb flush counters") to cause overhead problems. The counters are undeniably useful but how often do we really need to debug TLB flush related issues? It does not justify taking the penalty everywhere so make it a debugging option. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Alex Shi <alex.shi@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-25 09:10:41 +01:00
Linus Torvalds	d3bad75a6d	Driver core / sysfs patches for 3.14-rc1 Here's the big driver core and sysfs patch set for 3.14-rc1. There's a lot of work here moving sysfs logic out into a "kernfs" to allow other subsystems to also have a virtual filesystem with the same attributes of sysfs (handle device disconnect, dynamic creation / removal as needed / unneeded, etc. This is primarily being done for the cgroups filesystem, but the goal is to also move debugfs to it when it is ready, solving all of the known issues in that filesystem as well. The code isn't completed yet, but all should be stable now (there is a big section that was reverted due to problems found when testing.) There's also some other smaller fixes, and a driver core addition that allows for a "collection" of objects, that the DRM people will be using soon (it's in this tree to make merges after -rc1 easier.) All of this has been in linux-next with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iEYEABECAAYFAlLdh0cACgkQMUfUDdst+ylv4QCfeDKDgLo4LsaBIIrFSxLoH/c7 UUsAoMPRwA0h8wy+BQcJAg4H4J4maKj3 =0pc0 -----END PGP SIGNATURE----- Merge tag 'driver-core-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / sysfs patches from Greg KH: "Here's the big driver core and sysfs patch set for 3.14-rc1. There's a lot of work here moving sysfs logic out into a "kernfs" to allow other subsystems to also have a virtual filesystem with the same attributes of sysfs (handle device disconnect, dynamic creation / removal as needed / unneeded, etc) This is primarily being done for the cgroups filesystem, but the goal is to also move debugfs to it when it is ready, solving all of the known issues in that filesystem as well. The code isn't completed yet, but all should be stable now (there is a big section that was reverted due to problems found when testing) There's also some other smaller fixes, and a driver core addition that allows for a "collection" of objects, that the DRM people will be using soon (it's in this tree to make merges after -rc1 easier) All of this has been in linux-next with no reported issues" * tag 'driver-core-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (113 commits) kernfs: associate a new kernfs_node with its parent on creation kernfs: add struct dentry declaration in kernfs.h kernfs: fix get_active failure handling in kernfs_seq_() Revert "kernfs: fix get_active failure handling in kernfs_seq_()" Revert "kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq" Revert "kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()" Revert "kernfs: remove KERNFS_REMOVED" Revert "kernfs: restructure removal path to fix possible premature return" Revert "kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()" Revert "kernfs: remove kernfs_addrm_cxt" Revert "kernfs: make kernfs_get_active() block if the node is deactivated but not removed" Revert "kernfs: implement kernfs_{de\|re}activate[_self]()" Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers" Revert "pci: use device_remove_file_self() instead of device_schedule_callback()" Revert "scsi: use device_remove_file_self() instead of device_schedule_callback()" Revert "s390: use device_remove_file_self() instead of device_schedule_callback()" Revert "sysfs, driver-core: remove unused {sysfs\|device}_schedule_callback_owner()" Revert "kernfs: remove unnecessary NULL check in __kernfs_remove()" kernfs: remove unnecessary NULL check in __kernfs_remove() drivers/base: provide an infrastructure for componentised subsystems ...	2014-01-20 15:49:44 -08:00
Linus Torvalds	f4bcd8ccdd	Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kernel address space randomization support from Peter Anvin: "This enables kernel address space randomization for x86" * 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET x86, kaslr: Remove unused including <linux/version.h> x86, kaslr: Use char array to gain sizeof sanity x86, kaslr: Add a circular multiply for better bit diffusion x86, kaslr: Mix entropy sources together as needed x86/relocs: Add percpu fixup for GNU ld 2.23 x86, boot: Rename get_flags() and check_flags() to *_cpuflags() x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64 x86, kaslr: Report kernel offset on panic x86, kaslr: Select random position from e820 maps x86, kaslr: Provide randomness functions x86, kaslr: Return location from decompress_kernel x86, boot: Move CPU flags out of cpucheck x86, relocs: Add more per-cpu gold special cases	2014-01-20 14:45:50 -08:00
Linus Torvalds	7fe67a1180	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull leftover x86 fixes from Ingo Molnar: "Two leftover fixes that did not make it into v3.13" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add check for number of available vectors before CPU down x86, cpu, amd: Add workaround for family 16h, erratum 793	2014-01-20 12:11:41 -08:00
Linus Torvalds	fab5669d55	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure	2014-01-20 12:10:27 -08:00
Linus Torvalds	2bb2c5e235	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loader updates from Ingo Molnar: "There are two main changes in this tree: - AMD microcode early loading fixes - some microcode loader source files reorganization" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Move to a proper location x86, microcode, AMD: Fix early ucode loading x86, microcode: Share native MSR accessing variants x86, ramdisk: Export relocated ramdisk VA	2014-01-20 12:07:54 -08:00
Linus Torvalds	5d4863e4cc	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TLB detection update from Ingo Molnar: "A single change that extends our TLB cache size detection+reporting code" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu: Detect more TLB configuration	2014-01-20 12:04:45 -08:00
Linus Torvalds	2a0fede97f	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu, amd: Fix a shadowed variable situation um, x86: Fix vDSO build x86: Delete non-required instances of include <linux/init.h> x86, realmode: Pointer walk cleanups, pull out invariant use of __pa() x86/traps: Clean up error exception handler definitions	2014-01-20 12:03:57 -08:00
Linus Torvalds	a0fa1dd3cd	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: - Add the initial implementation of SCHED_DEADLINE support: a real-time scheduling policy where tasks that meet their deadlines and periodically execute their instances in less than their runtime quota see real-time scheduling and won't miss any of their deadlines. Tasks that go over their quota get delayed (Available to privileged users for now) - Clean up and fix preempt_enable_no_resched() abuse all around the tree - Do sched_clock() performance optimizations on x86 and elsewhere - Fix and improve auto-NUMA balancing - Fix and clean up the idle loop - Apply various cleanups and fixes * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) sched: Fix __sched_setscheduler() nice test sched: Move SCHED_RESET_ON_FORK into attr::sched_flags sched: Fix up attr::sched_priority warning sched: Fix up scheduler syscall LTP fails sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls sched/core: Fix htmldocs warnings sched/deadline: No need to check p if dl_se is valid sched/deadline: Remove unused variables sched/deadline: Fix sparse static warnings m68k: Fix build warning in mac_via.h sched, thermal: Clean up preempt_enable_no_resched() abuse sched, net: Fixup busy_loop_us_clock() sched, net: Clean up preempt_enable_no_resched() abuse sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding sched/preempt, locking: Rework local_bh_{dis,en}able() sched/clock, x86: Avoid a runtime condition in native_sched_clock() sched/clock: Fix up clear_sched_clock_stable() sched/clock, x86: Use a static_key for sched_clock_stable sched/clock: Remove local_irq_disable() from the clocks sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs ...	2014-01-20 10:42:08 -08:00
Ingo Molnar	860fc2f264	Merge branch 'perf/urgent' into perf/core Pick up the latest fixes, refresh the development tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:33:30 +01:00
Robert Richter	bee09ed91c	perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h On AMD family 10h we see following error messages while waking up from S3 for all non-boot CPUs leading to a failed IBS initialization: Enabling non-boot CPUs ... smpboot: Booting Node 0 Processor 1 APIC 0x1 [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu perf: IBS APIC setup failed on cpu #1 process: Switch to broadcast mode on CPU1 CPU1 is up ... ACPI: Waking up from system sleep state S3 Reason for this is that during suspend the LVT offset for the IBS vector gets lost and needs to be reinialized while resuming. The offset is read from the IBSCTL msr. On family 10h the offset needs to be 1 as offset 0 is used for the MCE threshold interrupt, but firmware assings it for IBS to 0 too. The kernel needs to reprogram the vector. The msr is a readonly node msr, but a new value can be written via pci config space access. The reinitialization is implemented for family 10h in setup_ibs_ctl() which is forced during IBS setup. This patch fixes IBS setup after waking up from S3 by adding resume/supend hooks for the boot cpu which does the offset reinitialization. Marking it as stable to let distros pick up this fix. Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> v3.2.. Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:19:50 +01:00
Borislav Petkov	d139336700	x86, cpu, amd: Fix a shadowed variable situation Having u32 and struct cpuinfo_x86 * by the same name is not very smart, although it was ok in this case due to the limited scope of u32 c and it being used only once in there. Fix this. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1389786735-16751-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-01-15 04:21:45 -08:00
Borislav Petkov	3b56496865	x86, cpu, amd: Add workaround for family 16h, erratum 793 This adds the workaround for erratum 793 as a precaution in case not every BIOS implements it. This addresses CVE-2013-6885. Erratum text: [Revision Guide for AMD Family 16h Models 00h-0Fh Processors, document 51810 Rev. 3.04 November 2013] 793 Specific Combination of Writes to Write Combined Memory Types and Locked Instructions May Cause Core Hang Description Under a highly specific and detailed set of internal timing conditions, a locked instruction may trigger a timing sequence whereby the write to a write combined memory type is not flushed, causing the locked instruction to stall indefinitely. Potential Effect on System Processor core hang. Suggested Workaround BIOS should set MSR C001_1020[15] = 1b. Fix Planned No fix planned [ hpa: updated description, fixed typo in MSR name ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-14 16:39:07 -08:00
Borislav Petkov	bad5fa631f	x86, microcode: Move to a proper location We've grown a bunch of microcode loader files all prefixed with "microcode_". They should be under cpu/ because this is strictly CPU-related functionality so do that and drop the prefix since they're in their own directory now which gives that prefix. :) While at it, drop MICROCODE_INTEL_LIB config item and stash the functionality under CONFIG_MICROCODE_INTEL as it was its only user. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>	2014-01-13 20:00:12 +01:00
Peter Zijlstra	35af99e646	sched/clock, x86: Use a static_key for sched_clock_stable In order to avoid the runtime condition and variable load turn sched_clock_stable into a static_key. Also provide a shorter implementation of local_clock() and cpu_clock(int) when sched_clock_stable==1. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 221876 215295 (cold) local_clock: 301773 234692 220773 (warm) sched_clock: 38375 25602 25659 (warm) local_clock: 100371 33265 27242 (warm) rdtsc: 27340 24214 24208 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 235941 237019 (cold) local_clock: 396890 297017 294819 (warm) sched_clock: 38194 25233 25609 (warm) local_clock: 143452 71234 71232 (warm) rdtsc: 27345 24245 24243 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 15:13:13 +01:00
Peter Zijlstra	20d1c86a57	sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs Use a ring-buffer like multi-version object structure which allows always having a coherent object; we use this to avoid having to disable IRQs while reading sched_clock() and avoids a problem when getting an NMI while changing the cyc2ns data. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 331312 257223 (cold) local_clock: 301773 310296 309889 (warm) sched_clock: 38375 38247 25280 (warm) local_clock: 100371 102713 85268 (warm) rdtsc: 27340 27289 24247 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 372706 301224 (cold) local_clock: 396890 399275 399870 (warm) sched_clock: 38194 38124 25630 (warm) local_clock: 143452 148698 129629 (warm) rdtsc: 27345 27365 24307 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 15:13:06 +01:00
Ingo Molnar	b769e014f3	SCI reporting for other error types not only correctable ones + APEI GHES cleanups + mce timer fix -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJS0qbXAAoJEBLB8Bhh3lVK0G4P/1fjGcBzCiC4qp0vUzhdvu1d CXVTtF20fdGP+hgwVU/1DLuIeUhyBLFM57rPgz0FxfvPo/bTYq92Lw9sElQpiVad trIRpw3bemjDY/8E91vR94SqLKTddoJifldK5ZTUpRJN9up06MwLky4IurdL2ixE QLaI20ZQIDjxe+pmh4wHtyV5qPdtkiXY2ICcdTXwAW7RdiG7pXXd5gLljL4oFUMF bp0QzM074szkDvhBgZg6KTAy1zfNzZVM9hUBOAesi1tfXqkT7pCEbUkYKZ6QFNVV haQNt+RtkVvYcFcZK+9TkRM32KHSy1MVuicv2W6eNvRDtauGFtSEwBBuvyq3Imjb PTY6Vd5g/hR/+o968ieYUYdFua4xaza2wqC3mwL6WuNoWGzzb/f4dk9+wU/x0Khu th1NMSwy7VfNqi8pbTTy13LG1yjqqorrMLAEAfsNAMjZPIPsBREFYu8J5kko4ird 1aEsaENUdkzgoDEUxwO7vQEfAL3RBOC8kr4SrBGd5jQtbN4SYRyP0bYVIsQL3Psz fo0H+9sVLwkSxcx7MM0bboBcv0NYazvTS5aIiivSdwMq0uxDsRWRmUCr70AlJDp9 h7EfyovynCpfOcAOIz212A9bRzBcfW46KnQm6xkcgVdKKhEbc5EWhx6Nz3kCMNQZ Or1B6ZtUe7Acdg2kRTRL =+eFf -----END PGP SIGNATURE----- Merge tag 'ras_for_3.14_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull RAS updates from Borislav Petkov: " SCI reporting for other error types not only correctable ones + APEI GHES cleanups + mce timer fix " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 17:56:54 +01:00
Ingo Molnar	da4540757d	Linux 3.13-rc8 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu 75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz +Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw= =lEeV -----END PGP SIGNATURE----- Merge tag 'v3.13-rc8' into x86/ras, to pick up fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 17:56:29 +01:00
Borislav Petkov	4f75d84127	x86, mce: Fix mce_start_timer semantics So mce_start_timer() has a 'cpu' argument which is supposed to mean to start a timer on that cpu. However, the code currently starts a timer on the current cpu the function runs on and causes the sanity-check in mce_timer_fn to fire: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn because it is running on the wrong cpu. This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining all the cpus in succession. Then, we were fiddling with the CMCI storm settings when starting the timer whereas there's no need for that - if there's storm happening on this newly restarted cpu, we're going to be in normal CMCI mode initially and then when the CMCI interrupt starts firing, we're going to go to the polling mode with the timer real soon. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com	2014-01-12 15:22:25 +01:00
Stephane Eranian	f228c5b882	perf/x86/intel: Add Intel RAPL PP1 energy counter support This patch adds support for the Intel RAPL energy counter PP1 (Power Plane 1). On client processors, it usually corresponds to the energy consumption of the builtin graphic card. That is why the sysfs event is called energy-gpu. New event: - name: power/energy-gpu/ - code: event=0x4 - unit: 2^-32 Joules On processors without graphics, this should count 0. The patch only enables this event on client processors. Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389176153-3128-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:16:08 +01:00
Paul Gortmaker	663b55b9b3	x86: Delete non-required instances of include <linux/init.h> None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. [ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-01-06 21:25:18 -08:00
Kirill A. Shutemov	dd360393f4	x86, cpu: Detect more TLB configuration The Intel Software Developer’s Manual covers few more TLB configurations exposed as CPUID 2 descriptors: 61H Instruction TLB: 4 KByte pages, fully associative, 48 entries 63H Data TLB: 1 GByte pages, 4-way set associative, 4 entries 76H Instruction TLB: 2M/4M pages, fully associative, 8 entries B5H Instruction TLB: 4KByte pages, 8-way set associative, 64 entries B6H Instruction TLB: 4KByte pages, 8-way set associative, 128 entries C1H Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries C2H DTLB DTLB: 2 MByte/$MByte pages, 4-way associative, 16 entries Let's detect them as well. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/1387801018-14499-1-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-01-03 14:35:42 -08:00
Linus Torvalds	8cf126d927	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "There is a small EFI fix and a big power regression fix in this batch. My queue also had a fix for downing a CPU when there are insufficient number of IRQ vectors available, but I'm holding that one for now due to recent bug reports" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Don't select EFI from certain special ACPI drivers x86 idle: Repair large-server 50-watt idle-power regression	2013-12-29 13:35:04 -08:00
Chen, Gong	addccbb264	ACPI, APEI, GHES: Do not report only correctable errors with SCI Currently SCI is employed to handle corrected errors - memory corrected errors, more specifically but in fact SCI still can be used to handle any errors, e.g. uncorrected or even fatal ones if enabled by the BIOS. Enable logging for those kinds of errors too. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message, rename function arg. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-12-21 13:31:06 +01:00
Len Brown	40e2d7f9b5	x86 idle: Repair large-server 50-watt idle-power regression Linux 3.10 changed the timing of how thread_info->flags is touched: x86: Use generic idle loop (`7d1a941731`) This caused Intel NHM-EX and WSM-EX servers to experience a large number of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote from deep C-states to shallow C-states, which caused these platforms to experience a significant increase in idle power. Note that this issue was already present before the commit above, however, it wasn't seen often enough to be noticed in power measurements. Here we extend an errata workaround from the Core2 EX "Dunnington" to extend to NHM-EX and WSM-EX, to prevent these immediate returns from MWAIT, reducing idle power on these platforms. While only acpi_idle ran on Dunnington, intel_idle may also run on these two newer systems. As of today, there are no other models that are known to need this tweak. Link: http://lkml.kernel.org/r/CAJvTdK=%2BaNN66mYpCGgbHGCHhYQAKx-vB0kJSWjVpsNb_hOAtQ@mail.gmail.com Signed-off-by: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-12-19 11:47:39 -08:00
Stephane Eranian	7fd565e275	perf/x86: enable Haswell Celeron RAPL support Enable RAPL support for Haswell Celeron (model 69). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Cc: maria.n.dimakopoulou@gmail.com Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1387225224-27799-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:21:32 +01:00
Maria Dimakopoulou	cf30d52e2d	perf/x86: Fix constraint table end marker bug The EVENT_CONSTRAINT_END() macro defines the end marker as a constraint with a weight of zero. This was all fine until we blacklisted the corrupting memory events on Intel IvyBridge. These events are blacklisted by using a counter bitmask of zero. Thus, they also get a constraint weight of zero. The iteration macro: for_each_constraint tests the weight==0. Therefore, it was stopping at the first blacklisted event, i.e., 0xd0. The corrupting events were therefore considered as unconstrained and were scheduled on any of the generic counters. This patch fixes the end marker to have a weight of -1. With this, the blacklisted events get an empty constraint and cannot be scheduled which is what we want for now. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: ak@linux.intel.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Link: http://lkml.kernel.org/r/20131204232437.GA10689@starlight Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-05 10:02:30 +01:00
Levente Kurusa	853d9b18f1	x86, mce: Call put_device on device_register failure This patch adds a call to put_device() when the device_register() call has failed. This is required so that the last reference to the device is given up. Signed-off-by: Levente Kurusa <levex@linux.com> Link: http://lkml.kernel.org/r/5298F900.9000208@linux.com Signed-off-by: Borislav Petkov <bp@suse.de>	2013-11-30 12:14:51 +01:00
Stephane Eranian	65661f96d3	perf/x86: Add RAPL hrtimer support The RAPL PMU counters do not interrupt on overflow. Therefore, the kernel needs to poll the counters to avoid missing an overflow. This patch adds the hrtimer code to do this. The timer interval is calculated at boot time based on the power unit used by the HW. There is one hrtimer per-cpu to handle the case of multiple simultaneous use across cores on the same package + hotplug CPU. Thanks to Maria Dimakopoulou for her contributions to this patch especially on the math aspects. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> [ Applied 32-bit build fix. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1384275531-10892-5-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-27 15:31:23 +01:00
Stephane Eranian	4788e5b4b2	perf/x86: Add Intel RAPL PMU support This patch adds a new uncore PMU to expose the Intel RAPL energy consumption counters. Up to 3 counters, each counting a particular RAPL event are exposed. The RAPL counters are available on Intel SandyBridge, IvyBridge, Haswell. The server skus add a 3rd counter. The following events are available and exposed in sysfs: - power/energy-cores: power consumption of all cores on socket - power/energy-pkg: power consumption of all cores + LLc cache - power/energy-dram: power consumption of DRAM (servers only) For each event both the unit (Joules) and scale (2^-32 J) is exposed in sysfs for use by perf stat and other tools. The files are: /sys/devices/power/events/energy-.unit /sys/devices/power/events/energy-.scale The RAPL PMU is uncore by nature and is implemented such that it only works in system-wide mode. Measuring only one CPU per socket is sufficient. The /sys/devices/power/cpumask file can be used by tools to figure out which CPUs to monitor by default. For instance, on a 2-socket system, 2 CPUs (one on each socket) will be shown. All the counters measure in the same unit (exposed via sysfs). The perf_events API exposes all RAPL counters as 64-bit integers counting in unit of 1/2^32 Joules (about 0.23 nJ). User level tools must convert the counts by multiplying them by 2^-32 to obtain Joules. The reason for this is that the kernel avoids doing floating point math whenever possible because it is expensive (user floating-point state must be saved). The method used avoids kernel floating-point usage. There is no loss of precision. Thanks to PeterZ for suggesting this approach. To convert the raw count in Watt: W = C * 2.3 / (1e10 * time) or ldexp(C, -32). RAPL PMU is a new standalone PMU which registers with the perf_event core subsystem. The PMU type (attr->type) is dynamically allocated and is available from /sys/device/power/type. Sampling is not supported by the RAPL PMU. There is no privilege level filtering either. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1384275531-10892-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-27 11:16:40 +01:00
Linus Torvalds	9073e1a804	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual earth-shaking, news-breaking, rocket science pile from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) doc: usb: Fix typo in Documentation/usb/gadget_configs.txt doc: add missing files to timers/00-INDEX timekeeping: Fix some trivial typos in comments mm: Fix some trivial typos in comments irq: Fix some trivial typos in comments NUMA: fix typos in Kconfig help text mm: update 00-INDEX doc: Documentation/DMA-attributes.txt fix typo DRM: comment: `halve' -> `half' Docs: Kconfig: `devlopers' -> `developers' doc: typo on word accounting in kprobes.c in mutliple architectures treewide: fix "usefull" typo treewide: fix "distingush" typo mm/Kconfig: Grammar s/an/a/ kexec: Typo s/the/then/ Documentation/kvm: Update cpuid documentation for steal time and pv eoi treewide: Fix common typo in "identify" __page_to_pfn: Fix typo in comment Correct some typos for word frequency clk: fixed-factor: Fix a trivial typo ...	2013-11-15 16:47:22 -08:00
Linus Torvalds	340286cd4e	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: "The biggest change adds support for Intel 'CPER' (UEFI Common Platform Error Record) error logging, which builds upon an enhanced error logging mechanism available on Xeon processors. Full description is here: http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html This change provides a module (and support code) to check for an extended error log and prints extra details about the error on the console" * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ACPI, x86: Fix extended error log driver to depend on CONFIG_X86_LOCAL_APIC dmi: Avoid unaligned memory access in save_mem_devices() Move cper.c from drivers/acpi/apei to drivers/firmware/efi EDAC, GHES: Update ghes error record info ACPI, APEI, CPER: Cleanup CPER memory error output format ACPI, APEI, CPER: Enhance memory reporting capability ACPI, APEI, CPER: Add UEFI 2.4 support for memory error DMI: Parse memory device (type 17) in SMBIOS ACPI, x86: Extended error log driver for x86 platform bitops: Introduce a more generic BITMASK macro ACPI, CPER: Update cper info ACPI, APEI, CPER: Fix status check during error printing	2013-11-12 11:16:44 +09:00
Linus Torvalds	2dc1733fd4	Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/hyperv changes from Ingo Molnar: "These changes enable Linux guests to boot as 'Modern VM' guest kernels on MS-Hyperv hosts" * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, hyperv: Move a variable to avoid an unused variable warning x86, hyperv: Fix build error due to missing <asm/apic.h> include x86, hyperv: Correctly guard the local APIC calibration code x86, hyperv: Get the local APIC timer frequency from the hypervisor	2013-11-12 11:07:49 +09:00
Linus Torvalds	6df1e7f2e9	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu changes from Ingo Molnar: "The biggest change that stands out is the increase of the CONFIG_NR_CPUS range from 4096 to 8192 - as real hardware out there already went beyond 4k CPUs ... We only allow more than 512 CPUs if offstack cpumasks are enabled. CONFIG_MAXSMP=y remains to be the 'you are nuts!' extreme testcase, which now means a max of 8192 CPUs" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Increase max CPU count to 8192 x86/cpu: Allow higher NR_CPUS values x86/cpu: Always print SMP information in /proc/cpuinfo x86/cpu: Track legacy CPU model data only on 32-bit kernels	2013-11-12 10:46:43 +09:00
Linus Torvalds	39cf275a1a	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle are: - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik van Riel, Peter Zijlstra et al. Yay! - optimize preemption counter handling: merge the NEED_RESCHED flag into the preempt_count variable, by Peter Zijlstra. - wait.h fixes and code reorganization from Peter Zijlstra - cfs_bandwidth fixes from Ben Segall - SMP load-balancer cleanups from Peter Zijstra - idle balancer improvements from Jason Low - other fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits) ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED stop_machine: Fix race between stop_two_cpus() and stop_cpus() sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus sched: Fix asymmetric scheduling for POWER7 sched: Move completion code from core.c to completion.c sched: Move wait code from core.c to wait.c sched: Move wait.c into kernel/sched/ sched/wait: Fix __wait_event_interruptible_lock_irq_timeout() sched: Avoid throttle_cfs_rq() racing with period_timer stopping sched: Guarantee new group-entities always have weight sched: Fix hrtimer_cancel()/rq->lock deadlock sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining sched: Fix race on toggling cfs_bandwidth_used sched: Remove extra put_online_cpus() inside sched_setaffinity() sched/rt: Fix task_tick_rt() comment sched/wait: Fix build breakage sched/wait: Introduce prepare_to_wait_event() sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too sched: Remove get_online_cpus() usage sched: Fix race in migrate_swap_stop() ...	2013-11-12 10:20:12 +09:00
H. Peter Anvin	4c08edd305	x86, hyperv: Move a variable to avoid an unused variable warning The variable hv_lapic_frequency causes an unused variable warning if CONFIG_X86_LOCAL_APIC is disabled. Since the variable is only used inside a small if statement, move the declaration of that variable into the if statement itself. Cc: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1381444224-3303-1-git-send-email-kys@microsoft.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-11-06 10:02:05 -08:00
Yan, Zheng	f891d8cfb8	perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support Unlike other uncore boxes, IRP boxes live in PCI buses with no UBOX device. For PCI bus without UBOX device, we find the next bus that has UBOX device and use its 'bus to socket' mapping. Besides the counter/control registers in IRP boxes are not properly aligned. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: eranian@google.com Cc: "Yan Zheng" <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1383197815-17706-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-06 12:34:31 +01:00
Yan, Zheng	d1e8f4a836	perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes The encoding for filter registers of IvyBridge-EP uncore QPI boxes is completely the same as SandyBridge-EP. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: eranian@google.com Cc: "Yan Zheng" <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1383197815-17706-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-06 12:34:29 +01:00

1 2 3 4 5 ...

2689 Commits