linux/arch/x86/events/intel
Andi Kleen af3bdb991a perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler
Implements counter freezing for Arch Perfmon v4 (Skylake and
newer). This speeds up the PMI handler by avoiding unnecessary
MSR writes and makes it more accurate.

The Arch Perfmon v4 PMI handler is substantially different from
the older PMI handler.

Differences from the old handler:

- It relies on counter freezing, which eliminates several MSR
  writes from the PMI handler and lowers the overhead significantly.

  It makes the PMI handler more accurate, as all counters get
  frozen atomically as soon as any counter overflows, so far fewer
  events from the PMI handler itself are counted.

  With freezing we don't need to disable or enable counters or
  PEBS. Only BTS, which does not support auto-freezing, still needs
  to be explicitly managed.

- The PMU acking is done at the end, not the beginning.
  This makes it possible to avoid manual enabling/disabling
  of the PMU; instead we just rely on the freezing/acking.

- The APIC is acked before reenabling the PMU, which avoids
  problems with LBRs occasionally not getting unfrozen on Skylake.

- Looping is only needed to work around a corner case in which several PMIs
  arrive very close to each other. In the common case, the counters are frozen
  during the PMI handler, so no re-check is needed.
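
The ordering described in the list above can be illustrated with a small
user-space toy model. This is only a sketch, not the kernel code: the
struct, its fields, and all toy_* functions are hypothetical names made up
for illustration, and the "hardware" is simulated with plain variables. The
late_overflow field is an assumption used to imitate a second PMI arriving
right after the ack, which is the corner case the loop exists for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PMU state; every name here is hypothetical, not a kernel symbol. */
struct toy_pmu {
    uint64_t status;        /* pending overflow bits */
    uint64_t late_overflow; /* bits that overflow right after the next ack */
    bool frozen;            /* all counters frozen by an overflow */
    int apic_acks;          /* how many times the APIC was acked */
};

/* Simulated hardware: any overflow freezes all counters at once. */
static void toy_overflow(struct toy_pmu *pmu, int idx)
{
    pmu->status |= 1ULL << idx;
    pmu->frozen = true;
}

/* Acking clears the status bits; in this model that also unfreezes. */
static void toy_ack(struct toy_pmu *pmu, uint64_t bits)
{
    pmu->status &= ~bits;
    if (!pmu->status)
        pmu->frozen = false;
    /* Imitate a back-to-back PMI landing right after the unfreeze. */
    if (pmu->late_overflow) {
        pmu->status |= pmu->late_overflow;
        pmu->late_overflow = 0;
        pmu->frozen = true;
    }
}

/* v4-style flow: handle, ack the APIC, ack the PMU last, then re-check. */
static int toy_handle_pmi_v4(struct toy_pmu *pmu)
{
    uint64_t status = pmu->status;
    int handled = 0;

    while (status) {
        /* No manual disable/enable: the counters are already frozen. */
        assert(pmu->frozen);
        for (int i = 0; i < 64; i++)
            if (status & (1ULL << i))
                handled++;

        pmu->apic_acks++;      /* APIC ack comes before the unfreeze */
        toy_ack(pmu, status);  /* PMU ack at the end unfreezes */

        status = pmu->status;  /* nonzero only in the corner case */
    }
    return handled;
}
```

In the common case the loop body runs once; it runs a second time only when
late_overflow injects a new overflow during the ack.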

This patch:

- Adds code to enable v4 counter freezing.
- Forks the <=v3 and >=v4 PMI handlers into separate functions.
- Adds a kernel parameter to disable counter freezing. It took some time to
  debug counter freezing, so in case there are new problems we added an
  option to turn it off. We would not expect this to be used unless there
  are new bugs.
- Covers big core only. The patch for small core will be posted later
  separately.
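
As a rough illustration of the handler split listed above, the choice
between the two handlers could look like the sketch below. It is a
stand-alone toy, not the code in core.c: the function names, the boolean
flags, and the return values are all hypothetical, and the real patch may
structure the decision differently.

```c
#include <stdbool.h>

/* Hypothetical handler type; the return value only tags which one ran. */
typedef int (*toy_pmi_handler_t)(void);

static int toy_handle_irq_v3(void) { return 3; } /* stands in for the <=v3 path */
static int toy_handle_irq_v4(void) { return 4; } /* stands in for the >=v4 path */

/*
 * Pick the v4 handler only when all three conditions from the list hold:
 * Arch Perfmon version >= 4, freezing not disabled on the command line,
 * and a big core. Otherwise fall back to the older handler.
 */
static toy_pmi_handler_t toy_select_handler(int perfmon_version,
                                            bool freezing_disabled,
                                            bool big_core)
{
    if (perfmon_version >= 4 && !freezing_disabled && big_core)
        return toy_handle_irq_v4;
    return toy_handle_irq_v3;
}
```

Keeping the decision in one place means the fallback path stays the
unmodified older handler whenever freezing is turned off.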

Performance:

When profiling a kernel build on Kabylake with different perf options,
measuring the length of all NMI handlers using the NMI handler
tracepoint:

V3 is without counter freezing.
V4 is with counter freezing.
The value is the average cost of the PMI handler.
(lower is better)

perf options                  V3(ns)  V4(ns)  delta
-c 100000                     1088    894     -18%
-g -c 100000                  1862    1646    -12%
--call-graph lbr -c 100000    3649    3367    -8%
--call-graph dwarf -c 100000  2248    1982    -12%
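
The delta column follows from the two measured columns as
(V4 - V3) / V3, rounded to the nearest whole percent. A minimal sketch of
that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

/*
 * Relative change of the V4 handler cost against V3, in whole percent.
 * Negative means V4 is cheaper. Rounds half away from zero.
 */
static int delta_percent(int v3_ns, int v4_ns)
{
    assert(v3_ns > 0);
    double pct = 100.0 * (v4_ns - v3_ns) / v3_ns;
    return (int)(pct < 0 ? pct - 0.5 : pct + 0.5);
}
```

For example, delta_percent(1088, 894) gives -18, matching the first row.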

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Link: http://lkml.kernel.org/r/1533712328-2834-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 10:14:31 +02:00
..
bts.c x86,perf: Disable intel_bts when PTI 2018-01-14 11:42:10 +01:00
core.c perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler 2018-10-02 10:14:31 +02:00
cstate.c perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr 2018-05-05 08:37:31 +02:00
ds.c perf/x86/intel: Support Extended PEBS for Goldmont Plus 2018-07-25 11:50:50 +02:00
knc.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
lbr.c perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs 2018-09-10 10:03:01 +02:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
p4.c x86: Mark various structures and functions as 'static' 2017-08-11 14:49:43 +02:00
p6.c x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping 2018-02-15 01:15:52 +01:00
pt.c perf/x86/intel/pt: Annotate 'pt_cap_group' with __ro_after_init 2018-09-12 21:16:16 +02:00
pt.h perf/x86/intel/pt: Allow the disabling of branch tracing 2017-03-30 09:53:49 +02:00
rapl.c perf/x86/intel: Add Cannon Lake support for RAPL profiling 2018-03-31 11:28:36 +02:00
uncore_nhmex.c perf/x86/intel/uncore: Correct fixed counter index check for NHM 2018-05-31 12:36:28 +02:00
uncore_snb.c perf/x86/intel/uncore: Clean up client IMC uncore 2018-05-31 12:36:29 +02:00
uncore_snbep.c perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX 2018-10-02 09:38:02 +02:00
uncore.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
uncore.h perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices 2018-07-31 07:43:37 +02:00