2009-04-20 13:37:32 +00:00
|
|
|
/*
|
2009-06-02 21:37:05 +00:00
|
|
|
* builtin-stat.c
|
|
|
|
*
|
|
|
|
* Builtin stat command: Give a precise performance counters summary
|
|
|
|
* overview about any workload, CPU or specific PID.
|
|
|
|
*
|
|
|
|
* Sample output:
|
2009-04-20 13:37:32 +00:00
|
|
|
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
$ perf stat ./hackbench 10
|
2009-04-20 13:37:32 +00:00
|
|
|
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
Time: 0.118
|
2009-04-20 13:37:32 +00:00
|
|
|
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
Performance counter stats for './hackbench 10':
|
2009-04-20 13:37:32 +00:00
|
|
|
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
1708.761321 task-clock # 11.037 CPUs utilized
|
|
|
|
41,190 context-switches # 0.024 M/sec
|
|
|
|
6,735 CPU-migrations # 0.004 M/sec
|
|
|
|
17,318 page-faults # 0.010 M/sec
|
|
|
|
5,205,202,243 cycles # 3.046 GHz
|
|
|
|
3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
|
|
|
|
1,600,790,871 stalled-cycles-backend # 30.75% backend cycles idle
|
|
|
|
2,603,501,247 instructions # 0.50 insns per cycle
|
|
|
|
# 1.48 stalled cycles per insn
|
|
|
|
484,357,498 branches # 283.455 M/sec
|
|
|
|
6,388,934 branch-misses # 1.32% of all branches
|
|
|
|
|
|
|
|
0.154822978 seconds time elapsed
|
2009-04-20 13:37:32 +00:00
|
|
|
|
2009-05-26 07:17:18 +00:00
|
|
|
*
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
* Copyright (C) 2008-2011, Red Hat Inc, Ingo Molnar <mingo@redhat.com>
|
2009-05-26 07:17:18 +00:00
|
|
|
*
|
|
|
|
* Improvements and fixes by:
|
|
|
|
*
|
|
|
|
* Arjan van de Ven <arjan@linux.intel.com>
|
|
|
|
* Yanmin Zhang <yanmin.zhang@intel.com>
|
|
|
|
* Wu Fengguang <fengguang.wu@intel.com>
|
|
|
|
* Mike Galbraith <efault@gmx.de>
|
|
|
|
* Paul Mackerras <paulus@samba.org>
|
2009-06-26 21:32:07 +00:00
|
|
|
* Jaswinder Singh Rajput <jaswinder@kernel.org>
|
2009-05-26 07:17:18 +00:00
|
|
|
*
|
|
|
|
* Released under the GPL v2. (and only v2, not any later version)
|
2009-04-20 13:37:32 +00:00
|
|
|
*/
|
|
|
|
|
2009-05-23 16:28:58 +00:00
|
|
|
#include "perf.h"
|
2009-05-27 07:10:38 +00:00
|
|
|
#include "builtin.h"
|
2014-10-17 15:17:40 +00:00
|
|
|
#include "util/cgroup.h"
|
2009-04-27 06:02:14 +00:00
|
|
|
#include "util/util.h"
|
2015-12-15 15:39:39 +00:00
|
|
|
#include <subcmd/parse-options.h>
|
2009-05-26 07:17:18 +00:00
|
|
|
#include "util/parse-events.h"
|
2013-08-21 23:47:26 +00:00
|
|
|
#include "util/pmu.h"
|
2009-08-16 20:05:48 +00:00
|
|
|
#include "util/event.h"
|
2011-01-11 22:56:53 +00:00
|
|
|
#include "util/evlist.h"
|
2011-01-03 18:39:04 +00:00
|
|
|
#include "util/evsel.h"
|
2009-08-16 20:05:48 +00:00
|
|
|
#include "util/debug.h"
|
2011-04-27 03:39:24 +00:00
|
|
|
#include "util/color.h"
|
2012-09-17 08:31:14 +00:00
|
|
|
#include "util/stat.h"
|
2009-12-31 08:05:50 +00:00
|
|
|
#include "util/header.h"
|
perf tools: Fix sparse CPU numbering related bugs
At present, the perf subcommands that do system-wide monitoring
(perf stat, perf record and perf top) don't work properly unless
the online cpus are numbered 0, 1, ..., N-1. These tools ask
for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
and then try to create events for cpus 0, 1, ..., N-1.
This creates problems for systems where the online cpus are
numbered sparsely. For example, a POWER6 system in
single-threaded mode (i.e. only running 1 hardware thread per
core) will have only even-numbered cpus online.
This fixes the problem by reading the /sys/devices/system/cpu/online
file to find out which cpus are online. The code that does that is in
tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
function that sets up a cpumap[] array and returns the number of
online cpus. If /sys/devices/system/cpu/online can't be read or
can't be parsed successfully, it falls back to using sysconf to
ask how many cpus are online and sets up an identity map in cpumap[].
The perf record, perf stat and perf top code then calls
read_cpu_map() in the system-wide monitoring case (instead of
sysconf) and uses cpumap[] to get the cpu numbers to pass to
perf_event_open.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
LKML-Reference: <20100310093609.GA3959@brick.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-10 09:36:09 +00:00
|
|
|
#include "util/cpumap.h"
|
2010-03-18 14:36:05 +00:00
|
|
|
#include "util/thread.h"
|
2011-01-18 17:15:24 +00:00
|
|
|
#include "util/thread_map.h"
|
2015-08-07 10:51:03 +00:00
|
|
|
#include "util/counts.h"
|
2009-04-20 13:37:32 +00:00
|
|
|
|
2012-10-23 11:40:14 +00:00
|
|
|
#include <stdlib.h>
|
2009-04-20 13:37:32 +00:00
|
|
|
#include <sys/prctl.h>
|
perf stat: add perf stat -B to pretty print large numbers
It is hard to read very large numbers so provide an option to perf stat
to separate thousands using a separator. The patch leverages the locale
support of stdio. You need to set your LC_NUMERIC appropriately, for
instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
feature. This way existing scripts parsing the output do not need to be
changed. Here is an example.
$ perf stat noploop 2
noploop for 2 seconds
Performance counter stats for 'noploop 2':
1998.347031 task-clock-msecs # 0.998 CPUs
61 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
118 page-faults # 0.000 M/sec
4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%)
2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%)
2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%)
40,267 branch-misses # 0.002 % (scaled from 30.04%)
2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%)
53,725 cache-misses # 0.027 M/sec (scaled from 30.02%)
2.001393933 seconds time elapsed
$ perf stat -B noploop 2
noploop for 2 seconds
Performance counter stats for 'noploop 2':
1998.297883 task-clock-msecs # 0.998 CPUs
59 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
119 page-faults # 0.000 M/sec
4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%)
2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%)
2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%)
25,650 branch-misses # 0.001 % (scaled from 30.05%)
2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%)
47,097 cache-misses # 0.024 M/sec (scaled from 30.02%)
2.001391016 seconds time elapsed
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-18 13:00:01 +00:00
|
|
|
#include <locale.h>
|
2009-05-05 15:50:27 +00:00
|
|
|
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
#define DEFAULT_SEPARATOR " "
|
2011-05-30 14:55:59 +00:00
|
|
|
#define CNTR_NOT_SUPPORTED "<not supported>"
|
|
|
|
#define CNTR_NOT_COUNTED "<not counted>"
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
|
2015-06-26 09:29:26 +00:00
|
|
|
static void print_counters(struct timespec *ts, int argc, const char **argv);
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
|
2013-08-21 23:47:26 +00:00
|
|
|
/* Default events used for perf stat -T */
|
2015-06-03 14:25:53 +00:00
|
|
|
static const char *transaction_attrs = {
|
|
|
|
"task-clock,"
|
2013-08-21 23:47:26 +00:00
|
|
|
"{"
|
|
|
|
"instructions,"
|
|
|
|
"cycles,"
|
|
|
|
"cpu/cycles-t/,"
|
|
|
|
"cpu/tx-start/,"
|
|
|
|
"cpu/el-start/,"
|
|
|
|
"cpu/cycles-ct/"
|
|
|
|
"}"
|
|
|
|
};
|
|
|
|
|
|
|
|
/* More limited version when the CPU does not have all events. */
|
2015-06-03 14:25:53 +00:00
|
|
|
static const char * transaction_limited_attrs = {
|
|
|
|
"task-clock,"
|
2013-08-21 23:47:26 +00:00
|
|
|
"{"
|
|
|
|
"instructions,"
|
|
|
|
"cycles,"
|
|
|
|
"cpu/cycles-t/,"
|
|
|
|
"cpu/tx-start/"
|
|
|
|
"}"
|
|
|
|
};
|
|
|
|
|
2012-04-05 16:26:27 +00:00
|
|
|
static struct perf_evlist *evsel_list;
|
2011-01-11 22:56:53 +00:00
|
|
|
|
2013-11-12 19:46:16 +00:00
|
|
|
static struct target target = {
|
2012-05-07 05:09:04 +00:00
|
|
|
.uid = UINT_MAX,
|
|
|
|
};
|
2009-04-20 13:37:32 +00:00
|
|
|
|
2015-10-25 14:51:18 +00:00
|
|
|
typedef int (*aggr_get_id_t)(struct cpu_map *m, int cpu);
|
|
|
|
|
2009-06-24 12:49:34 +00:00
|
|
|
static int run_count = 1;
|
2010-05-12 08:40:01 +00:00
|
|
|
static bool no_inherit = false;
|
2013-06-04 15:44:26 +00:00
|
|
|
static volatile pid_t child_pid = -1;
|
2010-04-13 08:37:33 +00:00
|
|
|
static bool null_run = false;
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
static int detailed_run = 0;
|
2013-08-21 23:47:26 +00:00
|
|
|
static bool transaction_run;
|
2010-12-01 19:53:27 +00:00
|
|
|
static bool big_num = true;
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
static int big_num_opt = -1;
|
|
|
|
static const char *csv_sep = NULL;
|
|
|
|
static bool csv_output = false;
|
2011-08-17 10:42:07 +00:00
|
|
|
static bool group = false;
|
2012-10-23 11:40:14 +00:00
|
|
|
static const char *pre_cmd = NULL;
|
|
|
|
static const char *post_cmd = NULL;
|
|
|
|
static bool sync_run = false;
|
2013-08-03 00:41:11 +00:00
|
|
|
static unsigned int initial_delay = 0;
|
2013-11-12 16:58:49 +00:00
|
|
|
static unsigned int unit_width = 4; /* strlen("unit") */
|
2013-03-01 18:02:27 +00:00
|
|
|
static bool forever = false;
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
static struct timespec ref_time;
|
2013-02-14 12:57:27 +00:00
|
|
|
static struct cpu_map *aggr_map;
|
2015-10-25 14:51:18 +00:00
|
|
|
static aggr_get_id_t aggr_get_id;
|
2015-11-05 14:40:45 +00:00
|
|
|
static bool append_file;
|
|
|
|
static const char *output_name;
|
|
|
|
static int output_fd;
|
perf stat: add perf stat -B to pretty print large numbers
It is hard to read very large numbers so provide an option to perf stat
to separate thousands using a separator. The patch leverages the locale
support of stdio. You need to set your LC_NUMERIC appropriately, for
instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
feature. This way existing scripts parsing the output do not need to be
changed. Here is an example.
$ perf stat noploop 2
noploop for 2 seconds
Performance counter stats for 'noploop 2':
1998.347031 task-clock-msecs # 0.998 CPUs
61 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
118 page-faults # 0.000 M/sec
4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%)
2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%)
2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%)
40,267 branch-misses # 0.002 % (scaled from 30.04%)
2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%)
53,725 cache-misses # 0.027 M/sec (scaled from 30.02%)
2.001393933 seconds time elapsed
$ perf stat -B noploop 2
noploop for 2 seconds
Performance counter stats for 'noploop 2':
1998.297883 task-clock-msecs # 0.998 CPUs
59 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
119 page-faults # 0.000 M/sec
4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%)
2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%)
2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%)
25,650 branch-misses # 0.001 % (scaled from 30.05%)
2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%)
47,097 cache-misses # 0.024 M/sec (scaled from 30.02%)
2.001391016 seconds time elapsed
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-18 13:00:01 +00:00
|
|
|
|
2009-12-31 08:05:50 +00:00
|
|
|
static volatile int done = 0;
|
|
|
|
|
2015-07-21 12:31:22 +00:00
|
|
|
static struct perf_stat_config stat_config = {
|
|
|
|
.aggr_mode = AGGR_GLOBAL,
|
2015-07-21 12:31:23 +00:00
|
|
|
.scale = true,
|
2015-07-21 12:31:22 +00:00
|
|
|
};
|
|
|
|
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
static inline void diff_timespec(struct timespec *r, struct timespec *a,
|
|
|
|
struct timespec *b)
|
|
|
|
{
|
|
|
|
r->tv_sec = a->tv_sec - b->tv_sec;
|
|
|
|
if (a->tv_nsec < b->tv_nsec) {
|
|
|
|
r->tv_nsec = a->tv_nsec + 1000000000L - b->tv_nsec;
|
|
|
|
r->tv_sec--;
|
|
|
|
} else {
|
|
|
|
r->tv_nsec = a->tv_nsec - b->tv_nsec ;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-06-26 09:29:13 +00:00
|
|
|
static void perf_stat__reset_stats(void)
|
|
|
|
{
|
|
|
|
perf_evlist__reset_stats(evsel_list);
|
2015-06-03 14:25:59 +00:00
|
|
|
perf_stat__reset_shadow_stats();
|
2015-06-03 14:25:55 +00:00
|
|
|
}
|
|
|
|
|
2012-11-12 17:34:00 +00:00
|
|
|
static int create_perf_stat_counter(struct perf_evsel *evsel)
|
2009-04-20 13:37:32 +00:00
|
|
|
{
|
2011-01-03 18:39:04 +00:00
|
|
|
struct perf_event_attr *attr = &evsel->attr;
|
2011-10-25 12:42:19 +00:00
|
|
|
|
2015-07-21 12:31:23 +00:00
|
|
|
if (stat_config.scale)
|
2009-06-06 07:58:57 +00:00
|
|
|
attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
|
|
|
|
PERF_FORMAT_TOTAL_TIME_RUNNING;
|
2009-04-20 13:37:32 +00:00
|
|
|
|
2011-04-14 14:20:14 +00:00
|
|
|
attr->inherit = !no_inherit;
|
|
|
|
|
2015-11-25 15:36:54 +00:00
|
|
|
/*
|
|
|
|
* Some events get initialized with sample_(period/type) set,
|
|
|
|
* like tracepoints. Clear it up for counting.
|
|
|
|
*/
|
|
|
|
attr->sample_period = 0;
|
|
|
|
attr->sample_type = 0;
|
|
|
|
|
2015-12-03 09:06:44 +00:00
|
|
|
/*
|
|
|
|
* Disabling all counters initially, they will be enabled
|
|
|
|
* either manually by us or by kernel via enable_on_exec
|
|
|
|
* set later.
|
|
|
|
*/
|
2015-12-03 09:06:45 +00:00
|
|
|
if (perf_evsel__is_group_leader(evsel)) {
|
2015-12-03 09:06:44 +00:00
|
|
|
attr->disabled = 1;
|
|
|
|
|
2015-12-03 09:06:45 +00:00
|
|
|
/*
|
|
|
|
* In case of initial_delay we enable tracee
|
|
|
|
* events manually.
|
|
|
|
*/
|
|
|
|
if (target__none(&target) && !initial_delay)
|
2013-08-03 00:41:11 +00:00
|
|
|
attr->enable_on_exec = 1;
|
2009-04-20 13:37:32 +00:00
|
|
|
}
|
2010-03-22 16:10:28 +00:00
|
|
|
|
2015-12-03 09:06:45 +00:00
|
|
|
if (target__has_cpu(&target))
|
|
|
|
return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel));
|
|
|
|
|
2012-12-13 16:13:07 +00:00
|
|
|
return perf_evsel__open_per_thread(evsel, evsel_list->threads);
|
2009-04-20 13:37:32 +00:00
|
|
|
}
|
|
|
|
|
2009-05-29 07:10:54 +00:00
|
|
|
/*
|
|
|
|
* Does the counter have nsecs as a unit?
|
|
|
|
*/
|
2011-01-03 18:49:44 +00:00
|
|
|
static inline int nsec_counter(struct perf_evsel *evsel)
|
2009-05-29 07:10:54 +00:00
|
|
|
{
|
2011-01-03 18:49:44 +00:00
|
|
|
if (perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
|
|
|
|
perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK))
|
2009-05-29 07:10:54 +00:00
|
|
|
return 1;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-11-16 09:05:01 +00:00
|
|
|
/*
|
|
|
|
* Read out the results of a single counter:
|
|
|
|
* do not aggregate counts across CPUs in system-wide mode
|
|
|
|
*/
|
2011-01-03 19:45:52 +00:00
|
|
|
static int read_counter(struct perf_evsel *counter)
|
2010-11-16 09:05:01 +00:00
|
|
|
{
|
2014-11-21 09:31:09 +00:00
|
|
|
int nthreads = thread_map__nr(evsel_list->threads);
|
|
|
|
int ncpus = perf_evsel__nr_cpus(counter);
|
|
|
|
int cpu, thread;
|
2010-11-16 09:05:01 +00:00
|
|
|
|
2015-02-13 18:40:58 +00:00
|
|
|
if (!counter->supported)
|
|
|
|
return -ENOENT;
|
|
|
|
|
2014-11-21 09:31:09 +00:00
|
|
|
if (counter->system_wide)
|
|
|
|
nthreads = 1;
|
|
|
|
|
|
|
|
for (thread = 0; thread < nthreads; thread++) {
|
|
|
|
for (cpu = 0; cpu < ncpus; cpu++) {
|
2015-06-26 09:29:20 +00:00
|
|
|
struct perf_counts_values *count;
|
|
|
|
|
|
|
|
count = perf_counts(counter->counts, cpu, thread);
|
|
|
|
if (perf_evsel__read(counter, cpu, thread, count))
|
2014-11-21 09:31:09 +00:00
|
|
|
return -1;
|
|
|
|
}
|
2010-11-16 09:05:01 +00:00
|
|
|
}
|
2011-01-03 19:45:52 +00:00
|
|
|
|
|
|
|
return 0;
|
2009-05-29 07:10:54 +00:00
|
|
|
}
|
|
|
|
|
2015-07-08 11:17:31 +00:00
|
|
|
static void read_counters(bool close_counters)
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
{
|
|
|
|
struct perf_evsel *counter;
|
|
|
|
|
2015-06-26 09:29:19 +00:00
|
|
|
evlist__for_each(evsel_list, counter) {
|
2015-06-26 09:29:20 +00:00
|
|
|
if (read_counter(counter))
|
2015-09-01 22:52:46 +00:00
|
|
|
pr_debug("failed to read counter %s\n", counter->name);
|
2015-06-26 09:29:20 +00:00
|
|
|
|
2015-07-21 12:31:27 +00:00
|
|
|
if (perf_stat_process_counter(&stat_config, counter))
|
2015-06-26 09:29:20 +00:00
|
|
|
pr_warning("failed to process counter %s\n", counter->name);
|
2015-06-26 09:29:19 +00:00
|
|
|
|
2015-07-08 11:17:31 +00:00
|
|
|
if (close_counters) {
|
2015-06-26 09:29:19 +00:00
|
|
|
perf_evsel__close_fd(counter, perf_evsel__nr_cpus(counter),
|
|
|
|
thread_map__nr(evsel_list->threads));
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
}
|
|
|
|
}
|
2015-06-26 09:29:19 +00:00
|
|
|
}
|
|
|
|
|
2015-06-26 09:29:24 +00:00
|
|
|
static void process_interval(void)
|
2015-06-26 09:29:19 +00:00
|
|
|
{
|
|
|
|
struct timespec ts, rs;
|
|
|
|
|
|
|
|
read_counters(false);
|
2013-02-14 12:57:27 +00:00
|
|
|
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
clock_gettime(CLOCK_MONOTONIC, &ts);
|
|
|
|
diff_timespec(&rs, &ts, &ref_time);
|
|
|
|
|
2015-06-26 09:29:26 +00:00
|
|
|
print_counters(&rs, 0, NULL);
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
}
|
|
|
|
|
2015-12-03 09:06:44 +00:00
|
|
|
static void enable_counters(void)
|
2013-08-03 00:41:11 +00:00
|
|
|
{
|
2015-12-03 09:06:44 +00:00
|
|
|
if (initial_delay)
|
2013-08-03 00:41:11 +00:00
|
|
|
usleep(initial_delay * 1000);
|
2015-12-03 09:06:44 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We need to enable counters only if:
|
|
|
|
* - we don't have tracee (attaching to task or cpu)
|
|
|
|
* - we have initial delay configured
|
|
|
|
*/
|
|
|
|
if (!target__none(&target) || initial_delay)
|
2015-12-03 09:06:43 +00:00
|
|
|
perf_evlist__enable(evsel_list);
|
2013-08-03 00:41:11 +00:00
|
|
|
}
|
|
|
|
|
2014-01-02 18:11:25 +00:00
|
|
|
static volatile int workload_exec_errno;
|
2013-12-28 18:45:08 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* perf_evlist__prepare_workload will send a SIGUSR1
|
|
|
|
* if the fork fails, since we asked by setting its
|
|
|
|
* want_signal to true.
|
|
|
|
*/
|
2014-01-02 18:11:25 +00:00
|
|
|
static void workload_exec_failed_signal(int signo __maybe_unused, siginfo_t *info,
|
|
|
|
void *ucontext __maybe_unused)
|
2013-12-28 18:45:08 +00:00
|
|
|
{
|
2014-01-02 18:11:25 +00:00
|
|
|
workload_exec_errno = info->si_value.sival_int;
|
2013-12-28 18:45:08 +00:00
|
|
|
}
|
|
|
|
|
2013-03-11 07:43:18 +00:00
|
|
|
static int __run_perf_stat(int argc, const char **argv)
|
2009-06-13 12:57:28 +00:00
|
|
|
{
|
2015-07-21 12:31:25 +00:00
|
|
|
int interval = stat_config.interval;
|
2012-12-13 18:10:58 +00:00
|
|
|
char msg[512];
|
2009-06-13 12:57:28 +00:00
|
|
|
unsigned long long t0, t1;
|
2012-11-12 17:34:00 +00:00
|
|
|
struct perf_evsel *counter;
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
struct timespec ts;
|
2013-11-12 16:58:49 +00:00
|
|
|
size_t l;
|
2009-06-13 12:57:28 +00:00
|
|
|
int status = 0;
|
2010-03-18 14:36:03 +00:00
|
|
|
const bool forks = (argc > 0);
|
2009-06-13 12:57:28 +00:00
|
|
|
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
if (interval) {
|
|
|
|
ts.tv_sec = interval / 1000;
|
|
|
|
ts.tv_nsec = (interval % 1000) * 1000000;
|
|
|
|
} else {
|
|
|
|
ts.tv_sec = 1;
|
|
|
|
ts.tv_nsec = 0;
|
|
|
|
}
|
|
|
|
|
2009-12-31 08:05:50 +00:00
|
|
|
if (forks) {
|
2014-01-03 17:56:49 +00:00
|
|
|
if (perf_evlist__prepare_workload(evsel_list, &target, argv, false,
|
|
|
|
workload_exec_failed_signal) < 0) {
|
2013-03-11 07:43:18 +00:00
|
|
|
perror("failed to prepare workload");
|
|
|
|
return -1;
|
2009-12-31 08:05:50 +00:00
|
|
|
}
|
2013-09-30 09:01:11 +00:00
|
|
|
child_pid = evsel_list->workload.pid;
|
2009-06-29 11:13:21 +00:00
|
|
|
}
|
|
|
|
|
perf tools: Enable grouping logic for parsed events
This patch adds a functionality that allows to create event groups
based on the way they are specified on the command line. Adding
functionality to the '{}' group syntax introduced in earlier patch.
The current '--group/-g' option behaviour remains intact. If you
specify it for record/stat/top command, all the specified events
become members of a single group with the first event as a group
leader.
With the new '{}' group syntax you can create group like:
# perf record -e '{cycles,faults}' ls
resulting in single event group containing 'cycles' and 'faults'
events, with cycles event as group leader.
All groups are created with regards to threads and cpus. Thus
recording an event group within a 2 threads on server with
4 CPUs will create 8 separate groups.
Examples (first event in brackets is group leader):
# 1 group (cpu-clock,task-clock)
perf record --group -e cpu-clock,task-clock ls
perf record -e '{cpu-clock,task-clock}' ls
# 2 groups (cpu-clock,task-clock) (minor-faults,major-faults)
perf record -e '{cpu-clock,task-clock},{minor-faults,major-faults}' ls
# 1 group (cpu-clock,task-clock,minor-faults,major-faults)
perf record --group -e cpu-clock,task-clock -e minor-faults,major-faults ls
perf record -e '{cpu-clock,task-clock,minor-faults,major-faults}' ls
# 2 groups (cpu-clock,task-clock) (minor-faults,major-faults)
perf record -e '{cpu-clock,task-clock} -e '{minor-faults,major-faults}' \
-e instructions ls
# 1 group
# (cpu-clock,task-clock,minor-faults,major-faults,instructions)
perf record --group -e cpu-clock,task-clock \
-e minor-faults,major-faults -e instructions ls perf record -e
'{cpu-clock,task-clock,minor-faults,major-faults,instructions}' ls
It's possible to use standard event modifier for a group, which spans
over all events in the group and updates each event modifier settings,
for example:
# perf record -r '{faults:k,cache-references}:p'
resulting in ':kp' modifier being used for 'faults' and ':p' modifier
being used for 'cache-references' event.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/n/tip-ho42u0wcr8mn1otkalqi13qp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-08 10:22:36 +00:00
|
|
|
if (group)
|
2012-08-14 19:35:48 +00:00
|
|
|
perf_evlist__set_leader(evsel_list);
|
perf tools: Enable grouping logic for parsed events
This patch adds a functionality that allows to create event groups
based on the way they are specified on the command line. Adding
functionality to the '{}' group syntax introduced in earlier patch.
The current '--group/-g' option behaviour remains intact. If you
specify it for record/stat/top command, all the specified events
become members of a single group with the first event as a group
leader.
With the new '{}' group syntax you can create group like:
# perf record -e '{cycles,faults}' ls
resulting in single event group containing 'cycles' and 'faults'
events, with cycles event as group leader.
All groups are created with regards to threads and cpus. Thus
recording an event group within a 2 threads on server with
4 CPUs will create 8 separate groups.
Examples (first event in brackets is group leader):
# 1 group (cpu-clock,task-clock)
perf record --group -e cpu-clock,task-clock ls
perf record -e '{cpu-clock,task-clock}' ls
# 2 groups (cpu-clock,task-clock) (minor-faults,major-faults)
perf record -e '{cpu-clock,task-clock},{minor-faults,major-faults}' ls
# 1 group (cpu-clock,task-clock,minor-faults,major-faults)
perf record --group -e cpu-clock,task-clock -e minor-faults,major-faults ls
perf record -e '{cpu-clock,task-clock,minor-faults,major-faults}' ls
# 2 groups (cpu-clock,task-clock) (minor-faults,major-faults)
perf record -e '{cpu-clock,task-clock} -e '{minor-faults,major-faults}' \
-e instructions ls
# 1 group
# (cpu-clock,task-clock,minor-faults,major-faults,instructions)
perf record --group -e cpu-clock,task-clock \
-e minor-faults,major-faults -e instructions ls perf record -e
'{cpu-clock,task-clock,minor-faults,major-faults,instructions}' ls
It's possible to use standard event modifier for a group, which spans
over all events in the group and updates each event modifier settings,
for example:
# perf record -r '{faults:k,cache-references}:p'
resulting in ':kp' modifier being used for 'faults' and ':p' modifier
being used for 'cache-references' event.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/n/tip-ho42u0wcr8mn1otkalqi13qp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-08 10:22:36 +00:00
|
|
|
|
2014-01-10 13:37:27 +00:00
|
|
|
evlist__for_each(evsel_list, counter) {
|
2012-11-12 17:34:00 +00:00
|
|
|
if (create_perf_stat_counter(counter) < 0) {
|
2012-05-08 15:29:16 +00:00
|
|
|
/*
|
|
|
|
* PPC returns ENXIO for HW counters until 2.6.37
|
|
|
|
* (behavior changed with commit b0a873e).
|
|
|
|
*/
|
2011-12-01 22:38:33 +00:00
|
|
|
if (errno == EINVAL || errno == ENOSYS ||
|
2012-05-08 15:29:16 +00:00
|
|
|
errno == ENOENT || errno == EOPNOTSUPP ||
|
|
|
|
errno == ENXIO) {
|
2011-04-29 22:04:15 +00:00
|
|
|
if (verbose)
|
|
|
|
ui__warning("%s event is not supported by the kernel.\n",
|
2012-06-12 15:34:58 +00:00
|
|
|
perf_evsel__name(counter));
|
2011-05-30 14:55:59 +00:00
|
|
|
counter->supported = false;
|
2015-06-11 06:32:40 +00:00
|
|
|
|
|
|
|
if ((counter->leader != counter) ||
|
|
|
|
!(counter->leader->nr_members > 1))
|
|
|
|
continue;
|
2011-04-29 22:04:15 +00:00
|
|
|
}
|
2011-04-28 06:48:42 +00:00
|
|
|
|
2012-12-13 18:10:58 +00:00
|
|
|
perf_evsel__open_strerror(counter, &target,
|
|
|
|
errno, msg, sizeof(msg));
|
|
|
|
ui__error("%s\n", msg);
|
|
|
|
|
2011-01-03 19:48:12 +00:00
|
|
|
if (child_pid != -1)
|
|
|
|
kill(child_pid, SIGTERM);
|
2012-08-26 18:24:44 +00:00
|
|
|
|
2011-01-03 19:48:12 +00:00
|
|
|
return -1;
|
|
|
|
}
|
2011-05-30 14:55:59 +00:00
|
|
|
counter->supported = true;
|
2013-11-12 16:58:49 +00:00
|
|
|
|
|
|
|
l = strlen(counter->unit);
|
|
|
|
if (l > unit_width)
|
|
|
|
unit_width = l;
|
2010-03-22 16:10:28 +00:00
|
|
|
}
|
2009-06-13 12:57:28 +00:00
|
|
|
|
2015-03-24 22:23:47 +00:00
|
|
|
if (perf_evlist__apply_filters(evsel_list, &counter)) {
|
|
|
|
error("failed to set filter \"%s\" on event %s with %d (%s)\n",
|
|
|
|
counter->filter, perf_evsel__name(counter), errno,
|
2014-08-14 02:22:55 +00:00
|
|
|
strerror_r(errno, msg, sizeof(msg)));
|
2011-03-14 15:40:30 +00:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2009-06-13 12:57:28 +00:00
|
|
|
/*
|
|
|
|
* Enable counters and exec the command:
|
|
|
|
*/
|
|
|
|
t0 = rdclock();
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
clock_gettime(CLOCK_MONOTONIC, &ref_time);
|
2009-06-13 12:57:28 +00:00
|
|
|
|
2009-12-31 08:05:50 +00:00
|
|
|
if (forks) {
|
2013-03-11 07:43:18 +00:00
|
|
|
perf_evlist__start_workload(evsel_list);
|
2015-12-03 09:06:44 +00:00
|
|
|
enable_counters();
|
2013-03-11 07:43:18 +00:00
|
|
|
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
if (interval) {
|
|
|
|
while (!waitpid(child_pid, &status, WNOHANG)) {
|
|
|
|
nanosleep(&ts, NULL);
|
2015-06-26 09:29:24 +00:00
|
|
|
process_interval();
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
}
|
|
|
|
}
|
2009-12-31 08:05:50 +00:00
|
|
|
wait(&status);
|
2013-12-28 18:45:08 +00:00
|
|
|
|
2014-01-02 18:11:25 +00:00
|
|
|
if (workload_exec_errno) {
|
|
|
|
const char *emsg = strerror_r(workload_exec_errno, msg, sizeof(msg));
|
|
|
|
pr_err("Workload failed: %s\n", emsg);
|
2013-12-28 18:45:08 +00:00
|
|
|
return -1;
|
2014-01-02 18:11:25 +00:00
|
|
|
}
|
2013-12-28 18:45:08 +00:00
|
|
|
|
2011-09-15 21:31:40 +00:00
|
|
|
if (WIFSIGNALED(status))
|
|
|
|
psignal(WTERMSIG(status), argv[0]);
|
2009-12-31 08:05:50 +00:00
|
|
|
} else {
|
2015-12-03 09:06:44 +00:00
|
|
|
enable_counters();
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
while (!done) {
|
|
|
|
nanosleep(&ts, NULL);
|
|
|
|
if (interval)
|
2015-06-26 09:29:24 +00:00
|
|
|
process_interval();
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
}
|
2009-12-31 08:05:50 +00:00
|
|
|
}
|
2009-06-13 12:57:28 +00:00
|
|
|
|
|
|
|
t1 = rdclock();
|
|
|
|
|
2009-09-04 13:36:08 +00:00
|
|
|
update_stats(&walltime_nsecs_stats, t1 - t0);
|
2009-06-13 12:57:28 +00:00
|
|
|
|
2015-06-26 09:29:19 +00:00
|
|
|
read_counters(true);
|
2011-01-03 19:45:52 +00:00
|
|
|
|
2009-06-13 12:57:28 +00:00
|
|
|
return WEXITSTATUS(status);
|
|
|
|
}
|
|
|
|
|
2014-01-03 20:34:42 +00:00
|
|
|
static int run_perf_stat(int argc, const char **argv)
|
2012-10-23 11:40:14 +00:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (pre_cmd) {
|
|
|
|
ret = system(pre_cmd);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (sync_run)
|
|
|
|
sync();
|
|
|
|
|
|
|
|
ret = __run_perf_stat(argc, argv);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
if (post_cmd) {
|
|
|
|
ret = system(post_cmd);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2015-03-11 14:16:27 +00:00
|
|
|
static void print_running(u64 run, u64 ena)
|
|
|
|
{
|
|
|
|
if (csv_output) {
|
2015-07-21 12:31:24 +00:00
|
|
|
fprintf(stat_config.output, "%s%" PRIu64 "%s%.2f",
|
2015-03-11 14:16:27 +00:00
|
|
|
csv_sep,
|
|
|
|
run,
|
|
|
|
csv_sep,
|
|
|
|
ena ? 100.0 * run / ena : 100.0);
|
|
|
|
} else if (run != ena) {
|
2015-07-21 12:31:24 +00:00
|
|
|
fprintf(stat_config.output, " (%.2f%%)", 100.0 * run / ena);
|
2015-03-11 14:16:27 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2011-04-27 03:35:39 +00:00
|
|
|
static void print_noise_pct(double total, double avg)
|
|
|
|
{
|
2012-09-17 08:31:14 +00:00
|
|
|
double pct = rel_stddev_stats(total, avg);
|
2011-04-27 03:35:39 +00:00
|
|
|
|
perf stat: Add noise output for csv mode
Previously, when you want perf-stat to output the statistics in
csv mode, no information of the noise will be printed out.
For example right now we output this --repeat information:
./perf stat -r3 -x, sleep 1
1.164789,task-clock
8,context-switches
0,CPU-migrations
219,page-faults
3337800,cycles
With this patch, the output will be appended with an additional
entry for the noise value:
./perf stat -r3 -x, sleep 1
1.164789,task-clock,3.75%
8,context-switches,75.00%
0,CPU-migrations,100.00%
219,page-faults,0.00%
3337800,cycles,3.36%
Signed-off-by: Zhengyu He <zhengyuh@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-23 20:45:42 +00:00
|
|
|
if (csv_output)
|
2015-07-21 12:31:24 +00:00
|
|
|
fprintf(stat_config.output, "%s%.2f%%", csv_sep, pct);
|
2011-09-07 23:14:02 +00:00
|
|
|
else if (pct)
|
2015-07-21 12:31:24 +00:00
|
|
|
fprintf(stat_config.output, " ( +-%6.2f%% )", pct);
|
2011-04-27 03:35:39 +00:00
|
|
|
}
|
|
|
|
|
2011-01-03 18:39:04 +00:00
|
|
|
static void print_noise(struct perf_evsel *evsel, double avg)
|
2009-06-13 12:57:28 +00:00
|
|
|
{
|
2015-10-16 10:41:03 +00:00
|
|
|
struct perf_stat_evsel *ps;
|
2011-01-03 18:39:04 +00:00
|
|
|
|
2009-09-04 16:23:38 +00:00
|
|
|
if (run_count == 1)
|
|
|
|
return;
|
|
|
|
|
2011-01-03 18:39:04 +00:00
|
|
|
ps = evsel->priv;
|
2011-04-27 03:35:39 +00:00
|
|
|
print_noise_pct(stddev_stats(&ps->res_stats[0]), avg);
|
2009-06-13 12:57:28 +00:00
|
|
|
}
|
|
|
|
|
2013-02-14 12:57:29 +00:00
|
|
|
static void aggr_printout(struct perf_evsel *evsel, int id, int nr)
|
2009-06-13 11:35:00 +00:00
|
|
|
{
|
2015-07-21 12:31:22 +00:00
|
|
|
switch (stat_config.aggr_mode) {
|
2013-02-14 12:57:29 +00:00
|
|
|
case AGGR_CORE:
|
2015-07-21 12:31:24 +00:00
|
|
|
fprintf(stat_config.output, "S%d-C%*d%s%*d%s",
|
2013-02-14 12:57:29 +00:00
|
|
|
cpu_map__id_to_socket(id),
|
|
|
|
csv_output ? 0 : -8,
|
|
|
|
cpu_map__id_to_cpu(id),
|
|
|
|
csv_sep,
|
|
|
|
csv_output ? 0 : 4,
|
|
|
|
nr,
|
|
|
|
csv_sep);
|
|
|
|
break;
|
2013-02-14 12:57:27 +00:00
|
|
|
case AGGR_SOCKET:
|
2015-07-21 12:31:24 +00:00
|
|
|
fprintf(stat_config.output, "S%*d%s%*d%s",
|
2013-02-06 14:46:02 +00:00
|
|
|
csv_output ? 0 : -5,
|
2013-02-14 12:57:29 +00:00
|
|
|
id,
|
2013-02-06 14:46:02 +00:00
|
|
|
csv_sep,
|
|
|
|
csv_output ? 0 : 4,
|
|
|
|
nr,
|
|
|
|
csv_sep);
|
2013-02-14 12:57:27 +00:00
|
|
|
break;
|
|
|
|
case AGGR_NONE:
|
2015-07-21 12:31:24 +00:00
|
|
|
fprintf(stat_config.output, "CPU%*d%s",
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
csv_output ? 0 : -4,
|
2013-02-14 12:57:29 +00:00
|
|
|
perf_evsel__cpus(evsel)->map[id], csv_sep);
|
2013-02-14 12:57:27 +00:00
|
|
|
break;
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
case AGGR_THREAD:
|
2015-07-21 12:31:24 +00:00
|
|
|
fprintf(stat_config.output, "%*s-%*d%s",
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
csv_output ? 0 : 16,
|
|
|
|
thread_map__comm(evsel->threads, id),
|
|
|
|
csv_output ? 0 : -8,
|
|
|
|
thread_map__pid(evsel->threads, id),
|
|
|
|
csv_sep);
|
|
|
|
break;
|
2013-02-14 12:57:27 +00:00
|
|
|
case AGGR_GLOBAL:
|
2015-10-16 10:41:04 +00:00
|
|
|
case AGGR_UNSET:
|
2013-02-14 12:57:27 +00:00
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-09-24 20:50:46 +00:00
|
|
|
static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
|
2013-02-14 12:57:27 +00:00
|
|
|
{
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stat_config.output;
|
2013-02-14 12:57:27 +00:00
|
|
|
double msecs = avg / 1e6;
|
2013-11-12 16:58:49 +00:00
|
|
|
const char *fmt_v, *fmt_n;
|
2013-09-28 20:28:00 +00:00
|
|
|
char name[25];
|
2013-02-14 12:57:27 +00:00
|
|
|
|
2013-11-12 16:58:49 +00:00
|
|
|
fmt_v = csv_output ? "%.6f%s" : "%18.6f%s";
|
|
|
|
fmt_n = csv_output ? "%s" : "%-25s";
|
|
|
|
|
2014-09-24 20:50:46 +00:00
|
|
|
aggr_printout(evsel, id, nr);
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
|
2013-09-28 20:28:00 +00:00
|
|
|
scnprintf(name, sizeof(name), "%s%s",
|
|
|
|
perf_evsel__name(evsel), csv_output ? "" : " (msec)");
|
2013-11-12 16:58:49 +00:00
|
|
|
|
|
|
|
fprintf(output, fmt_v, msecs, csv_sep);
|
|
|
|
|
|
|
|
if (csv_output)
|
|
|
|
fprintf(output, "%s%s", evsel->unit, csv_sep);
|
|
|
|
else
|
|
|
|
fprintf(output, "%-*s%s", unit_width, evsel->unit, csv_sep);
|
|
|
|
|
|
|
|
fprintf(output, fmt_n, name);
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
|
perf tool: Add cgroup support
This patch adds the ability to filter monitoring based on container groups
(cgroups) for both perf stat and perf record. It is possible to monitor
multiple cgroup in parallel. There is one cgroup per event. The cgroups to
monitor are passed via a new -G option followed by a comma separated list of
cgroup names.
The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
finds the corresponding directory in the cgroup filesystem and opens it. It
then passes that file descriptor to the kernel.
Example:
$ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
Performance counter stats for 'sleep 1':
2,368,667,414 cycles test1
2,369,661,459 cycles
<not counted> cycles test2
1.001856890 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 09:20:01 +00:00
|
|
|
if (evsel->cgrp)
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
|
2009-06-13 11:35:00 +00:00
|
|
|
}
|
|
|
|
|
2015-06-03 14:25:56 +00:00
|
|
|
static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
|
|
|
|
{
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stat_config.output;
|
2015-06-03 14:25:56 +00:00
|
|
|
double sc = evsel->scale;
|
|
|
|
const char *fmt;
|
|
|
|
|
|
|
|
if (csv_output) {
|
|
|
|
fmt = sc != 1.0 ? "%.2f%s" : "%.0f%s";
|
|
|
|
} else {
|
|
|
|
if (big_num)
|
|
|
|
fmt = sc != 1.0 ? "%'18.2f%s" : "%'18.0f%s";
|
|
|
|
else
|
|
|
|
fmt = sc != 1.0 ? "%18.2f%s" : "%18.0f%s";
|
|
|
|
}
|
|
|
|
|
|
|
|
aggr_printout(evsel, id, nr);
|
|
|
|
|
|
|
|
fprintf(output, fmt, avg, csv_sep);
|
|
|
|
|
|
|
|
if (evsel->unit)
|
|
|
|
fprintf(output, "%-*s%s",
|
|
|
|
csv_output ? 0 : unit_width,
|
|
|
|
evsel->unit, csv_sep);
|
|
|
|
|
|
|
|
fprintf(output, "%-*s", csv_output ? 0 : 25, perf_evsel__name(evsel));
|
|
|
|
|
|
|
|
if (evsel->cgrp)
|
|
|
|
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
|
2015-11-03 01:50:21 +00:00
|
|
|
}
|
2015-06-03 14:25:56 +00:00
|
|
|
|
2015-11-03 01:50:21 +00:00
|
|
|
static void printout(int id, int nr, struct perf_evsel *counter, double uval)
|
|
|
|
{
|
|
|
|
int cpu = cpu_map__id_to_cpu(id);
|
|
|
|
|
|
|
|
if (stat_config.aggr_mode == AGGR_GLOBAL)
|
|
|
|
cpu = 0;
|
|
|
|
|
|
|
|
if (nsec_counter(counter))
|
|
|
|
nsec_printout(id, nr, counter, uval);
|
|
|
|
else
|
|
|
|
abs_printout(id, nr, counter, uval);
|
2015-06-03 14:25:56 +00:00
|
|
|
|
2015-11-03 01:50:21 +00:00
|
|
|
if (!csv_output && !stat_config.interval)
|
|
|
|
perf_stat__print_shadow_stats(stat_config.output, counter,
|
|
|
|
uval, cpu,
|
|
|
|
stat_config.aggr_mode);
|
2015-06-03 14:25:56 +00:00
|
|
|
}
|
|
|
|
|
2013-02-14 12:57:27 +00:00
|
|
|
static void print_aggr(char *prefix)
|
2013-02-06 14:46:02 +00:00
|
|
|
{
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stat_config.output;
|
2013-02-06 14:46:02 +00:00
|
|
|
struct perf_evsel *counter;
|
perf stat: Get correct cpu id for print_aggr
print_aggr() fails to print per-core/per-socket statistics after commit
582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events")
if events have differnt cpus. Because in print_aggr(), aggr_get_id needs
index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be
used to get aggregated id.
Here is an example:
Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has
cpumask 0,18)
$ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2
Without this patch, it failes to get CPU 18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 7526851 cycles
S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/
S1-C0 0 <not counted> cycles
S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/
With this patch, it can get both CPU0 and CPU18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 6327768 cycles
S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/
S1-C0 1 330228 cycles
S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events")
Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-02 07:08:43 +00:00
|
|
|
int cpu, s, s2, id, nr;
|
2013-11-12 16:58:49 +00:00
|
|
|
double uval;
|
2013-02-06 14:46:02 +00:00
|
|
|
u64 ena, run, val;
|
|
|
|
|
2013-02-14 12:57:27 +00:00
|
|
|
if (!(aggr_map || aggr_get_id))
|
2013-02-06 14:46:02 +00:00
|
|
|
return;
|
|
|
|
|
2013-02-14 12:57:27 +00:00
|
|
|
for (s = 0; s < aggr_map->nr; s++) {
|
|
|
|
id = aggr_map->map[s];
|
2014-01-10 13:37:27 +00:00
|
|
|
evlist__for_each(evsel_list, counter) {
|
2013-02-06 14:46:02 +00:00
|
|
|
val = ena = run = 0;
|
|
|
|
nr = 0;
|
|
|
|
for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
|
perf stat: Get correct cpu id for print_aggr
print_aggr() fails to print per-core/per-socket statistics after commit
582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events")
if events have differnt cpus. Because in print_aggr(), aggr_get_id needs
index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be
used to get aggregated id.
Here is an example:
Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has
cpumask 0,18)
$ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2
Without this patch, it failes to get CPU 18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 7526851 cycles
S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/
S1-C0 0 <not counted> cycles
S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/
With this patch, it can get both CPU0 and CPU18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 6327768 cycles
S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/
S1-C0 1 330228 cycles
S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events")
Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-02 07:08:43 +00:00
|
|
|
s2 = aggr_get_id(perf_evsel__cpus(counter), cpu);
|
2013-02-14 12:57:27 +00:00
|
|
|
if (s2 != id)
|
2013-02-06 14:46:02 +00:00
|
|
|
continue;
|
2015-06-26 09:29:11 +00:00
|
|
|
val += perf_counts(counter->counts, cpu, 0)->val;
|
|
|
|
ena += perf_counts(counter->counts, cpu, 0)->ena;
|
|
|
|
run += perf_counts(counter->counts, cpu, 0)->run;
|
2013-02-06 14:46:02 +00:00
|
|
|
nr++;
|
|
|
|
}
|
|
|
|
if (prefix)
|
|
|
|
fprintf(output, "%s", prefix);
|
|
|
|
|
|
|
|
if (run == 0 || ena == 0) {
|
2013-07-05 17:06:45 +00:00
|
|
|
aggr_printout(counter, id, nr);
|
2013-02-14 12:57:27 +00:00
|
|
|
|
2013-11-12 16:58:49 +00:00
|
|
|
fprintf(output, "%*s%s",
|
2013-02-06 14:46:02 +00:00
|
|
|
csv_output ? 0 : 18,
|
|
|
|
counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
|
2013-11-12 16:58:49 +00:00
|
|
|
csv_sep);
|
|
|
|
|
|
|
|
fprintf(output, "%-*s%s",
|
|
|
|
csv_output ? 0 : unit_width,
|
|
|
|
counter->unit, csv_sep);
|
|
|
|
|
|
|
|
fprintf(output, "%*s",
|
|
|
|
csv_output ? 0 : -25,
|
2013-02-06 14:46:02 +00:00
|
|
|
perf_evsel__name(counter));
|
2013-02-14 12:57:27 +00:00
|
|
|
|
2013-02-06 14:46:02 +00:00
|
|
|
if (counter->cgrp)
|
|
|
|
fprintf(output, "%s%s",
|
|
|
|
csv_sep, counter->cgrp->name);
|
|
|
|
|
2015-03-11 14:16:27 +00:00
|
|
|
print_running(run, ena);
|
2013-02-06 14:46:02 +00:00
|
|
|
fputc('\n', output);
|
|
|
|
continue;
|
|
|
|
}
|
2013-11-12 16:58:49 +00:00
|
|
|
uval = val * counter->scale;
|
2015-11-03 01:50:21 +00:00
|
|
|
printout(id, nr, counter, uval);
|
2015-03-11 14:16:27 +00:00
|
|
|
if (!csv_output)
|
2013-02-06 14:46:02 +00:00
|
|
|
print_noise(counter, 1.0);
|
|
|
|
|
2015-03-11 14:16:27 +00:00
|
|
|
print_running(run, ena);
|
2013-02-06 14:46:02 +00:00
|
|
|
fputc('\n', output);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
|
|
|
|
{
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stat_config.output;
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
int nthreads = thread_map__nr(counter->threads);
|
|
|
|
int ncpus = cpu_map__nr(counter->cpus);
|
|
|
|
int cpu, thread;
|
|
|
|
double uval;
|
|
|
|
|
|
|
|
for (thread = 0; thread < nthreads; thread++) {
|
|
|
|
u64 ena = 0, run = 0, val = 0;
|
|
|
|
|
|
|
|
for (cpu = 0; cpu < ncpus; cpu++) {
|
|
|
|
val += perf_counts(counter->counts, cpu, thread)->val;
|
|
|
|
ena += perf_counts(counter->counts, cpu, thread)->ena;
|
|
|
|
run += perf_counts(counter->counts, cpu, thread)->run;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prefix)
|
|
|
|
fprintf(output, "%s", prefix);
|
|
|
|
|
|
|
|
uval = val * counter->scale;
|
2015-11-03 01:50:21 +00:00
|
|
|
printout(thread, 0, counter, uval);
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
|
|
|
|
if (!csv_output)
|
|
|
|
print_noise(counter, 1.0);
|
|
|
|
|
|
|
|
print_running(run, ena);
|
|
|
|
fputc('\n', output);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2009-05-29 07:10:54 +00:00
|
|
|
/*
|
|
|
|
* Print out the results of a single counter:
|
2010-11-16 09:05:01 +00:00
|
|
|
* aggregated counts in system-wide mode
|
2009-05-29 07:10:54 +00:00
|
|
|
*/
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
static void print_counter_aggr(struct perf_evsel *counter, char *prefix)
|
2009-05-29 07:10:54 +00:00
|
|
|
{
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stat_config.output;
|
2015-10-16 10:41:03 +00:00
|
|
|
struct perf_stat_evsel *ps = counter->priv;
|
2011-01-03 18:39:04 +00:00
|
|
|
double avg = avg_stats(&ps->res_stats[0]);
|
2011-01-03 19:45:52 +00:00
|
|
|
int scaled = counter->counts->scaled;
|
2013-11-12 16:58:49 +00:00
|
|
|
double uval;
|
2015-03-11 14:16:27 +00:00
|
|
|
double avg_enabled, avg_running;
|
|
|
|
|
|
|
|
avg_enabled = avg_stats(&ps->res_stats[1]);
|
|
|
|
avg_running = avg_stats(&ps->res_stats[2]);
|
2009-05-29 07:10:54 +00:00
|
|
|
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
if (prefix)
|
|
|
|
fprintf(output, "%s", prefix);
|
|
|
|
|
2015-02-13 18:40:58 +00:00
|
|
|
if (scaled == -1 || !counter->supported) {
|
2013-11-12 16:58:49 +00:00
|
|
|
fprintf(output, "%*s%s",
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
csv_output ? 0 : 18,
|
2011-05-30 14:55:59 +00:00
|
|
|
counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
|
2013-11-12 16:58:49 +00:00
|
|
|
csv_sep);
|
|
|
|
fprintf(output, "%-*s%s",
|
|
|
|
csv_output ? 0 : unit_width,
|
|
|
|
counter->unit, csv_sep);
|
|
|
|
fprintf(output, "%*s",
|
|
|
|
csv_output ? 0 : -25,
|
2012-06-12 15:34:58 +00:00
|
|
|
perf_evsel__name(counter));
|
perf tool: Add cgroup support
This patch adds the ability to filter monitoring based on container groups
(cgroups) for both perf stat and perf record. It is possible to monitor
multiple cgroup in parallel. There is one cgroup per event. The cgroups to
monitor are passed via a new -G option followed by a comma separated list of
cgroup names.
The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
finds the corresponding directory in the cgroup filesystem and opens it. It
then passes that file descriptor to the kernel.
Example:
$ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
Performance counter stats for 'sleep 1':
2,368,667,414 cycles test1
2,369,661,459 cycles
<not counted> cycles test2
1.001856890 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 09:20:01 +00:00
|
|
|
|
|
|
|
if (counter->cgrp)
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, "%s%s", csv_sep, counter->cgrp->name);
|
perf tool: Add cgroup support
This patch adds the ability to filter monitoring based on container groups
(cgroups) for both perf stat and perf record. It is possible to monitor
multiple cgroup in parallel. There is one cgroup per event. The cgroups to
monitor are passed via a new -G option followed by a comma separated list of
cgroup names.
The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
finds the corresponding directory in the cgroup filesystem and opens it. It
then passes that file descriptor to the kernel.
Example:
$ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
Performance counter stats for 'sleep 1':
2,368,667,414 cycles test1
2,369,661,459 cycles
<not counted> cycles test2
1.001856890 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 09:20:01 +00:00
|
|
|
|
2015-03-11 14:16:27 +00:00
|
|
|
print_running(avg_running, avg_enabled);
|
2011-08-15 20:22:33 +00:00
|
|
|
fputc('\n', output);
|
2009-05-29 07:10:54 +00:00
|
|
|
return;
|
|
|
|
}
|
2009-05-29 07:10:54 +00:00
|
|
|
|
2013-11-12 16:58:49 +00:00
|
|
|
uval = avg * counter->scale;
|
2015-11-03 01:50:21 +00:00
|
|
|
printout(-1, 0, counter, uval);
|
2009-09-04 16:23:38 +00:00
|
|
|
|
perf stat: Add noise output for csv mode
Previously, when you want perf-stat to output the statistics in
csv mode, no information of the noise will be printed out.
For example right now we output this --repeat information:
./perf stat -r3 -x, sleep 1
1.164789,task-clock
8,context-switches
0,CPU-migrations
219,page-faults
3337800,cycles
With this patch, the output will be appended with an additional
entry for the noise value:
./perf stat -r3 -x, sleep 1
1.164789,task-clock,3.75%
8,context-switches,75.00%
0,CPU-migrations,100.00%
219,page-faults,0.00%
3337800,cycles,3.36%
Signed-off-by: Zhengyu He <zhengyuh@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-23 20:45:42 +00:00
|
|
|
print_noise(counter, avg);
|
|
|
|
|
2015-03-11 14:16:27 +00:00
|
|
|
print_running(avg_running, avg_enabled);
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, "\n");
|
2009-05-29 07:10:54 +00:00
|
|
|
}
|
|
|
|
|
2010-11-16 09:05:01 +00:00
|
|
|
/*
|
|
|
|
* Print out the results of a single counter:
|
|
|
|
* does not use aggregated count in system-wide
|
|
|
|
*/
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
static void print_counter(struct perf_evsel *counter, char *prefix)
|
2010-11-16 09:05:01 +00:00
|
|
|
{
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stat_config.output;
|
2010-11-16 09:05:01 +00:00
|
|
|
u64 ena, run, val;
|
2013-11-12 16:58:49 +00:00
|
|
|
double uval;
|
2010-11-16 09:05:01 +00:00
|
|
|
int cpu;
|
|
|
|
|
2012-09-10 07:53:50 +00:00
|
|
|
for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
|
2015-06-26 09:29:11 +00:00
|
|
|
val = perf_counts(counter->counts, cpu, 0)->val;
|
|
|
|
ena = perf_counts(counter->counts, cpu, 0)->ena;
|
|
|
|
run = perf_counts(counter->counts, cpu, 0)->run;
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
|
|
|
|
if (prefix)
|
|
|
|
fprintf(output, "%s", prefix);
|
|
|
|
|
2010-11-16 09:05:01 +00:00
|
|
|
if (run == 0 || ena == 0) {
|
2013-11-12 16:58:49 +00:00
|
|
|
fprintf(output, "CPU%*d%s%*s%s",
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
csv_output ? 0 : -4,
|
2012-09-10 07:53:50 +00:00
|
|
|
perf_evsel__cpus(counter)->map[cpu], csv_sep,
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
csv_output ? 0 : 18,
|
2011-05-30 14:55:59 +00:00
|
|
|
counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
|
2013-11-12 16:58:49 +00:00
|
|
|
csv_sep);
|
|
|
|
|
|
|
|
fprintf(output, "%-*s%s",
|
|
|
|
csv_output ? 0 : unit_width,
|
|
|
|
counter->unit, csv_sep);
|
|
|
|
|
|
|
|
fprintf(output, "%*s",
|
|
|
|
csv_output ? 0 : -25,
|
|
|
|
perf_evsel__name(counter));
|
2010-11-16 09:05:01 +00:00
|
|
|
|
perf tool: Add cgroup support
This patch adds the ability to filter monitoring based on container groups
(cgroups) for both perf stat and perf record. It is possible to monitor
multiple cgroup in parallel. There is one cgroup per event. The cgroups to
monitor are passed via a new -G option followed by a comma separated list of
cgroup names.
The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
finds the corresponding directory in the cgroup filesystem and opens it. It
then passes that file descriptor to the kernel.
Example:
$ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
Performance counter stats for 'sleep 1':
2,368,667,414 cycles test1
2,369,661,459 cycles
<not counted> cycles test2
1.001856890 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 09:20:01 +00:00
|
|
|
if (counter->cgrp)
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, "%s%s",
|
|
|
|
csv_sep, counter->cgrp->name);
|
perf tool: Add cgroup support
This patch adds the ability to filter monitoring based on container groups
(cgroups) for both perf stat and perf record. It is possible to monitor
multiple cgroup in parallel. There is one cgroup per event. The cgroups to
monitor are passed via a new -G option followed by a comma separated list of
cgroup names.
The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
finds the corresponding directory in the cgroup filesystem and opens it. It
then passes that file descriptor to the kernel.
Example:
$ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
Performance counter stats for 'sleep 1':
2,368,667,414 cycles test1
2,369,661,459 cycles
<not counted> cycles test2
1.001856890 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 09:20:01 +00:00
|
|
|
|
2015-03-11 14:16:27 +00:00
|
|
|
print_running(run, ena);
|
2011-08-15 20:22:33 +00:00
|
|
|
fputc('\n', output);
|
2010-11-16 09:05:01 +00:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2013-11-12 16:58:49 +00:00
|
|
|
uval = val * counter->scale;
|
2015-11-03 01:50:21 +00:00
|
|
|
printout(cpu, 0, counter, uval);
|
2015-03-11 14:16:27 +00:00
|
|
|
if (!csv_output)
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
print_noise(counter, 1.0);
|
2015-03-11 14:16:27 +00:00
|
|
|
print_running(run, ena);
|
2010-11-16 09:05:01 +00:00
|
|
|
|
2011-08-15 20:22:33 +00:00
|
|
|
fputc('\n', output);
|
2010-11-16 09:05:01 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-06-26 09:29:26 +00:00
|
|
|
static void print_interval(char *prefix, struct timespec *ts)
|
|
|
|
{
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stat_config.output;
|
2015-06-26 09:29:26 +00:00
|
|
|
static int num_print_interval;
|
|
|
|
|
|
|
|
sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
|
|
|
|
|
|
|
|
if (num_print_interval == 0 && !csv_output) {
|
2015-07-21 12:31:22 +00:00
|
|
|
switch (stat_config.aggr_mode) {
|
2015-06-26 09:29:26 +00:00
|
|
|
case AGGR_SOCKET:
|
|
|
|
fprintf(output, "# time socket cpus counts %*s events\n", unit_width, "unit");
|
|
|
|
break;
|
|
|
|
case AGGR_CORE:
|
|
|
|
fprintf(output, "# time core cpus counts %*s events\n", unit_width, "unit");
|
|
|
|
break;
|
|
|
|
case AGGR_NONE:
|
|
|
|
fprintf(output, "# time CPU counts %*s events\n", unit_width, "unit");
|
|
|
|
break;
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
case AGGR_THREAD:
|
|
|
|
fprintf(output, "# time comm-pid counts %*s events\n", unit_width, "unit");
|
|
|
|
break;
|
2015-06-26 09:29:26 +00:00
|
|
|
case AGGR_GLOBAL:
|
|
|
|
default:
|
|
|
|
fprintf(output, "# time counts %*s events\n", unit_width, "unit");
|
2015-10-16 10:41:04 +00:00
|
|
|
case AGGR_UNSET:
|
|
|
|
break;
|
2015-06-26 09:29:26 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (++num_print_interval == 25)
|
|
|
|
num_print_interval = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void print_header(int argc, const char **argv)
|
2009-06-13 12:57:28 +00:00
|
|
|
{
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stat_config.output;
|
2011-01-03 18:39:04 +00:00
|
|
|
int i;
|
2009-06-13 12:57:28 +00:00
|
|
|
|
2009-04-20 13:37:32 +00:00
|
|
|
fflush(stdout);
|
|
|
|
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
if (!csv_output) {
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, "\n");
|
|
|
|
fprintf(output, " Performance counter stats for ");
|
2013-09-28 20:27:58 +00:00
|
|
|
if (target.system_wide)
|
|
|
|
fprintf(output, "\'system wide");
|
|
|
|
else if (target.cpu_list)
|
|
|
|
fprintf(output, "\'CPU(s) %s", target.cpu_list);
|
2013-11-12 19:46:16 +00:00
|
|
|
else if (!target__has_task(&target)) {
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, "\'%s", argv[0]);
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
for (i = 1; i < argc; i++)
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, " %s", argv[i]);
|
2012-04-26 05:15:16 +00:00
|
|
|
} else if (target.pid)
|
|
|
|
fprintf(output, "process id \'%s", target.pid);
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
else
|
2012-04-26 05:15:16 +00:00
|
|
|
fprintf(output, "thread id \'%s", target.tid);
|
2009-06-03 17:36:07 +00:00
|
|
|
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, "\'");
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
if (run_count > 1)
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, " (%d runs)", run_count);
|
|
|
|
fprintf(output, ":\n\n");
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
}
|
2015-06-26 09:29:26 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void print_footer(void)
|
|
|
|
{
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stat_config.output;
|
|
|
|
|
2015-06-26 09:29:26 +00:00
|
|
|
if (!null_run)
|
|
|
|
fprintf(output, "\n");
|
|
|
|
fprintf(output, " %17.9f seconds time elapsed",
|
|
|
|
avg_stats(&walltime_nsecs_stats)/1e9);
|
|
|
|
if (run_count > 1) {
|
|
|
|
fprintf(output, " ");
|
|
|
|
print_noise_pct(stddev_stats(&walltime_nsecs_stats),
|
|
|
|
avg_stats(&walltime_nsecs_stats));
|
|
|
|
}
|
|
|
|
fprintf(output, "\n\n");
|
|
|
|
}
|
|
|
|
|
|
|
|
static void print_counters(struct timespec *ts, int argc, const char **argv)
|
|
|
|
{
|
2015-07-21 12:31:25 +00:00
|
|
|
int interval = stat_config.interval;
|
2015-06-26 09:29:26 +00:00
|
|
|
struct perf_evsel *counter;
|
|
|
|
char buf[64], *prefix = NULL;
|
|
|
|
|
|
|
|
if (interval)
|
|
|
|
print_interval(prefix = buf, ts);
|
|
|
|
else
|
|
|
|
print_header(argc, argv);
|
2009-05-29 07:10:54 +00:00
|
|
|
|
2015-07-21 12:31:22 +00:00
|
|
|
switch (stat_config.aggr_mode) {
|
2013-02-14 12:57:29 +00:00
|
|
|
case AGGR_CORE:
|
2013-02-14 12:57:27 +00:00
|
|
|
case AGGR_SOCKET:
|
2015-06-26 09:29:26 +00:00
|
|
|
print_aggr(prefix);
|
2013-02-14 12:57:27 +00:00
|
|
|
break;
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
case AGGR_THREAD:
|
|
|
|
evlist__for_each(evsel_list, counter)
|
|
|
|
print_aggr_thread(counter, prefix);
|
|
|
|
break;
|
2013-02-14 12:57:27 +00:00
|
|
|
case AGGR_GLOBAL:
|
2014-01-10 13:37:27 +00:00
|
|
|
evlist__for_each(evsel_list, counter)
|
2015-06-26 09:29:26 +00:00
|
|
|
print_counter_aggr(counter, prefix);
|
2013-02-14 12:57:27 +00:00
|
|
|
break;
|
|
|
|
case AGGR_NONE:
|
2014-01-10 13:37:27 +00:00
|
|
|
evlist__for_each(evsel_list, counter)
|
2015-06-26 09:29:26 +00:00
|
|
|
print_counter(counter, prefix);
|
2013-02-14 12:57:27 +00:00
|
|
|
break;
|
2015-10-16 10:41:04 +00:00
|
|
|
case AGGR_UNSET:
|
2013-02-14 12:57:27 +00:00
|
|
|
default:
|
|
|
|
break;
|
2010-11-16 09:05:01 +00:00
|
|
|
}
|
2009-04-20 13:37:32 +00:00
|
|
|
|
2015-06-26 09:29:26 +00:00
|
|
|
if (!interval && !csv_output)
|
|
|
|
print_footer();
|
|
|
|
|
2015-07-21 12:31:24 +00:00
|
|
|
fflush(stat_config.output);
|
2009-04-20 13:37:32 +00:00
|
|
|
}
|
|
|
|
|
2009-06-10 13:55:59 +00:00
|
|
|
static volatile int signr = -1;
|
|
|
|
|
2009-05-26 07:17:18 +00:00
|
|
|
static void skip_signal(int signo)
|
2009-04-20 13:37:32 +00:00
|
|
|
{
|
2015-07-21 12:31:25 +00:00
|
|
|
if ((child_pid == -1) || stat_config.interval)
|
2009-12-31 08:05:50 +00:00
|
|
|
done = 1;
|
|
|
|
|
2009-06-10 13:55:59 +00:00
|
|
|
signr = signo;
|
2013-06-04 15:44:26 +00:00
|
|
|
/*
|
|
|
|
* render child_pid harmless
|
|
|
|
* won't send SIGTERM to a random
|
|
|
|
* process in case of race condition
|
|
|
|
* and fast PID recycling
|
|
|
|
*/
|
|
|
|
child_pid = -1;
|
2009-06-10 13:55:59 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void sig_atexit(void)
|
|
|
|
{
|
2013-06-04 15:44:26 +00:00
|
|
|
sigset_t set, oset;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* avoid race condition with SIGCHLD handler
|
|
|
|
* in skip_signal() which is modifying child_pid
|
|
|
|
* goal is to avoid send SIGTERM to a random
|
|
|
|
* process
|
|
|
|
*/
|
|
|
|
sigemptyset(&set);
|
|
|
|
sigaddset(&set, SIGCHLD);
|
|
|
|
sigprocmask(SIG_BLOCK, &set, &oset);
|
|
|
|
|
2009-10-04 00:35:01 +00:00
|
|
|
if (child_pid != -1)
|
|
|
|
kill(child_pid, SIGTERM);
|
|
|
|
|
2013-06-04 15:44:26 +00:00
|
|
|
sigprocmask(SIG_SETMASK, &oset, NULL);
|
|
|
|
|
2009-06-10 13:55:59 +00:00
|
|
|
if (signr == -1)
|
|
|
|
return;
|
|
|
|
|
|
|
|
signal(signr, SIG_DFL);
|
|
|
|
kill(getpid(), signr);
|
2009-05-26 07:17:18 +00:00
|
|
|
}
|
|
|
|
|
2012-09-10 22:15:03 +00:00
|
|
|
static int stat__set_big_num(const struct option *opt __maybe_unused,
|
|
|
|
const char *s __maybe_unused, int unset)
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
{
|
|
|
|
big_num_opt = unset ? 0 : 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-11-05 14:40:45 +00:00
|
|
|
static const struct option stat_options[] = {
|
|
|
|
OPT_BOOLEAN('T', "transaction", &transaction_run,
|
|
|
|
"hardware transaction statistics"),
|
|
|
|
OPT_CALLBACK('e', "event", &evsel_list, "event",
|
|
|
|
"event selector. use 'perf list' to list available events",
|
|
|
|
parse_events_option),
|
|
|
|
OPT_CALLBACK(0, "filter", &evsel_list, "filter",
|
|
|
|
"event filter", parse_filter),
|
|
|
|
OPT_BOOLEAN('i', "no-inherit", &no_inherit,
|
|
|
|
"child tasks do not inherit counters"),
|
|
|
|
OPT_STRING('p', "pid", &target.pid, "pid",
|
|
|
|
"stat events on existing process id"),
|
|
|
|
OPT_STRING('t', "tid", &target.tid, "tid",
|
|
|
|
"stat events on existing thread id"),
|
|
|
|
OPT_BOOLEAN('a', "all-cpus", &target.system_wide,
|
|
|
|
"system-wide collection from all CPUs"),
|
|
|
|
OPT_BOOLEAN('g', "group", &group,
|
|
|
|
"put the counters into a counter group"),
|
|
|
|
OPT_BOOLEAN('c', "scale", &stat_config.scale, "scale/normalize counters"),
|
|
|
|
OPT_INCR('v', "verbose", &verbose,
|
|
|
|
"be more verbose (show counter open errors, etc)"),
|
|
|
|
OPT_INTEGER('r', "repeat", &run_count,
|
|
|
|
"repeat command and print average + stddev (max: 100, forever: 0)"),
|
|
|
|
OPT_BOOLEAN('n', "null", &null_run,
|
|
|
|
"null run - dont start any counters"),
|
|
|
|
OPT_INCR('d', "detailed", &detailed_run,
|
|
|
|
"detailed run - start a lot of events"),
|
|
|
|
OPT_BOOLEAN('S', "sync", &sync_run,
|
|
|
|
"call sync() before starting a run"),
|
|
|
|
OPT_CALLBACK_NOOPT('B', "big-num", NULL, NULL,
|
|
|
|
"print large numbers with thousands\' separators",
|
|
|
|
stat__set_big_num),
|
|
|
|
OPT_STRING('C', "cpu", &target.cpu_list, "cpu",
|
|
|
|
"list of cpus to monitor in system-wide"),
|
|
|
|
OPT_SET_UINT('A', "no-aggr", &stat_config.aggr_mode,
|
|
|
|
"disable CPU count aggregation", AGGR_NONE),
|
|
|
|
OPT_STRING('x', "field-separator", &csv_sep, "separator",
|
|
|
|
"print counts with custom separator"),
|
|
|
|
OPT_CALLBACK('G', "cgroup", &evsel_list, "name",
|
|
|
|
"monitor event in cgroup name only", parse_cgroups),
|
|
|
|
OPT_STRING('o', "output", &output_name, "file", "output file name"),
|
|
|
|
OPT_BOOLEAN(0, "append", &append_file, "append to the output file"),
|
|
|
|
OPT_INTEGER(0, "log-fd", &output_fd,
|
|
|
|
"log output to fd, instead of stderr"),
|
|
|
|
OPT_STRING(0, "pre", &pre_cmd, "command",
|
|
|
|
"command to run prior to the measured command"),
|
|
|
|
OPT_STRING(0, "post", &post_cmd, "command",
|
|
|
|
"command to run after to the measured command"),
|
|
|
|
OPT_UINTEGER('I', "interval-print", &stat_config.interval,
|
|
|
|
"print counts at regular interval in ms (>= 10)"),
|
|
|
|
OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
|
|
|
|
"aggregate counts per processor socket", AGGR_SOCKET),
|
|
|
|
OPT_SET_UINT(0, "per-core", &stat_config.aggr_mode,
|
|
|
|
"aggregate counts per physical processor core", AGGR_CORE),
|
|
|
|
OPT_SET_UINT(0, "per-thread", &stat_config.aggr_mode,
|
|
|
|
"aggregate counts per thread", AGGR_THREAD),
|
|
|
|
OPT_UINTEGER('D', "delay", &initial_delay,
|
|
|
|
"ms to wait before starting measurement after program start"),
|
|
|
|
OPT_END()
|
|
|
|
};
|
|
|
|
|
2015-10-16 10:41:15 +00:00
|
|
|
static int perf_stat__get_socket(struct cpu_map *map, int cpu)
|
|
|
|
{
|
|
|
|
return cpu_map__get_socket(map, cpu, NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int perf_stat__get_core(struct cpu_map *map, int cpu)
|
|
|
|
{
|
|
|
|
return cpu_map__get_core(map, cpu, NULL);
|
|
|
|
}
|
|
|
|
|
2015-10-25 14:51:18 +00:00
|
|
|
static int cpu_map__get_max(struct cpu_map *map)
|
|
|
|
{
|
|
|
|
int i, max = -1;
|
|
|
|
|
|
|
|
for (i = 0; i < map->nr; i++) {
|
|
|
|
if (map->map[i] > max)
|
|
|
|
max = map->map[i];
|
|
|
|
}
|
|
|
|
|
|
|
|
return max;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct cpu_map *cpus_aggr_map;
|
|
|
|
|
|
|
|
static int perf_stat__get_aggr(aggr_get_id_t get_id, struct cpu_map *map, int idx)
|
|
|
|
{
|
|
|
|
int cpu;
|
|
|
|
|
|
|
|
if (idx >= map->nr)
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
cpu = map->map[idx];
|
|
|
|
|
|
|
|
if (cpus_aggr_map->map[cpu] == -1)
|
|
|
|
cpus_aggr_map->map[cpu] = get_id(map, idx);
|
|
|
|
|
|
|
|
return cpus_aggr_map->map[cpu];
|
|
|
|
}
|
|
|
|
|
|
|
|
static int perf_stat__get_socket_cached(struct cpu_map *map, int idx)
|
|
|
|
{
|
|
|
|
return perf_stat__get_aggr(perf_stat__get_socket, map, idx);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int perf_stat__get_core_cached(struct cpu_map *map, int idx)
|
|
|
|
{
|
|
|
|
return perf_stat__get_aggr(perf_stat__get_core, map, idx);
|
|
|
|
}
|
|
|
|
|
2013-02-14 12:57:27 +00:00
|
|
|
static int perf_stat_init_aggr_mode(void)
|
|
|
|
{
|
2015-10-25 14:51:18 +00:00
|
|
|
int nr;
|
|
|
|
|
2015-07-21 12:31:22 +00:00
|
|
|
switch (stat_config.aggr_mode) {
|
2013-02-14 12:57:27 +00:00
|
|
|
case AGGR_SOCKET:
|
|
|
|
if (cpu_map__build_socket_map(evsel_list->cpus, &aggr_map)) {
|
|
|
|
perror("cannot build socket map");
|
|
|
|
return -1;
|
|
|
|
}
|
2015-10-25 14:51:18 +00:00
|
|
|
aggr_get_id = perf_stat__get_socket_cached;
|
2013-02-14 12:57:27 +00:00
|
|
|
break;
|
2013-02-14 12:57:29 +00:00
|
|
|
case AGGR_CORE:
|
|
|
|
if (cpu_map__build_core_map(evsel_list->cpus, &aggr_map)) {
|
|
|
|
perror("cannot build core map");
|
|
|
|
return -1;
|
|
|
|
}
|
2015-10-25 14:51:18 +00:00
|
|
|
aggr_get_id = perf_stat__get_core_cached;
|
2013-02-14 12:57:29 +00:00
|
|
|
break;
|
2013-02-14 12:57:27 +00:00
|
|
|
case AGGR_NONE:
|
|
|
|
case AGGR_GLOBAL:
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
case AGGR_THREAD:
|
2015-10-16 10:41:04 +00:00
|
|
|
case AGGR_UNSET:
|
2013-02-14 12:57:27 +00:00
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
2015-10-25 14:51:18 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The evsel_list->cpus is the base we operate on,
|
|
|
|
* taking the highest cpu number to be the size of
|
|
|
|
* the aggregation translate cpumap.
|
|
|
|
*/
|
|
|
|
nr = cpu_map__get_max(evsel_list->cpus);
|
|
|
|
cpus_aggr_map = cpu_map__empty_new(nr + 1);
|
|
|
|
return cpus_aggr_map ? 0 : -ENOMEM;
|
2013-02-14 12:57:27 +00:00
|
|
|
}
|
|
|
|
|
2015-12-09 02:11:27 +00:00
|
|
|
static void perf_stat__exit_aggr_mode(void)
|
|
|
|
{
|
|
|
|
cpu_map__put(aggr_map);
|
|
|
|
cpu_map__put(cpus_aggr_map);
|
|
|
|
aggr_map = NULL;
|
|
|
|
cpus_aggr_map = NULL;
|
|
|
|
}
|
|
|
|
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
/*
|
|
|
|
* Add default attributes, if there were no attributes specified or
|
|
|
|
* if -d/--detailed, -d -d or -d -d -d is used:
|
|
|
|
*/
|
|
|
|
static int add_default_attributes(void)
|
|
|
|
{
|
2012-10-01 18:20:58 +00:00
|
|
|
struct perf_event_attr default_attrs[] = {
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
|
|
|
|
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
|
|
|
|
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
|
|
|
|
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
|
|
|
|
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
|
|
|
|
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
|
|
|
|
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
|
|
|
|
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
|
|
|
|
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
|
|
|
|
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Detailed stats (-d), covering the L1 and last level data caches:
|
|
|
|
*/
|
|
|
|
struct perf_event_attr detailed_attrs[] = {
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_L1D << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_L1D << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_LL << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_LL << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Very detailed stats (-d -d), covering the instruction cache and the TLB caches:
|
|
|
|
*/
|
|
|
|
struct perf_event_attr very_detailed_attrs[] = {
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_L1I << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_L1I << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_DTLB << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_DTLB << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_ITLB << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_ITLB << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
|
|
|
|
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Very, very detailed stats (-d -d -d), adding prefetch events:
|
|
|
|
*/
|
|
|
|
struct perf_event_attr very_very_detailed_attrs[] = {
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_L1D << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
|
|
|
|
|
|
|
|
{ .type = PERF_TYPE_HW_CACHE,
|
|
|
|
.config =
|
|
|
|
PERF_COUNT_HW_CACHE_L1D << 0 |
|
|
|
|
(PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) |
|
|
|
|
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
|
|
|
|
};
|
|
|
|
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
/* Set attrs if no event is selected and !null_run: */
|
|
|
|
if (null_run)
|
|
|
|
return 0;
|
|
|
|
|
2013-08-21 23:47:26 +00:00
|
|
|
if (transaction_run) {
|
|
|
|
int err;
|
|
|
|
if (pmu_have_event("cpu", "cycles-ct") &&
|
|
|
|
pmu_have_event("cpu", "el-start"))
|
2015-06-03 14:25:53 +00:00
|
|
|
err = parse_events(evsel_list, transaction_attrs, NULL);
|
2013-08-21 23:47:26 +00:00
|
|
|
else
|
2015-06-03 14:25:53 +00:00
|
|
|
err = parse_events(evsel_list, transaction_limited_attrs, NULL);
|
|
|
|
if (err) {
|
2013-08-21 23:47:26 +00:00
|
|
|
fprintf(stderr, "Cannot set up transaction events\n");
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
if (!evsel_list->nr_entries) {
|
perf stat: Initialize default events wrt exclude_{guest,host}
When no event is specified the tools use perf_evlist__add_default(), that will
call event_attr_init to initialize the KVM exclusion bits.
When the change was made to the tools so that by default guest samples would be
excluded, the changes were made just to the parsing routines and to
perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far
just by perf stat to add multiple events, according to the level of detail
specified.
Recently the tools were changed to reconstruct the event name from all the
details in perf_event_attr, not just from .type and .config, but taking into
account all the feature bits (.exclude_{guest,host,user,kernel,etc},
.precise_ip, etc).
That is when we noticed that the default for perf stat wasn't the one for the
rest of the tools, i.e. the .exclude_guest bit wasn't being set.
I.e. the default, that doesn't call event_attr_init was showing the :HG
modifier:
$ perf stat usleep 1
Performance counter stats for 'usleep 1':
0.942119 task-clock # 0.454 CPUs utilized
1 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 K/sec
126 page-faults # 0.134 M/sec
693,193 cycles:HG # 0.736 GHz [40.11%]
407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%]
365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle
465,982 instructions:HG # 0.67 insns per cycle
# 0.87 stalled cycles per insn
89,760 branches:HG # 95.275 M/sec
6,178 branch-misses:HG # 6.88% of all branches
0.002077228 seconds time elapsed
While if one explicitely specifies the same events, which will make the parsing code
to be called and thus event_attr_init is called:
$ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1
Performance counter stats for 'usleep 1':
1.040349 task-clock # 0.500 CPUs utilized
2 context-switches # 0.002 M/sec
0 CPU-migrations # 0.000 K/sec
127 page-faults # 0.122 M/sec
587,966 cycles # 0.565 GHz [13.18%]
459,167 stalled-cycles-frontend # 78.09% frontend cycles idle
390,249 stalled-cycles-backend # 66.37% backend cycles idle
504,006 instructions # 0.86 insns per cycle
# 0.91 stalled cycles per insn
96,455 branches # 92.714 M/sec
6,522 branch-misses # 6.76% of all branches [96.12%]
0.002078681 seconds time elapsed
Fix it by introducing a perf_evlist__add_default_attrs method that will call
evlist_attr_init in all the perf_event_attr entries before adding the events.
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30 16:53:54 +00:00
|
|
|
if (perf_evlist__add_default_attrs(evsel_list, default_attrs) < 0)
|
2011-11-04 11:10:59 +00:00
|
|
|
return -1;
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Detailed events get appended to the event list: */
|
|
|
|
|
|
|
|
if (detailed_run < 1)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* Append detailed run extra attributes: */
|
perf stat: Initialize default events wrt exclude_{guest,host}
When no event is specified the tools use perf_evlist__add_default(), that will
call event_attr_init to initialize the KVM exclusion bits.
When the change was made to the tools so that by default guest samples would be
excluded, the changes were made just to the parsing routines and to
perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far
just by perf stat to add multiple events, according to the level of detail
specified.
Recently the tools were changed to reconstruct the event name from all the
details in perf_event_attr, not just from .type and .config, but taking into
account all the feature bits (.exclude_{guest,host,user,kernel,etc},
.precise_ip, etc).
That is when we noticed that the default for perf stat wasn't the one for the
rest of the tools, i.e. the .exclude_guest bit wasn't being set.
I.e. the default, that doesn't call event_attr_init was showing the :HG
modifier:
$ perf stat usleep 1
Performance counter stats for 'usleep 1':
0.942119 task-clock # 0.454 CPUs utilized
1 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 K/sec
126 page-faults # 0.134 M/sec
693,193 cycles:HG # 0.736 GHz [40.11%]
407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%]
365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle
465,982 instructions:HG # 0.67 insns per cycle
# 0.87 stalled cycles per insn
89,760 branches:HG # 95.275 M/sec
6,178 branch-misses:HG # 6.88% of all branches
0.002077228 seconds time elapsed
While if one explicitely specifies the same events, which will make the parsing code
to be called and thus event_attr_init is called:
$ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1
Performance counter stats for 'usleep 1':
1.040349 task-clock # 0.500 CPUs utilized
2 context-switches # 0.002 M/sec
0 CPU-migrations # 0.000 K/sec
127 page-faults # 0.122 M/sec
587,966 cycles # 0.565 GHz [13.18%]
459,167 stalled-cycles-frontend # 78.09% frontend cycles idle
390,249 stalled-cycles-backend # 66.37% backend cycles idle
504,006 instructions # 0.86 insns per cycle
# 0.91 stalled cycles per insn
96,455 branches # 92.714 M/sec
6,522 branch-misses # 6.76% of all branches [96.12%]
0.002078681 seconds time elapsed
Fix it by introducing a perf_evlist__add_default_attrs method that will call
evlist_attr_init in all the perf_event_attr entries before adding the events.
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30 16:53:54 +00:00
|
|
|
if (perf_evlist__add_default_attrs(evsel_list, detailed_attrs) < 0)
|
2011-11-04 11:10:59 +00:00
|
|
|
return -1;
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
|
|
|
|
if (detailed_run < 2)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* Append very detailed run extra attributes: */
|
perf stat: Initialize default events wrt exclude_{guest,host}
When no event is specified the tools use perf_evlist__add_default(), that will
call event_attr_init to initialize the KVM exclusion bits.
When the change was made to the tools so that by default guest samples would be
excluded, the changes were made just to the parsing routines and to
perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far
just by perf stat to add multiple events, according to the level of detail
specified.
Recently the tools were changed to reconstruct the event name from all the
details in perf_event_attr, not just from .type and .config, but taking into
account all the feature bits (.exclude_{guest,host,user,kernel,etc},
.precise_ip, etc).
That is when we noticed that the default for perf stat wasn't the one for the
rest of the tools, i.e. the .exclude_guest bit wasn't being set.
I.e. the default, that doesn't call event_attr_init was showing the :HG
modifier:
$ perf stat usleep 1
Performance counter stats for 'usleep 1':
0.942119 task-clock # 0.454 CPUs utilized
1 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 K/sec
126 page-faults # 0.134 M/sec
693,193 cycles:HG # 0.736 GHz [40.11%]
407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%]
365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle
465,982 instructions:HG # 0.67 insns per cycle
# 0.87 stalled cycles per insn
89,760 branches:HG # 95.275 M/sec
6,178 branch-misses:HG # 6.88% of all branches
0.002077228 seconds time elapsed
While if one explicitely specifies the same events, which will make the parsing code
to be called and thus event_attr_init is called:
$ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1
Performance counter stats for 'usleep 1':
1.040349 task-clock # 0.500 CPUs utilized
2 context-switches # 0.002 M/sec
0 CPU-migrations # 0.000 K/sec
127 page-faults # 0.122 M/sec
587,966 cycles # 0.565 GHz [13.18%]
459,167 stalled-cycles-frontend # 78.09% frontend cycles idle
390,249 stalled-cycles-backend # 66.37% backend cycles idle
504,006 instructions # 0.86 insns per cycle
# 0.91 stalled cycles per insn
96,455 branches # 92.714 M/sec
6,522 branch-misses # 6.76% of all branches [96.12%]
0.002078681 seconds time elapsed
Fix it by introducing a perf_evlist__add_default_attrs method that will call
evlist_attr_init in all the perf_event_attr entries before adding the events.
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30 16:53:54 +00:00
|
|
|
if (perf_evlist__add_default_attrs(evsel_list, very_detailed_attrs) < 0)
|
2011-11-04 11:10:59 +00:00
|
|
|
return -1;
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
|
|
|
|
if (detailed_run < 3)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* Append very, very detailed run extra attributes: */
|
perf stat: Initialize default events wrt exclude_{guest,host}
When no event is specified the tools use perf_evlist__add_default(), that will
call event_attr_init to initialize the KVM exclusion bits.
When the change was made to the tools so that by default guest samples would be
excluded, the changes were made just to the parsing routines and to
perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far
just by perf stat to add multiple events, according to the level of detail
specified.
Recently the tools were changed to reconstruct the event name from all the
details in perf_event_attr, not just from .type and .config, but taking into
account all the feature bits (.exclude_{guest,host,user,kernel,etc},
.precise_ip, etc).
That is when we noticed that the default for perf stat wasn't the one for the
rest of the tools, i.e. the .exclude_guest bit wasn't being set.
I.e. the default, that doesn't call event_attr_init was showing the :HG
modifier:
$ perf stat usleep 1
Performance counter stats for 'usleep 1':
0.942119 task-clock # 0.454 CPUs utilized
1 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 K/sec
126 page-faults # 0.134 M/sec
693,193 cycles:HG # 0.736 GHz [40.11%]
407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%]
365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle
465,982 instructions:HG # 0.67 insns per cycle
# 0.87 stalled cycles per insn
89,760 branches:HG # 95.275 M/sec
6,178 branch-misses:HG # 6.88% of all branches
0.002077228 seconds time elapsed
While if one explicitely specifies the same events, which will make the parsing code
to be called and thus event_attr_init is called:
$ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1
Performance counter stats for 'usleep 1':
1.040349 task-clock # 0.500 CPUs utilized
2 context-switches # 0.002 M/sec
0 CPU-migrations # 0.000 K/sec
127 page-faults # 0.122 M/sec
587,966 cycles # 0.565 GHz [13.18%]
459,167 stalled-cycles-frontend # 78.09% frontend cycles idle
390,249 stalled-cycles-backend # 66.37% backend cycles idle
504,006 instructions # 0.86 insns per cycle
# 0.91 stalled cycles per insn
96,455 branches # 92.714 M/sec
6,522 branch-misses # 6.76% of all branches [96.12%]
0.002078681 seconds time elapsed
Fix it by introducing a perf_evlist__add_default_attrs method that will call
evlist_attr_init in all the perf_event_attr entries before adding the events.
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30 16:53:54 +00:00
|
|
|
return perf_evlist__add_default_attrs(evsel_list, very_very_detailed_attrs);
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
}
|
|
|
|
|
2012-09-10 22:15:03 +00:00
|
|
|
int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
|
2009-05-26 07:17:18 +00:00
|
|
|
{
|
2012-10-01 18:20:58 +00:00
|
|
|
const char * const stat_usage[] = {
|
|
|
|
"perf stat [<options>] [<command>]",
|
|
|
|
NULL
|
|
|
|
};
|
2013-11-01 07:33:15 +00:00
|
|
|
int status = -EINVAL, run_idx;
|
2011-08-15 20:22:33 +00:00
|
|
|
const char *mode;
|
2015-07-21 12:31:24 +00:00
|
|
|
FILE *output = stderr;
|
2015-07-21 12:31:25 +00:00
|
|
|
unsigned int interval;
|
2009-06-13 12:57:28 +00:00
|
|
|
|
perf stat: add perf stat -B to pretty print large numbers
It is hard to read very large numbers so provide an option to perf stat
to separate thousands using a separator. The patch leverages the locale
support of stdio. You need to set your LC_NUMERIC appropriately, for
instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
feature. This way existing scripts parsing the output do not need to be
changed. Here is an example.
$ perf stat noploop 2
noploop for 2 seconds
Performance counter stats for 'noploop 2':
1998.347031 task-clock-msecs # 0.998 CPUs
61 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
118 page-faults # 0.000 M/sec
4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%)
2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%)
2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%)
40,267 branch-misses # 0.002 % (scaled from 30.04%)
2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%)
53,725 cache-misses # 0.027 M/sec (scaled from 30.02%)
2.001393933 seconds time elapsed
$ perf stat -B noploop 2
noploop for 2 seconds
Performance counter stats for 'noploop 2':
1998.297883 task-clock-msecs # 0.998 CPUs
59 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
119 page-faults # 0.000 M/sec
4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%)
2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%)
2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%)
25,650 branch-misses # 0.001 % (scaled from 30.05%)
2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%)
47,097 cache-misses # 0.024 M/sec (scaled from 30.02%)
2.001391016 seconds time elapsed
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-18 13:00:01 +00:00
|
|
|
setlocale(LC_ALL, "");
|
|
|
|
|
2013-03-11 07:43:12 +00:00
|
|
|
evsel_list = perf_evlist__new();
|
2011-01-11 22:56:53 +00:00
|
|
|
if (evsel_list == NULL)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2015-11-05 14:40:45 +00:00
|
|
|
argc = parse_options(argc, argv, stat_options, stat_usage,
|
2009-07-22 13:04:12 +00:00
|
|
|
PARSE_OPT_STOP_AT_NON_OPTION);
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
|
2015-07-21 12:31:25 +00:00
|
|
|
interval = stat_config.interval;
|
|
|
|
|
2011-08-15 20:22:33 +00:00
|
|
|
if (output_name && strcmp(output_name, "-"))
|
|
|
|
output = NULL;
|
|
|
|
|
2011-09-07 23:14:00 +00:00
|
|
|
if (output_name && output_fd) {
|
|
|
|
fprintf(stderr, "cannot use both --output and --log-fd\n");
|
2015-11-05 14:40:45 +00:00
|
|
|
parse_options_usage(stat_usage, stat_options, "o", 1);
|
|
|
|
parse_options_usage(NULL, stat_options, "log-fd", 0);
|
2013-11-01 07:33:15 +00:00
|
|
|
goto out;
|
2011-09-07 23:14:00 +00:00
|
|
|
}
|
2012-05-15 11:11:11 +00:00
|
|
|
|
|
|
|
if (output_fd < 0) {
|
|
|
|
fprintf(stderr, "argument to --log-fd must be a > 0\n");
|
2015-11-05 14:40:45 +00:00
|
|
|
parse_options_usage(stat_usage, stat_options, "log-fd", 0);
|
2013-11-01 07:33:15 +00:00
|
|
|
goto out;
|
2012-05-15 11:11:11 +00:00
|
|
|
}
|
|
|
|
|
2011-08-15 20:22:33 +00:00
|
|
|
if (!output) {
|
|
|
|
struct timespec tm;
|
|
|
|
mode = append_file ? "a" : "w";
|
|
|
|
|
|
|
|
output = fopen(output_name, mode);
|
|
|
|
if (!output) {
|
|
|
|
perror("failed to create output file");
|
2012-08-26 18:24:44 +00:00
|
|
|
return -1;
|
2011-08-15 20:22:33 +00:00
|
|
|
}
|
|
|
|
clock_gettime(CLOCK_REALTIME, &tm);
|
|
|
|
fprintf(output, "# started on %s\n", ctime(&tm.tv_sec));
|
2012-05-15 11:11:11 +00:00
|
|
|
} else if (output_fd > 0) {
|
2011-09-07 23:14:00 +00:00
|
|
|
mode = append_file ? "a" : "w";
|
|
|
|
output = fdopen(output_fd, mode);
|
|
|
|
if (!output) {
|
|
|
|
perror("Failed opening logfd");
|
|
|
|
return -errno;
|
|
|
|
}
|
2011-08-15 20:22:33 +00:00
|
|
|
}
|
|
|
|
|
2015-07-21 12:31:24 +00:00
|
|
|
stat_config.output = output;
|
|
|
|
|
2011-09-07 23:14:03 +00:00
|
|
|
if (csv_sep) {
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
csv_output = true;
|
2011-09-07 23:14:03 +00:00
|
|
|
if (!strcmp(csv_sep, "\\t"))
|
|
|
|
csv_sep = "\t";
|
|
|
|
} else
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
csv_sep = DEFAULT_SEPARATOR;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* let the spreadsheet do the pretty-printing
|
|
|
|
*/
|
|
|
|
if (csv_output) {
|
2011-09-07 23:14:04 +00:00
|
|
|
/* User explicitly passed -B? */
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
if (big_num_opt == 1) {
|
|
|
|
fprintf(stderr, "-B option not supported with -x\n");
|
2015-11-05 14:40:45 +00:00
|
|
|
parse_options_usage(stat_usage, stat_options, "B", 1);
|
|
|
|
parse_options_usage(NULL, stat_options, "x", 1);
|
2013-11-01 07:33:15 +00:00
|
|
|
goto out;
|
perf stat: Add csv-style output
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 16:49:05 +00:00
|
|
|
} else /* Nope, so disable big number formatting */
|
|
|
|
big_num = false;
|
|
|
|
} else if (big_num_opt == 0) /* User passed --no-big-num */
|
|
|
|
big_num = false;
|
|
|
|
|
2013-11-12 19:46:16 +00:00
|
|
|
if (!argc && target__none(&target))
|
2015-11-05 14:40:45 +00:00
|
|
|
usage_with_options(stat_usage, stat_options);
|
2013-09-30 13:37:37 +00:00
|
|
|
|
2013-03-01 18:02:27 +00:00
|
|
|
if (run_count < 0) {
|
2013-11-01 07:33:15 +00:00
|
|
|
pr_err("Run count must be a positive number\n");
|
2015-11-05 14:40:45 +00:00
|
|
|
parse_options_usage(stat_usage, stat_options, "r", 1);
|
2013-11-01 07:33:15 +00:00
|
|
|
goto out;
|
2013-03-01 18:02:27 +00:00
|
|
|
} else if (run_count == 0) {
|
|
|
|
forever = true;
|
|
|
|
run_count = 1;
|
|
|
|
}
|
2009-04-20 13:37:32 +00:00
|
|
|
|
2015-07-21 12:31:22 +00:00
|
|
|
if ((stat_config.aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
fprintf(stderr, "The --per-thread option is only available "
|
|
|
|
"when monitoring via -p -t options.\n");
|
2015-11-05 14:40:45 +00:00
|
|
|
parse_options_usage(NULL, stat_options, "p", 1);
|
|
|
|
parse_options_usage(NULL, stat_options, "t", 1);
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* no_aggr, cgroup are for system-wide only
|
|
|
|
* --per-thread is aggregated per thread, we dont mix it with cpu mode
|
|
|
|
*/
|
2015-07-21 12:31:22 +00:00
|
|
|
if (((stat_config.aggr_mode != AGGR_GLOBAL &&
|
|
|
|
stat_config.aggr_mode != AGGR_THREAD) || nr_cgroups) &&
|
2013-11-12 19:46:16 +00:00
|
|
|
!target__has_cpu(&target)) {
|
perf tool: Add cgroup support
This patch adds the ability to filter monitoring based on container groups
(cgroups) for both perf stat and perf record. It is possible to monitor
multiple cgroup in parallel. There is one cgroup per event. The cgroups to
monitor are passed via a new -G option followed by a comma separated list of
cgroup names.
The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
finds the corresponding directory in the cgroup filesystem and opens it. It
then passes that file descriptor to the kernel.
Example:
$ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
Performance counter stats for 'sleep 1':
2,368,667,414 cycles test1
2,369,661,459 cycles
<not counted> cycles test2
1.001856890 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 09:20:01 +00:00
|
|
|
fprintf(stderr, "both cgroup and no-aggregation "
|
|
|
|
"modes only available in system-wide mode\n");
|
|
|
|
|
2015-11-05 14:40:45 +00:00
|
|
|
parse_options_usage(stat_usage, stat_options, "G", 1);
|
|
|
|
parse_options_usage(NULL, stat_options, "A", 1);
|
|
|
|
parse_options_usage(NULL, stat_options, "a", 1);
|
2013-11-01 07:33:15 +00:00
|
|
|
goto out;
|
2013-02-06 14:46:02 +00:00
|
|
|
}
|
|
|
|
|
perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
Full output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]
0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().
Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 11:30:56 +00:00
|
|
|
if (add_default_attributes())
|
|
|
|
goto out;
|
2009-04-20 13:37:32 +00:00
|
|
|
|
2013-11-12 19:46:16 +00:00
|
|
|
target__validate(&target);
|
2011-01-03 19:53:33 +00:00
|
|
|
|
2012-05-07 05:09:04 +00:00
|
|
|
if (perf_evlist__create_maps(evsel_list, &target) < 0) {
|
2013-11-12 19:46:16 +00:00
|
|
|
if (target__has_task(&target)) {
|
2012-05-07 05:09:04 +00:00
|
|
|
pr_err("Problems finding threads of monitor\n");
|
2015-11-05 14:40:45 +00:00
|
|
|
parse_options_usage(stat_usage, stat_options, "p", 1);
|
|
|
|
parse_options_usage(NULL, stat_options, "t", 1);
|
2013-11-12 19:46:16 +00:00
|
|
|
} else if (target__has_cpu(&target)) {
|
2012-05-07 05:09:04 +00:00
|
|
|
perror("failed to parse CPUs map");
|
2015-11-05 14:40:45 +00:00
|
|
|
parse_options_usage(stat_usage, stat_options, "C", 1);
|
|
|
|
parse_options_usage(NULL, stat_options, "a", 1);
|
2013-11-01 07:33:15 +00:00
|
|
|
}
|
|
|
|
goto out;
|
2011-01-03 19:49:48 +00:00
|
|
|
}
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Initialize thread_map with comm names,
|
|
|
|
* so we could print it out on output.
|
|
|
|
*/
|
2015-07-21 12:31:22 +00:00
|
|
|
if (stat_config.aggr_mode == AGGR_THREAD)
|
perf stat: Introduce --per-thread option
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 09:29:27 +00:00
|
|
|
thread_map__read_comms(evsel_list->threads);
|
|
|
|
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
if (interval && interval < 100) {
|
2015-10-02 09:04:34 +00:00
|
|
|
if (interval < 10) {
|
|
|
|
pr_err("print interval must be >= 10ms\n");
|
2015-11-05 14:40:45 +00:00
|
|
|
parse_options_usage(stat_usage, stat_options, "I", 1);
|
2015-10-02 09:04:34 +00:00
|
|
|
goto out;
|
|
|
|
} else
|
|
|
|
pr_warning("print interval < 100ms. "
|
|
|
|
"The overhead percentage could be high in some cases. "
|
|
|
|
"Please proceed with caution.\n");
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
}
|
2010-05-28 10:00:01 +00:00
|
|
|
|
2013-03-18 14:24:21 +00:00
|
|
|
if (perf_evlist__alloc_stats(evsel_list, interval))
|
2014-01-03 18:56:06 +00:00
|
|
|
goto out;
|
2010-03-18 14:36:05 +00:00
|
|
|
|
2013-02-14 12:57:27 +00:00
|
|
|
if (perf_stat_init_aggr_mode())
|
2014-01-03 18:56:06 +00:00
|
|
|
goto out;
|
2013-02-14 12:57:27 +00:00
|
|
|
|
2009-05-15 09:03:23 +00:00
|
|
|
/*
|
|
|
|
* We dont want to block the signals - that would cause
|
|
|
|
* child tasks to inherit that and Ctrl-C would not work.
|
|
|
|
* What we want is for Ctrl-C to work in the exec()-ed
|
|
|
|
* task, but being ignored by perf stat itself:
|
|
|
|
*/
|
2009-06-10 13:55:59 +00:00
|
|
|
atexit(sig_atexit);
|
2013-03-01 18:02:27 +00:00
|
|
|
if (!forever)
|
|
|
|
signal(SIGINT, skip_signal);
|
perf stat: Add interval printing
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-29 11:47:44 +00:00
|
|
|
signal(SIGCHLD, skip_signal);
|
2009-05-15 09:03:23 +00:00
|
|
|
signal(SIGALRM, skip_signal);
|
|
|
|
signal(SIGABRT, skip_signal);
|
|
|
|
|
2009-06-13 12:57:28 +00:00
|
|
|
status = 0;
|
2013-03-01 18:02:27 +00:00
|
|
|
for (run_idx = 0; forever || run_idx < run_count; run_idx++) {
|
2009-06-13 12:57:28 +00:00
|
|
|
if (run_count != 1 && verbose)
|
2011-08-15 20:22:33 +00:00
|
|
|
fprintf(output, "[ perf stat: executing run #%d ... ]\n",
|
|
|
|
run_idx + 1);
|
2011-04-28 16:17:11 +00:00
|
|
|
|
2009-06-13 12:57:28 +00:00
|
|
|
status = run_perf_stat(argc, argv);
|
2013-03-01 18:02:27 +00:00
|
|
|
if (forever && status != -1) {
|
2015-06-26 09:29:26 +00:00
|
|
|
print_counters(NULL, argc, argv);
|
2015-06-26 09:29:13 +00:00
|
|
|
perf_stat__reset_stats();
|
2013-03-01 18:02:27 +00:00
|
|
|
}
|
2009-06-13 12:57:28 +00:00
|
|
|
}
|
|
|
|
|
2013-03-01 18:02:27 +00:00
|
|
|
if (!forever && status != -1 && !interval)
|
2015-06-26 09:29:26 +00:00
|
|
|
print_counters(NULL, argc, argv);
|
2013-03-18 14:24:21 +00:00
|
|
|
|
2015-12-09 02:11:27 +00:00
|
|
|
perf_stat__exit_aggr_mode();
|
2013-03-18 14:24:21 +00:00
|
|
|
perf_evlist__free_stats(evsel_list);
|
2011-02-01 18:18:10 +00:00
|
|
|
out:
|
|
|
|
perf_evlist__delete(evsel_list);
|
2009-06-13 12:57:28 +00:00
|
|
|
return status;
|
2009-04-20 13:37:32 +00:00
|
|
|
}
|