linux/tools
Kan Liang 384b60557b perf tools: Construct LBR call chain
LBR call stack only has user-space callchains. It is output in the
PERF_SAMPLE_BRANCH_STACK data format. For kernel callchains, it's
still in the form of PERF_SAMPLE_CALLCHAIN.

The perf tool has to handle both data sources to construct a
complete callstack.

For the "perf report -D" option, both lbr and fp information will be
displayed.

A new call chain recording option "lbr" is introduced into the perf
tool for LBR call stack. The user can use --call-graph lbr to get
the call stack information from hardware.

Here are some examples.

When profiling bc(1) on Fedora 19:

  echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph lbr bc -l < cmd

If enabling LBR, perf report output looks like:

    50.36%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
    33.66%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
                     bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
     7.62%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                    |
                    |--99.89%-- 0x2000186a8
                     --0.11%-- [...]
     6.83%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
                    |
                    |--99.94%-- bc_add
                    |          execute
                    |          run_code
                    |          yyparse
                    |          main
                    |          __libc_start_main
                    |          _start
                     --0.06%-- [...]
     0.46%       bc  libc-2.17.so       [.] __memset_sse2
                 |
                 --- __memset_sse2
                    |
                    |--54.13%-- bc_new_num
                    |          |
                    |          |--51.00%-- bc_divide
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |          |--30.46%-- _bc_do_sub
                    |          |          bc_add
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |           --18.55%-- _bc_do_add
                    |                     bc_add
                    |                     execute
                    |                     run_code
                    |                     yyparse
                    |                     main
                    |                     __libc_start_main
                    |                     _start
                    |
                     --45.87%-- bc_divide
                               execute
                               run_code
                               yyparse
                               main
                               __libc_start_main
                               _start

If using FP, perf report output looks like:

  echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph fp bc -l < cmd

    50.49%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
    33.57%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
     7.61%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                     0x2000186a8
     6.88%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
     0.42%       bc  libc-2.17.so       [.] __memcpy_ssse3_back
                 |
                 --- __memcpy_ssse3_back

If using LBR, perf report -D output looks like:

3458145275743 0x2fd750 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x2): 9748/9748: 0x408ea8 period: 609644 addr: 0
... LBR call chain: nr:8
.....  0: fffffffffffffe00
.....  1: 0000000000408e50
.....  2: 000000000040a458
.....  3: 000000000040562e
.....  4: 0000000000408590
.....  5: 00000000004022c0
.....  6: 00000000004015dd
.....  7: 0000003d1cc21b43
... FP chain: nr:2
.....  0: fffffffffffffe00
.....  1: 0000000000408ea8
 ... thread: bc:9748
 ...... dso: /usr/bin/bc

The LBR call stack has the following known limitations:

 - Zero length calls are not filtered out by the hardware

 - Exception handing such as setjmp/longjmp will have calls/returns not
   match

 - Pushing different return address onto the stack will have
   calls/returns not match

 - If callstack is deeper than the LBR, only the last entries are
   captured

Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1420482185-29830-3-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:18 +01:00
..
cgroup cgroups: fix cgroup_event_listener error handling 2013-01-08 10:00:44 -08:00
firewire tools/firewire: nosy-dump: check for allocation failure 2012-12-02 20:10:18 +01:00
hv tools: hv: ignore ENOBUFS and ENOMEM in the KVP daemon 2014-11-26 19:01:12 -08:00
include tools: Remove bitops/hweight usage of bits in tools/perf 2015-01-16 17:49:29 -03:00
lguest tools/lguest: offer VIRTIO_F_ANY_LAYOUT for net device. 2013-07-15 11:18:32 +09:30
lib tools lib traceevent: Add support for IP address formats 2015-01-26 12:04:41 -03:00
net tools: bpf_jit_disasm: increase image buffer size 2014-05-16 16:44:08 -04:00
nfsd
perf perf tools: Construct LBR call chain 2015-02-18 17:16:18 +01:00
power tools / cpupower: Fix no idle state information return value 2014-12-19 23:01:12 +01:00
scripts tools lib traceevent: Add global QUIET_CC_FPIC build output 2013-12-19 16:18:10 -03:00
testing selftests/vm: fix link error for transhuge-stress test 2015-01-08 09:01:00 -07:00
thermal/tmon calloc/xcalloc: Fix argument order 2014-12-09 10:06:29 -03:00
time tools: add script to test udelay 2014-07-23 10:16:38 -07:00
usb USB patches for 3.19-rc1 2014-12-14 14:57:16 -08:00
virtio tools/virtio: add virtio 1.0 in vringh_test 2014-12-15 23:49:22 +02:00
vm mm/page_owner: keep track of page owners 2014-12-13 12:42:48 -08:00
Makefile tools/liblockdep: Build liblockdep from tools/Makefile 2014-05-08 13:34:45 -04:00