linux/tools/perf
Adrian Hunter 3c0cd952cf perf thread-stack: Hide x86 retpolines
x86 retpoline functions pollute the call graph by showing up everywhere
there is an indirect branch, but they do not really mean anything. Make
changes so that the default retpoline functions will no longer appear in
the call graph. Note this only affects the call graph, since all the
original branches are left unchanged.

This does not handle function return thunks, nor is there any
improvement for the handling of inline thunks or extern thunks.
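
The idea of hiding thunk frames can be sketched in C as follows. This is an
illustration under stated assumptions, not the code in this patch: the helper
names is_retpoline_name() and hide_retpolines() are hypothetical, and the
"__x86_indirect_thunk_" prefix is assumed from the thunk name that appears in
the example disassembly. perf maintains its own thread stack while processing
branches; the sketch only drops thunk frames from an already-built chain of
symbol names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed prefix, taken from the thunk symbol seen in the objdump output. */
#define RETPOLINE_PREFIX "__x86_indirect_thunk_"

/* Hypothetical helper: true if the name looks like a default retpoline thunk. */
static bool is_retpoline_name(const char *name)
{
	return name &&
	       strncmp(name, RETPOLINE_PREFIX, strlen(RETPOLINE_PREFIX)) == 0;
}

/*
 * Hypothetical helper: compact a call chain (outermost caller first) in
 * place, dropping every frame whose symbol is a retpoline thunk.
 * Returns the new depth of the chain.
 */
static size_t hide_retpolines(const char **chain, size_t depth)
{
	size_t out = 0;

	for (size_t i = 0; i < depth; i++) {
		if (!is_retpoline_name(chain[i]))
			chain[out++] = chain[i];
	}
	return out;
}
```

Applied to the chain main, __x86_indirect_thunk_rax, __x86_indirect_thunk_rax,
foo, bar, the sketch leaves main, foo, bar. Only the chain of names is
rewritten; the recorded branches themselves are not touched.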

Example:

  $ cat simple-retpoline.c
  __attribute__((noinline)) int bar(void)
  {
          return -1;
  }

  int foo(void)
  {
          return bar() + 1;
  }

  __attribute__((indirect_branch("thunk"))) int main()
  {
          int (*volatile fn)(void) = foo;

          fn();
          return fn();
  }
  $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
  $ objdump -d simple-retpoline
  <SNIP>
  0000000000001040 <main>:
      1040:       48 83 ec 18             sub    $0x18,%rsp
      1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 <foo>
      104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
      1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      1055:       e8 1f 01 00 00          callq  1179 <__x86_indirect_thunk_rax>
      105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      105f:       48 83 c4 18             add    $0x18,%rsp
      1063:       e9 11 01 00 00          jmpq   1179 <__x86_indirect_thunk_rax>
  <SNIP>
  0000000000001160 <bar>:
      1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
      1165:       c3                      retq
  <SNIP>
  0000000000001170 <foo>:
      1170:       e8 eb ff ff ff          callq  1160 <bar>
      1175:       83 c0 01                add    $0x1,%eax
      1178:       c3                      retq
  0000000000001179 <__x86_indirect_thunk_rax>:
      1179:       e8 07 00 00 00          callq  1185 <__x86_indirect_thunk_rax+0xc>
      117e:       f3 90                   pause
      1180:       0f ae e8                lfence
      1183:       eb f9                   jmp    117e <__x86_indirect_thunk_rax+0x5>
      1185:       48 89 04 24             mov    %rax,(%rsp)
      1189:       c3                      retq
  <SNIP>
  $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
  $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
  2019-01-08 14:03:37.851655 Creating database...
  2019-01-08 14:03:37.863256 Writing records...
  2019-01-08 14:03:38.069750 Adding indexes
  2019-01-08 14:03:38.078799 Done
  $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db

Before:

    main
        -> __x86_indirect_thunk_rax
            -> __x86_indirect_thunk_rax
                -> foo
                    -> bar

After:

    main
        -> foo
            -> bar

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190109091835.5570-7-adrian.hunter@intel.com
[ Remove (sym->name != NULL) test, this is not a pointer and breaks the build with clang version 7.0.1 (Fedora 7.0.1-2.fc30) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-22 16:49:49 -03:00
arch                      perf tools: Rename build libperf to perf  2019-02-14 15:18:08 -03:00
bench                     perf bench: Add epoll_ctl(2) benchmark  2018-11-21 22:39:55 -03:00
Documentation             perf script: Allow +- operator for type specific fields option  2019-02-20 16:15:35 -03:00
examples/bpf              perf augmented_syscalls: Convert to bpf_map()  2019-01-25 15:12:11 +01:00
include/bpf               perf bpf: Convert pid_map() to bpf_map()  2019-01-25 15:12:10 +01:00
jvmti                     perf jvmti: Separate jvmti cmlr check  2018-11-21 22:39:58 -03:00
pmu-events                perf vendor events power9: General metrics  2019-02-14 13:31:11 -03:00
python                    perf python: Make twatch.py work with both python2 and python3  2018-02-19 12:28:08 -03:00
scripts                   perf tools: Rename build libperf to perf  2019-02-14 15:18:08 -03:00
tests                     perf test: Fix failure of 'evsel-tp-sched' test on s390  2019-02-19 13:43:29 -03:00
trace                     perf tools: Rename build libperf to perf  2019-02-14 15:18:08 -03:00
ui                        perf tools: Rename build libperf to perf  2019-02-14 15:18:08 -03:00
util                      perf thread-stack: Hide x86 retpolines  2019-02-22 16:49:49 -03:00
.gitignore                perf tools: Add trace/beauty/generated/ into .gitignore  2018-02-05 13:58:02 -03:00
Build                     perf tools: Rename build libperf to perf  2019-02-14 15:18:08 -03:00
builtin-annotate.c        pref tools: Add missing map.h includes  2019-02-06 10:00:38 -03:00
builtin-bench.c           perf bench: Add epoll_ctl(2) benchmark  2018-11-21 22:39:55 -03:00
builtin-buildid-cache.c   perf buildid-cache: Warn --purge-all failures  2018-05-15 10:32:16 -03:00
builtin-buildid-list.c
builtin-c2c.c             perf hists: Add argument to hists__resort_cb_t callback  2019-02-06 10:00:39 -03:00
builtin-config.c          perf config: Show the configuration when no arguments are provided  2018-12-18 12:24:00 -03:00
builtin-data.c
builtin-diff.c            perf hist: Use cached rbtrees  2019-01-25 15:12:10 +01:00
builtin-evlist.c
builtin-ftrace.c          perf ftrace: Append an EOL when write tracing files  2018-02-19 09:49:12 -03:00
builtin-help.c            perf help: Remove needless use of strncpy()  2018-12-17 14:59:18 -03:00
builtin-inject.c          perf tools: Add missing include for symbols.h  2019-02-06 10:00:38 -03:00
builtin-kallsyms.c        pref tools: Add missing map.h includes  2019-02-06 10:00:38 -03:00
builtin-kmem.c            pref tools: Add missing map.h includes  2019-02-06 10:00:38 -03:00
builtin-kvm.c             perf tools: Allow specifying proc-map-timeout in config file  2018-12-17 14:56:57 -03:00
builtin-list.c            perf list: Display metric expressions for --details option  2019-02-14 15:18:09 -03:00
builtin-lock.c
builtin-mem.c             pref tools: Add missing map.h includes  2019-02-06 10:00:38 -03:00
builtin-probe.c           perf namespaces: Remove namespaces.h from .h headers  2019-01-25 15:12:09 +01:00
builtin-record.c          perf record: Implement --affinity=node|cpu option  2019-02-11 12:32:21 -03:00
builtin-report.c          perf report: Move symbol annotation to the resort phase  2019-02-06 10:00:40 -03:00
builtin-sched.c           perf sched: Use cached rbtrees  2019-01-25 15:12:10 +01:00
builtin-script.c          perf script: Allow +- operator for type specific fields option  2019-02-20 16:15:35 -03:00
builtin-stat.c            perf pmu: Remove set_drv_config API  2019-02-06 10:00:39 -03:00
builtin-timechart.c       perf tools: Add missing open_memstream() prototype for systems lacking it  2018-12-18 12:23:57 -03:00
builtin-top.c             perf pmu: Remove set_drv_config API  2019-02-06 10:00:39 -03:00
builtin-trace.c           perf trace: Allow dumping a BPF map after setting up BPF events  2019-02-19 16:35:45 -03:00
builtin-version.c         perf version: Print status for syscall_table  2018-04-12 10:33:34 -03:00
builtin.h
check-headers.sh          tools headers powerpc: Remove unistd.h  2019-01-10 10:42:08 -03:00
command-list.txt          perf help: Add missing subcommand version  2018-09-19 14:53:36 -03:00
CREDITS
design.txt                perf/doc: Update design.txt for exclude_{host|guest} flags  2019-01-21 11:01:18 +01:00
Makefile                  perf tools: Disable parallelism for 'make clean'  2018-08-20 08:54:58 -03:00
Makefile.config           perf build: Add missing FEATURE_CHECK_LDFLAGS-libcrypto  2019-02-14 15:18:05 -03:00
Makefile.perf             perf tools: Rename LIB_FILE to LIBPERF_A  2019-02-14 15:18:08 -03:00
MANIFEST
perf-archive.sh
perf-completion.sh        perf tools: Auto-complete for events with ':'  2017-12-27 12:16:00 -03:00
perf-read-vdso.c          perf tools: Make find_vdso_map() more modular  2019-01-08 13:28:13 -03:00
perf-sys.h                Drop a bunch of metag references  2018-02-23 14:29:59 +00:00
perf-with-kcore.sh
perf.c                    perf tools: Remove dead quote.[ch] code  2018-06-04 10:28:50 -03:00
perf.h                    perf record: Allocate affinity masks  2019-02-06 10:00:39 -03:00