linux/tools/perf
Waiman Long 4598a0a6d2 perf symbols: Improve DSO long names lookup speed with rbtree
With workload that spawns and destroys many threads and processes, it
was found that perf-mem could took a long time to post-process the perf
data after the target workload had completed its operation.

The performance bottleneck was found to be the lookup and insertion of
the new DSO structures (thousands of them in this case).

In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread), the
perf profile below shows what perf was doing after the profiled AIM7
shared workload completed:

-     83.94%  perf  libc-2.11.3.so     [.] __strcmp_sse42
   - __strcmp_sse42
      - 99.82% map__new
           machine__process_mmap_event
           perf_session_deliver_event
           perf_session__process_event
           __perf_session__process_events
           cmd_record
           cmd_mem
           run_builtin
           main
           __libc_start_main
-     13.17%  perf  perf               [.] __dsos__findnew
     __dsos__findnew
     map__new
     machine__process_mmap_event
     perf_session_deliver_event
     perf_session__process_event
     __perf_session__process_events
     cmd_record
     cmd_mem
     run_builtin
     main
     __libc_start_main

So about 97% of CPU times were spent in the map__new() function trying
to insert new DSO entry into the DSO linked list. The whole
post-processing step took about 9 minutes.

The DSO structures are currently searched linearly. So the total
processing time will be proportional to n^2.

To overcome this performance problem, the DSO code is modified to also
put the DSO structures in a RB tree sorted by its long name in
additional to being in a simple linked list. With this change, the
processing time will become proportional to n*log(n) which will be much
quicker for large n. However, the short name will still be searched
using the old linear searching method.  With that patch in place, the
same perf-mem post-processing step took less than 30 seconds to
complete.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Link: http://lkml.kernel.org/r/1412098575-27863-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-10-01 14:39:57 -03:00
..
arch perf tool: fix compilation for ARM 2014-09-17 17:08:09 -03:00
bench perf bench futex: Sanitize -q option in requeue 2014-09-29 15:43:26 -03:00
config perf tools: Fix GNU-only grep usage in Makefile 2014-09-17 17:08:09 -03:00
Documentation perf tools: Disable kernel symbol demangling by default 2014-09-17 17:08:09 -03:00
python perf python: Remove duplicate TID bit from mask 2013-08-07 17:35:25 -03:00
scripts perf script: Add callchain to generic and tracepoint events 2014-07-16 17:57:33 -03:00
tests tools lib fd array: Allow associating an integer cookie with each entry 2014-09-25 16:46:55 -03:00
ui perf hists browser: Fix callchain print bug on TUI 2014-09-26 12:38:02 -03:00
util perf symbols: Improve DSO long names lookup speed with rbtree 2014-10-01 14:39:57 -03:00
.gitignore perf tools: Add perf-with-kcore script 2014-09-17 17:08:08 -03:00
builtin-annotate.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-bench.c perf bench: Add --repeat option 2014-06-19 16:13:15 -03:00
builtin-buildid-cache.c perf buildid-cache: Use strerror_r instead of strerror 2014-08-15 13:07:59 -03:00
builtin-buildid-list.c perf session: Separating data file properties from session 2013-10-21 17:33:25 -03:00
builtin-diff.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-evlist.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-help.c perf help: Use strerror_r instead of strerror 2014-08-15 13:08:26 -03:00
builtin-inject.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-kmem.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-kvm.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-list.c perf list: Add usage 2013-11-05 14:26:41 -03:00
builtin-lock.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-mem.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-probe.c perf probe: Do not access kallsyms when analyzing user binaries 2014-09-17 18:01:14 -03:00
builtin-record.c perf tools: Convert {record,top}.call-graph option to call-graph.record-mode 2014-09-26 12:43:53 -03:00
builtin-report.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-sched.c perf sched: Use strerror_r instead of strerror 2014-08-15 13:07:47 -03:00
builtin-script.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-stat.c perf stat: Fix --per-core on multi socket systems 2014-09-26 10:17:13 -03:00
builtin-timechart.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-top.c perf tools: Convert {record,top}.call-graph option to call-graph.record-mode 2014-09-26 12:43:53 -03:00
builtin-trace.c perf trace: Fix mmap return address truncation to 32-bit 2014-09-29 15:25:36 -03:00
builtin.h perf tools: Add new mem command for memory access profiling 2013-04-01 12:21:44 -03:00
command-list.txt perf tools: Add new mem command for memory access profiling 2013-04-01 12:21:44 -03:00
CREDITS
design.txt perf tools: Update some code references in design.txt 2014-03-18 18:17:06 -03:00
Makefile perf tools: Add 'build-test' make target 2014-01-16 16:26:26 -03:00
Makefile.perf tools lib api: Adopt fdarray class from perf's evlist 2014-09-25 16:46:55 -03:00
MANIFEST perf kvm: Add stat support on s390 2014-07-16 17:57:33 -03:00
perf-archive.sh perf archive: Make 'f' the last parameter for tar 2012-09-17 13:10:42 -03:00
perf-completion.sh perf sched: Introduce --list-cmds for use by scripts 2014-04-16 17:16:05 +02:00
perf-sys.h perf tools: Allow to use cpuinfo on s390 2014-07-07 16:55:24 -03:00
perf-with-kcore.sh perf tools: Add perf-with-kcore script 2014-09-17 17:08:08 -03:00
perf.c perf: Use strerror_r instead of strerror 2014-08-15 10:54:29 -03:00
perf.h perf tools: Move callchain config from record_opts to callchain_param 2014-09-26 12:40:33 -03:00