linux/tools/perf/ui/browsers/annotate.c

// SPDX-License-Identifier: GPL-2.0
#include "../browser.h"
#include "../helpline.h"
#include "../ui.h"
#include "../../util/annotate.h"
#include "../../util/debug.h"
#include "../../util/dso.h"
#include "../../util/hist.h"
#include "../../util/sort.h"
#include "../../util/map.h"
#include "../../util/mutex.h"
#include "../../util/symbol.h"
#include "../../util/evsel.h"
#include "../../util/evlist.h"
#include <inttypes.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/zalloc.h>
#include <sys/ttydefaults.h>
#include <asm/bug.h>
struct arch;
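/*
 * TUI state for annotating one symbol: 'b' is the generic list browser,
 * 'entries' keeps the annotation lines sorted by sample percentage (see
 * disasm_rb_tree__insert()), 'curr_hot' points at the hottest of those,
 * and 'selection' tracks the line currently under the cursor.
 */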
struct annotate_browser {
struct ui_browser b;
struct rb_root entries;
struct rb_node *curr_hot;
struct annotation_line *selection;
struct arch *arch;
bool searching_backwards;
char search_bf[128];
};
static inline struct annotation *browser__annotation(struct ui_browser *browser)
{
struct map_symbol *ms = browser->priv;
return symbol__annotation(ms->sym);
}
static bool disasm_line__filter(struct ui_browser *browser __maybe_unused, void *entry)
{
struct annotation_line *al = list_entry(entry, struct annotation_line, node);
return annotation_line__filter(al);
}
static int ui_browser__jumps_percent_color(struct ui_browser *browser, int nr, bool current)
{
struct annotation *notes = browser__annotation(browser);
if (current && (!browser->use_navkeypressed || browser->navkeypressed))
return HE_COLORSET_SELECTED;
if (nr == notes->max_jump_sources)
return HE_COLORSET_TOP;
if (nr > 1)
return HE_COLORSET_MEDIUM;
return HE_COLORSET_NORMAL;
}
static int ui_browser__set_jumps_percent_color(void *browser, int nr, bool current)
{
int color = ui_browser__jumps_percent_color(browser, nr, current);
return ui_browser__set_color(browser, color);
}
static int annotate_browser__set_color(void *browser, int color)
{
return ui_browser__set_color(browser, color);
}
static void annotate_browser__write_graph(void *browser, int graph)
{
ui_browser__write_graph(browser, graph);
}
static void annotate_browser__set_percent_color(void *browser, double percent, bool current)
{
ui_browser__set_percent_color(browser, percent, current);
}
static void annotate_browser__printf(void *browser, const char *fmt, ...)
{
va_list args;
va_start(args, fmt);
ui_browser__vprintf(browser, fmt, args);
va_end(args);
}
static void annotate_browser__write(struct ui_browser *browser, void *entry, int row)
{
struct annotate_browser *ab = container_of(browser, struct annotate_browser, b);
struct annotation *notes = browser__annotation(browser);
struct annotation_line *al = list_entry(entry, struct annotation_line, node);
const bool is_current_entry = ui_browser__is_current_entry(browser, row);
struct annotation_write_ops ops = {
.first_line = row == 0,
.current_entry = is_current_entry,
.change_color = (!annotate_opts.hide_src_code &&
(!is_current_entry ||
(browser->use_navkeypressed &&
!browser->navkeypressed))),
.width = browser->width,
.obj = browser,
.set_color = annotate_browser__set_color,
.set_percent_color = annotate_browser__set_percent_color,
.set_jumps_percent_color = ui_browser__set_jumps_percent_color,
.printf = annotate_browser__printf,
.write_graph = annotate_browser__write_graph,
};
/* The scroll bar isn't being used */
if (!browser->navkeypressed)
ops.width += 1;
annotation_line__write(al, notes, &ops);
if (ops.current_entry)
ab->selection = al;
}
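/*
 * Walk backwards from the branch at 'cursor' to the previous real instruction
 * line (skipping source lines, whose offset is -1) and, if the architecture
 * can macro-fuse that instruction with the branch, return how many displayed
 * lines separate the two so the fused marker can be drawn; 0 otherwise.
 */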
static int is_fused(struct annotate_browser *ab, struct disasm_line *cursor)
{
struct disasm_line *pos = list_prev_entry(cursor, al.node);
const char *name;
int diff = 1;
while (pos && pos->al.offset == -1) {
pos = list_prev_entry(pos, al.node);
if (!annotate_opts.hide_src_code)
diff++;
}
if (!pos)
return 0;
if (ins__is_lock(&pos->ins))
name = pos->ops.locked.ins.name;
else
name = pos->ins.name;
if (!name || !cursor->ins.name)
return 0;
if (ins__is_fused(ab->arch, name, cursor->ins.name))
return diff;
return 0;
}
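/*
 * Draw the arrow from the currently selected jump instruction to its target
 * line and, when the previous instruction macro-fuses with the jump, mark
 * the pair with ui_browser__mark_fused().
 */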
static void annotate_browser__draw_current_jump(struct ui_browser *browser)
{
struct annotate_browser *ab = container_of(browser, struct annotate_browser, b);
struct disasm_line *cursor = disasm_line(ab->selection);
struct annotation_line *target;
unsigned int from, to;
struct map_symbol *ms = ab->b.priv;
struct symbol *sym = ms->sym;
struct annotation *notes = symbol__annotation(sym);
u8 pcnt_width = annotation__pcnt_width(notes);
int width;
int diff = 0;
/* PLT symbols contain external offsets */
if (strstr(sym->name, "@plt"))
return;
if (!disasm_line__is_valid_local_jump(cursor, sym))
return;
/*
* This first was seen with a gcc function, _cpp_lex_token, that
* has the usual jumps:
*
* 1159e6c: jne 115aa32 <_cpp_lex_token@@Base+0xf92>
*
* I.e. jumps to a label inside that function (_cpp_lex_token), and
* those work, but also this kind:
*
* 1159e8b: jne c469be <cpp_named_operator2name@@Base+0xa72>
*
* I.e. jumps to another function, outside _cpp_lex_token, which
* are not being correctly handled, generating as a side effect references
* to ab->offset[] entries that are set to NULL, so to make this code
* more robust, check that here.
*
* A proper fix will be put in place, looking at the function
* name right after the '<' token and probably treating this like a
* 'call' instruction.
*/
target = notes->src->offsets[cursor->ops.target.offset];
if (target == NULL) {
ui_helpline__printf("WARN: jump target inconsistency, press 'o', notes->offsets[%#x] = NULL\n",
cursor->ops.target.offset);
return;
}
if (annotate_opts.hide_src_code) {
from = cursor->al.idx_asm;
to = target->idx_asm;
} else {
from = (u64)cursor->al.idx;
to = (u64)target->idx;
}
width = annotation__cycles_width(notes);
ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS);
__ui_browser__line_arrow(browser,
pcnt_width + 2 + notes->widths.addr + width,
from, to);
diff = is_fused(ab, cursor);
if (diff > 0) {
ui_browser__mark_fused(browser,
pcnt_width + 3 + notes->widths.addr + width,
from - diff, diff, to > from);
}
}
static unsigned int annotate_browser__refresh(struct ui_browser *browser)
{
struct annotation *notes = browser__annotation(browser);
int ret = ui_browser__list_head_refresh(browser);
int pcnt_width = annotation__pcnt_width(notes);
if (annotate_opts.jump_arrows)
annotate_browser__draw_current_jump(browser);
ui_browser__set_color(browser, HE_COLORSET_NORMAL);
__ui_browser__vline(browser, pcnt_width, 0, browser->rows - 1);
return ret;
}
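/*
 * Compare two annotation lines by the first event whose percentage differs,
 * so grouped events are ordered by the leading event first.
 */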
static double disasm__cmp(struct annotation_line *a, struct annotation_line *b,
int percent_type)
{
int i;
for (i = 0; i < a->data_nr; i++) {
if (a->data[i].percent[percent_type] == b->data[i].percent[percent_type])
continue;
return a->data[i].percent[percent_type] -
b->data[i].percent[percent_type];
}
return 0;
}
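/* Insert 'al' into the browser's rb-tree, ordered by sample percentage. */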
static void disasm_rb_tree__insert(struct annotate_browser *browser,
struct annotation_line *al)
{
struct rb_root *root = &browser->entries;
struct rb_node **p = &root->rb_node;
struct rb_node *parent = NULL;
struct annotation_line *l;
while (*p != NULL) {
parent = *p;
l = rb_entry(parent, struct annotation_line, rb_node);
if (disasm__cmp(al, l, annotate_opts.percent_type) < 0)
p = &(*p)->rb_left;
else
p = &(*p)->rb_right;
}
rb_link_node(&al->rb_node, parent, p);
rb_insert_color(&al->rb_node, root);
}
static void annotate_browser__set_top(struct annotate_browser *browser,
struct annotation_line *pos, u32 idx)
{
unsigned back;
ui_browser__refresh_dimensions(&browser->b);
back = browser->b.height / 2;
browser->b.top_idx = browser->b.index = idx;
while (browser->b.top_idx != 0 && back != 0) {
pos = list_entry(pos->node.prev, struct annotation_line, node);
if (annotation_line__filter(pos))
continue;
--browser->b.top_idx;
--back;
}
browser->b.top = pos;
browser->b.navkeypressed = true;
}
static void annotate_browser__set_rb_top(struct annotate_browser *browser,
struct rb_node *nd)
{
struct annotation_line *pos = rb_entry(nd, struct annotation_line, rb_node);
u32 idx = pos->idx;
if (annotate_opts.hide_src_code)
idx = pos->idx_asm;
annotate_browser__set_top(browser, pos, idx);
browser->curr_hot = nd;
}
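/*
 * Recompute the per-line percentages for 'evsel' and rebuild the rb-tree of
 * lines that actually have samples (or cycles/IPC data); the hottest line
 * ends up in browser->curr_hot.
 */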
static void annotate_browser__calc_percent(struct annotate_browser *browser,
struct evsel *evsel)
{
struct map_symbol *ms = browser->b.priv;
struct symbol *sym = ms->sym;
struct annotation *notes = symbol__annotation(sym);
struct disasm_line *pos;
browser->entries = RB_ROOT;
annotation__lock(notes);
symbol__calc_percent(sym, evsel);
list_for_each_entry(pos, &notes->src->source, al.node) {
double max_percent = 0.0;
int i;
if (pos->al.offset == -1) {
RB_CLEAR_NODE(&pos->al.rb_node);
continue;
}
for (i = 0; i < pos->al.data_nr; i++) {
double percent;
percent = annotation_data__percent(&pos->al.data[i],
annotate_opts.percent_type);
if (max_percent < percent)
max_percent = percent;
}
if (max_percent < 0.01 && (!pos->al.cycles || pos->al.cycles->ipc == 0)) {
RB_CLEAR_NODE(&pos->al.rb_node);
continue;
}
disasm_rb_tree__insert(browser, &pos->al);
}
annotation__unlock(notes);
browser->curr_hot = rb_last(&browser->entries);
}
static struct annotation_line *annotate_browser__find_next_asm_line(
struct annotate_browser *browser,
struct annotation_line *al)
{
struct annotation_line *it = al;
/* find next asm line */
list_for_each_entry_continue(it, browser->b.entries, node) {
if (it->idx_asm >= 0)
return it;
}
/* no asm line found forwards, try backwards */
it = al;
list_for_each_entry_continue_reverse(it, browser->b.entries, node) {
if (it->idx_asm >= 0)
return it;
}
/* There are no asm lines */
return NULL;
}
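/*
 * Toggle between the interleaved source+assembly view and the assembly-only
 * view, re-seeking the browser so the current line stays selected; returns
 * false if there is no assembly line to land on.
 */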
static bool annotate_browser__toggle_source(struct annotate_browser *browser)
{
struct annotation *notes = browser__annotation(&browser->b);
struct annotation_line *al;
off_t offset = browser->b.index - browser->b.top_idx;
browser->b.seek(&browser->b, offset, SEEK_CUR);
al = list_entry(browser->b.top, struct annotation_line, node);
if (annotate_opts.hide_src_code) {
if (al->idx_asm < offset)
offset = al->idx;
browser->b.nr_entries = notes->src->nr_entries;
annotate_opts.hide_src_code = false;
browser->b.seek(&browser->b, -offset, SEEK_CUR);
browser->b.top_idx = al->idx - offset;
browser->b.index = al->idx;
} else {
if (al->idx_asm < 0) {
/* move cursor to next asm line */
al = annotate_browser__find_next_asm_line(browser, al);
if (!al) {
browser->b.seek(&browser->b, -offset, SEEK_CUR);
return false;
}
}
if (al->idx_asm < offset)
offset = al->idx_asm;
browser->b.nr_entries = notes->src->nr_asm_entries;
annotate_opts.hide_src_code = true;
browser->b.seek(&browser->b, -offset, SEEK_CUR);
browser->b.top_idx = al->idx_asm - offset;
browser->b.index = al->idx_asm;
}
return true;
}
#define SYM_TITLE_MAX_SIZE (PATH_MAX + 64)
static void annotate_browser__show_full_location(struct ui_browser *browser)
{
struct annotate_browser *ab = container_of(browser, struct annotate_browser, b);
struct disasm_line *cursor = disasm_line(ab->selection);
struct annotation_line *al = &cursor->al;
if (al->offset != -1)
ui_helpline__puts("Only available for source code lines.");
else if (al->fileloc == NULL)
ui_helpline__puts("No source file location.");
else {
char help_line[SYM_TITLE_MAX_SIZE];
sprintf(help_line, "Source file location: %s", al->fileloc);
ui_helpline__puts(help_line);
}
}
static void ui_browser__init_asm_mode(struct ui_browser *browser)
{
struct annotation *notes = browser__annotation(browser);
ui_browser__reset_index(browser);
browser->nr_entries = notes->src->nr_asm_entries;
}
static int sym_title(struct symbol *sym, struct map *map, char *title,
size_t sz, int percent_type)
{
return snprintf(title, sz, "%s %s [Percent: %s]", sym->name,
map__dso(map)->long_name,
percent_type_str(percent_type));
}
/*
* This can be called from external jumps, i.e. jumps from one function
* to another, like from the kernel's entry_SYSCALL_64 function to the
* swapgs_restore_regs_and_return_to_usermode() function.
*
* So all we check here is that dl->ops.target.sym is set; if it is, just
* go to that function and when exiting from its disassembly, come back
* to the calling function.
*/
static bool annotate_browser__callq(struct annotate_browser *browser,
struct evsel *evsel,
struct hist_browser_timer *hbt)
{
struct map_symbol *ms = browser->b.priv, target_ms;
struct disasm_line *dl = disasm_line(browser->selection);
struct annotation *notes;
char title[SYM_TITLE_MAX_SIZE];
if (!dl->ops.target.sym) {
ui_helpline__puts("The called function was not found.");
return true;
}
notes = symbol__annotation(dl->ops.target.sym);
annotation__lock(notes);
if (!symbol__hists(dl->ops.target.sym, evsel->evlist->core.nr_entries)) {
annotation__unlock(notes);
ui__warning("Not enough memory for annotating '%s' symbol!\n",
dl->ops.target.sym->name);
return true;
}
target_ms.maps = ms->maps;
target_ms.map = ms->map;
target_ms.sym = dl->ops.target.sym;
annotation__unlock(notes);
symbol__tui_annotate(&target_ms, evsel, hbt);
sym_title(ms->sym, ms->map, title, sizeof(title), annotate_opts.percent_type);
ui_browser__show_title(&browser->b, title);
return true;
}
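/*
 * Find the disasm_line at byte 'offset' within the current function, also
 * returning in '*idx' its row among the lines that are not filtered out.
 */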
static
struct disasm_line *annotate_browser__find_offset(struct annotate_browser *browser,
s64 offset, s64 *idx)
{
struct annotation *notes = browser__annotation(&browser->b);
struct disasm_line *pos;
*idx = 0;
list_for_each_entry(pos, &notes->src->source, al.node) {
if (pos->al.offset == offset)
return pos;
if (!annotation_line__filter(&pos->al))
++*idx;
}
return NULL;
}
static bool annotate_browser__jump(struct annotate_browser *browser,
struct evsel *evsel,
struct hist_browser_timer *hbt)
{
struct disasm_line *dl = disasm_line(browser->selection);
u64 offset;
s64 idx;
if (!ins__is_jump(&dl->ins))
return false;
if (dl->ops.target.outside) {
annotate_browser__callq(browser, evsel, hbt);
return true;
}
offset = dl->ops.target.offset;
dl = annotate_browser__find_offset(browser, offset, &idx);
if (dl == NULL) {
ui_helpline__printf("Invalid jump offset: %" PRIx64, offset);
return true;
}
annotate_browser__set_top(browser, &dl->al, idx);
return true;
}
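/*
 * Walk forward from the current selection, skipping filtered lines, and
 * return the first annotation line whose text contains 's', updating *idx
 * with its index in the browser.
 */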
static
struct annotation_line *annotate_browser__find_string(struct annotate_browser *browser,
char *s, s64 *idx)
{
struct annotation *notes = browser__annotation(&browser->b);
struct annotation_line *al = browser->selection;
*idx = browser->b.index;
list_for_each_entry_continue(al, &notes->src->source, node) {
if (annotation_line__filter(al))
continue;
++*idx;
if (al->line && strstr(al->line, s) != NULL)
return al;
}
return NULL;
}
static bool __annotate_browser__search(struct annotate_browser *browser)
{
struct annotation_line *al;
s64 idx;
al = annotate_browser__find_string(browser, browser->search_bf, &idx);
if (al == NULL) {
ui_helpline__puts("String not found!");
return false;
}
annotate_browser__set_top(browser, al, idx);
browser->searching_backwards = false;
return true;
}
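/* Same as annotate_browser__find_string(), but walking backwards. */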
static
struct annotation_line *annotate_browser__find_string_reverse(struct annotate_browser *browser,
char *s, s64 *idx)
{
struct annotation *notes = browser__annotation(&browser->b);
struct annotation_line *al = browser->selection;
*idx = browser->b.index;
list_for_each_entry_continue_reverse(al, &notes->src->source, node) {
if (annotation_line__filter(al))
continue;
--*idx;
if (al->line && strstr(al->line, s) != NULL)
return al;
}
return NULL;
}
static bool __annotate_browser__search_reverse(struct annotate_browser *browser)
{
struct annotation_line *al;
s64 idx;
al = annotate_browser__find_string_reverse(browser, browser->search_bf, &idx);
if (al == NULL) {
ui_helpline__puts("String not found!");
return false;
}
annotate_browser__set_top(browser, al, idx);
browser->searching_backwards = true;
return true;
}
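/* Prompt for a search string; returns true only if a non-empty one was entered. */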
static bool annotate_browser__search_window(struct annotate_browser *browser,
int delay_secs)
{
if (ui_browser__input_window("Search", "String: ", browser->search_bf,
"ENTER: OK, ESC: Cancel",
delay_secs * 2) != K_ENTER ||
!*browser->search_bf)
return false;
return true;
}
static bool annotate_browser__search(struct annotate_browser *browser, int delay_secs)
{
if (annotate_browser__search_window(browser, delay_secs))
return __annotate_browser__search(browser);
return false;
}
static bool annotate_browser__continue_search(struct annotate_browser *browser,
int delay_secs)
{
if (!*browser->search_bf)
return annotate_browser__search(browser, delay_secs);
return __annotate_browser__search(browser);
}
static bool annotate_browser__search_reverse(struct annotate_browser *browser,
int delay_secs)
{
if (annotate_browser__search_window(browser, delay_secs))
return __annotate_browser__search_reverse(browser);
return false;
}
static
bool annotate_browser__continue_search_reverse(struct annotate_browser *browser,
int delay_secs)
{
if (!*browser->search_bf)
return annotate_browser__search_reverse(browser, delay_secs);
return __annotate_browser__search_reverse(browser);
}
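/*
 * Show the browser using the event title line passed by the caller and
 * write the "symbol dso" header on the extra title line below it.
 */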
static int annotate_browser__show(struct ui_browser *browser, char *title, const char *help)
{
struct map_symbol *ms = browser->priv;
struct symbol *sym = ms->sym;
char symbol_dso[SYM_TITLE_MAX_SIZE];
if (ui_browser__show(browser, title, help) < 0)
return -1;
sym_title(sym, ms->map, symbol_dso, sizeof(symbol_dso), annotate_opts.percent_type);
ui_browser__gotorc_title(browser, 0, 0);
ui_browser__set_color(browser, HE_COLORSET_ROOT);
ui_browser__write_nstring(browser, symbol_dso, browser->width + 1);
return 0;
}
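/*
 * 'b' (base == true) flips between hits and period, 'p' (base == false)
 * flips between local and global, cycling through the four percent types.
 */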
static void
switch_percent_type(struct annotation_options *opts, bool base)
{
switch (opts->percent_type) {
case PERCENT_HITS_LOCAL:
if (base)
opts->percent_type = PERCENT_PERIOD_LOCAL;
else
opts->percent_type = PERCENT_HITS_GLOBAL;
break;
case PERCENT_HITS_GLOBAL:
if (base)
opts->percent_type = PERCENT_PERIOD_GLOBAL;
else
opts->percent_type = PERCENT_HITS_LOCAL;
break;
case PERCENT_PERIOD_LOCAL:
if (base)
opts->percent_type = PERCENT_HITS_LOCAL;
else
opts->percent_type = PERCENT_PERIOD_GLOBAL;
break;
case PERCENT_PERIOD_GLOBAL:
if (base)
opts->percent_type = PERCENT_HITS_GLOBAL;
else
opts->percent_type = PERCENT_PERIOD_LOCAL;
break;
default:
WARN_ON(1);
}
}
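/*
 * Main interaction loop: draw the annotation and dispatch on key presses
 * until the user leaves via left arrow, ESC, 'q' or Ctrl-C.
 */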
static int annotate_browser__run(struct annotate_browser *browser,
struct evsel *evsel,
struct hist_browser_timer *hbt)
{
struct rb_node *nd = NULL;
struct hists *hists = evsel__hists(evsel);
struct map_symbol *ms = browser->b.priv;
struct symbol *sym = ms->sym;
struct annotation *notes = symbol__annotation(ms->sym);
const char *help = "Press 'h' for help on key bindings";
int delay_secs = hbt ? hbt->refresh : 0;
char title[256];
int key;
hists__scnprintf_title(hists, title, sizeof(title));
if (annotate_browser__show(&browser->b, title, help) < 0)
return -1;
annotate_browser__calc_percent(browser, evsel);
if (browser->curr_hot) {
annotate_browser__set_rb_top(browser, browser->curr_hot);
browser->b.navkeypressed = false;
}
nd = browser->curr_hot;
while (1) {
key = ui_browser__run(&browser->b, delay_secs);
if (delay_secs != 0) {
annotate_browser__calc_percent(browser, evsel);
/*
* Current line focus got out of the list of most active
* lines, NULL it so that if TAB|UNTAB is pressed, we
* move to curr_hot (current hottest line).
*/
if (nd != NULL && RB_EMPTY_NODE(nd))
nd = NULL;
}
switch (key) {
case K_TIMER:
if (hbt)
hbt->timer(hbt->arg);
if (delay_secs != 0) {
symbol__annotate_decay_histogram(sym, evsel->core.idx);
hists__scnprintf_title(hists, title, sizeof(title));
annotate_browser__show(&browser->b, title, help);
}
continue;
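/*
 * TAB/shift-TAB cycle through the rb_tree of lines sorted by sample
 * percentage, wrapping around at either end.
 */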
case K_TAB:
if (nd != NULL) {
nd = rb_prev(nd);
if (nd == NULL)
nd = rb_last(&browser->entries);
} else
nd = browser->curr_hot;
break;
case K_UNTAB:
if (nd != NULL) {
nd = rb_next(nd);
if (nd == NULL)
nd = rb_first(&browser->entries);
} else
nd = browser->curr_hot;
break;
case K_F1:
case 'h':
ui_browser__help_window(&browser->b,
"UP/DOWN/PGUP\n"
"PGDN/SPACE Navigate\n"
"</> Move to prev/next symbol\n"
"q/ESC/CTRL+C Exit\n\n"
"ENTER Go to target\n"
"H Go to hottest instruction\n"
"TAB/shift+TAB Cycle thru hottest instructions\n"
"j Toggle showing jump to target arrows\n"
"J Toggle showing number of jump sources on targets\n"
"n Search next string\n"
"o Toggle disassembler output/simplified view\n"
"O Bump offset level (jump targets -> +call -> all -> cycle thru)\n"
"s Toggle source code view\n"
"t Circulate percent, total period, samples view\n"
"c Show min/max cycle\n"
"/ Search string\n"
"k Toggle line numbers\n"
"l Show full source file location\n"
"P Print to [symbol_name].annotation file.\n"
"r Run available scripts\n"
"p Toggle percent type [local/global]\n"
"b Toggle percent base [period/hits]\n"
"? Search string backwards\n"
"f Toggle showing offsets to full address\n");
continue;
case 'r':
script_browse(NULL, NULL);
annotate_browser__show(&browser->b, title, help);
continue;
case 'k':
annotate_opts.show_linenr = !annotate_opts.show_linenr;
continue;
case 'l':
annotate_browser__show_full_location(&browser->b);
continue;
case 'H':
nd = browser->curr_hot;
break;
case 's':
if (annotate_browser__toggle_source(browser))
ui_helpline__puts(help);
continue;
case 'o':
annotate_opts.use_offset = !annotate_opts.use_offset;
annotation__update_column_widths(notes);
continue;
case 'O':
if (++annotate_opts.offset_level > ANNOTATION__MAX_OFFSET_LEVEL)
annotate_opts.offset_level = ANNOTATION__MIN_OFFSET_LEVEL;
continue;
case 'j':
annotate_opts.jump_arrows = !annotate_opts.jump_arrows;
continue;
case 'J':
annotate_opts.show_nr_jumps = !annotate_opts.show_nr_jumps;
annotation__update_column_widths(notes);
continue;
case '/':
if (annotate_browser__search(browser, delay_secs)) {
show_help:
ui_helpline__puts(help);
}
continue;
case 'n':
if (browser->searching_backwards ?
annotate_browser__continue_search_reverse(browser, delay_secs) :
annotate_browser__continue_search(browser, delay_secs))
goto show_help;
continue;
case '?':
if (annotate_browser__search_reverse(browser, delay_secs))
goto show_help;
continue;
case 'D': {
static int seq;
ui_helpline__pop();
ui_helpline__fpush("%d: nr_ent=%d, height=%d, idx=%d, top_idx=%d, nr_asm_entries=%d",
seq++, browser->b.nr_entries,
browser->b.height,
browser->b.index,
browser->b.top_idx,
notes->src->nr_asm_entries);
}
continue;
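/*
 * ENTER/right arrow follows the selected call or jump target; on a
 * return instruction the browser exits back to the caller view.
 */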
case K_ENTER:
case K_RIGHT:
{
struct disasm_line *dl = disasm_line(browser->selection);
if (browser->selection == NULL)
ui_helpline__puts("Huh? No selection. Report to linux-kernel@vger.kernel.org");
else if (browser->selection->offset == -1)
ui_helpline__puts("Actions are only available for assembly lines.");
else if (!dl->ins.ops)
goto show_sup_ins;
else if (ins__is_ret(&dl->ins))
goto out;
else if (!(annotate_browser__jump(browser, evsel, hbt) ||
annotate_browser__callq(browser, evsel, hbt))) {
show_sup_ins:
ui_helpline__puts("Actions are only available for function call/return & jump/branch instructions.");
}
continue;
}
case 'P':
map_symbol__annotation_dump(ms, evsel);
continue;
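/*
 * 't' circulates the left column: percent -> total period -> number of
 * samples -> percent.
 */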
case 't':
if (symbol_conf.show_total_period) {
symbol_conf.show_total_period = false;
symbol_conf.show_nr_samples = true;
} else if (symbol_conf.show_nr_samples)
symbol_conf.show_nr_samples = false;
else
symbol_conf.show_total_period = true;
annotation__update_column_widths(notes);
continue;
case 'c':
if (annotate_opts.show_minmax_cycle)
annotate_opts.show_minmax_cycle = false;
else
annotate_opts.show_minmax_cycle = true;
annotation__update_column_widths(notes);
continue;
case 'p':
case 'b':
switch_percent_type(&annotate_opts, key == 'b');
hists__scnprintf_title(hists, title, sizeof(title));
annotate_browser__show(&browser->b, title, help);
continue;
case 'f':
annotation__toggle_full_addr(notes, ms);
continue;
case K_LEFT:
case '<':
case '>':
case K_ESC:
case 'q':
case CTRL('c'):
goto out;
default:
continue;
}
if (nd != NULL)
annotate_browser__set_rb_top(browser, nd);
}
out:
ui_browser__hide(&browser->b);
return key;
}
int map_symbol__tui_annotate(struct map_symbol *ms, struct evsel *evsel,
struct hist_browser_timer *hbt)
{
return symbol__tui_annotate(ms, evsel, hbt);
}
int hist_entry__tui_annotate(struct hist_entry *he, struct evsel *evsel,
struct hist_browser_timer *hbt)
{
/* reset abort key so that it can get Ctrl-C as a key */
SLang_reset_tty();
SLang_init_tty(0, 0, 0);
SLtty_set_suspend_state(true);
return map_symbol__tui_annotate(&he->ms, evsel, hbt);
}
int symbol__tui_annotate(struct map_symbol *ms, struct evsel *evsel,
struct hist_browser_timer *hbt)
{
struct symbol *sym = ms->sym;
struct annotation *notes = symbol__annotation(sym);
struct annotate_browser browser = {
.b = {
.refresh = annotate_browser__refresh,
.seek = ui_browser__list_head_seek,
.write = annotate_browser__write,
.filter = disasm_line__filter,
.extra_title_lines = 1, /* for hists__scnprintf_title() */
.priv = ms,
.use_navkeypressed = true,
},
};
struct dso *dso;
int ret = -1, err;
int not_annotated = list_empty(&notes->src->source);
if (sym == NULL)
return -1;
dso = map__dso(ms->map);
if (dso->annotate_warned)
return -1;
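
	/*
	 * Annotate only if the symbol has no disassembly yet; when
	 * re-entering an already annotated (e.g. recursive) function,
	 * reuse the existing annotation and skip the cleanup on exit below.
	 */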
if (not_annotated) {
err = symbol__annotate2(ms, evsel, &browser.arch);
if (err) {
char msg[BUFSIZ];
dso->annotate_warned = true;
symbol__strerror_disassemble(ms, err, msg, sizeof(msg));
ui__error("Couldn't annotate %s:\n%s", sym->name, msg);
goto out_free_offsets;
}
}
ui_helpline__push("Press ESC to exit");
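
	/*
	 * Wire the annotated source lines into the generic ui_browser and
	 * widen it to leave room for the percentage column.
	 */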
browser.b.width = notes->src->max_line_len;
browser.b.nr_entries = notes->src->nr_entries;
	browser.b.entries = &notes->src->source;
browser.b.width += 18; /* Percentage */
if (annotate_opts.hide_src_code)
ui_browser__init_asm_mode(&browser.b);
ret = annotate_browser__run(&browser, evsel, hbt);
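
	/*
	 * Only release the annotation if it was created by this call; a
	 * pre-existing annotation (e.g. the outer instance of a recursive
	 * function) must stay valid for the caller.
	 */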
	if (not_annotated)
annotated_source__purge(notes->src);
out_free_offsets:
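	/* The offsets array, too, is freed only when it was allocated by this annotation pass. */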
	if (not_annotated)
zfree(&notes->src->offsets);
return ret;
}