forked from Minki/linux
7fa46cbf20
19491 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Jin Yao
|
7fa46cbf20 |
perf report: Sort by sampled cycles percent per block for tui
Previous patch has implemented a new option "--total-cycles". But only stdio mode is supported. This patch supports the tui mode and support '--percent-limit'. For example, perf record -b ./div perf report --total-cycles --percent-limit 1 # Samples: 2753248 of event 'cycles' Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so -------------------------------------------------- v7: --- 1. Since we have used use_browser in report__browse_block_hists to support stdio mode, now we also add supporting for tui. 2. Move block tui browser code from ui/browsers/hists.c to block-info.c. v6: --- Create report__tui_browse_block_hists in block-info.c (codes are moved from builtin-report.c). v5: --- Fix a crash issue when running perf report without '--total-cycles'. The issue is because the internal flag is renamed from 'total_cycles' to 'total_cycles_mode' in previous patch but this patch still uses 'total_cycles' to check if the '--total-cycles' option is enabled, which causes the code to be inconsistent. v4: --- Since the block collection is moved out of printing in previous patch, this patch is updated accordingly for tui supporting. v3: --- Minor change since the function name is changed: block_total_cycles_percent -> block_info__total_cycles_percent Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
0b49f83657 |
perf report: Support --percent-limit for --total-cycles
We have already supported the '--total-cycles' option in previous patch. It's also useful to show entries only above a threshold percent. This patch enables '--percent-limit' for not showing entries under that percent. For example: perf report --total-cycles --stdio --percent-limit 1 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2M of event 'cycles' # Event count (approx.): 2753248 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ................................................................. .................... # 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so Committer testing: From second exapmple onwards slightly edited for brevity: # perf report --total-cycles --percent-limit 2 --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ...................................................................... .................... # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] # # (Tip: Create an archive with symtabs to analyse on other machine: perf archive) # # perf report --total-cycles --percent-limit 1 --stdio # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so # # perf report --total-cycles --percent-limit 0.7 --stdio # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] # ------------------------------------------- It only shows the entries which 'Sampled Cycles%' > 1%. v7: --- No functional change. Only fix the conflict issue because previous patches are changed. v6: --- No functional change. Only fix the conflict issue because previous patches are changed. v5: --- No functional change. Only fix the conflict issue because previous patches are changed. v4: --- No functional change. Only fix the build issue because previous patches are changed. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
6f7164fa23 |
perf report: Sort by sampled cycles percent per block for stdio
It would be useful to support sorting for all blocks by the sampled cycles percent per block. This is useful to concentrate on the globally hottest blocks. This patch implements a new option "--total-cycles" which sorts all blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent: percent = block sampled cycles aggregation / total sampled cycles Note that, this patch only supports "--stdio" mode. For example, # perf record -b ./div # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 2M of event 'cycles' # Event count (approx.): 2753248 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ................................................ ................. # 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so 0.25% 182.5K 0.02% 1 [random_r.c:388 -> random_r.c:391] libc-2.27.so 0.00% 48 1.07% 48 [x86_pmu_enable+284 -> x86_pmu_enable+298] [kernel.kallsyms] 0.00% 74 1.64% 74 [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92] [kernel.kallsyms] 0.00% 73 1.62% 73 [vm_mmap+0 -> vm_mmap+48] [kernel.kallsyms] 0.00% 63 0.69% 31 [up_write+0 -> up_write+34] [kernel.kallsyms] 0.00% 13 0.29% 13 [setup_arg_pages+396 -> setup_arg_pages+413] [kernel.kallsyms] 0.00% 3 0.07% 3 [setup_arg_pages+418 -> setup_arg_pages+450] [kernel.kallsyms] 0.00% 616 6.84% 308 [security_mmap_file+0 -> security_mmap_file+72] [kernel.kallsyms] 0.00% 23 0.51% 23 [security_mmap_file+77 -> security_mmap_file+87] [kernel.kallsyms] 0.00% 4 0.02% 1 [sched_clock+0 -> sched_clock+4] [kernel.kallsyms] 0.00% 4 0.02% 1 [sched_clock+9 -> sched_clock+12] [kernel.kallsyms] 0.00% 1 0.02% 1 [rcu_nmi_exit+0 -> rcu_nmi_exit+9] [kernel.kallsyms] Committer testing: This should provide material for hours of endless joy, both from looking for suspicious things in the implementation of this patch, such as the top one: # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] As well from things that look legit: # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux] :-) Very short system wide taken branches session: # perf record -h -b Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -b, --branch-any sample any taken branches # # perf record -b ^C[ perf record: Woken up 595 times to write data ] [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ] # # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ...................................................................... .................... # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] 0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux] 0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux] 0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux] 0.30% 260.8K 0.07% 564 [clear_page_64.S:47 -> clear_page_64.S:50] [kernel.vmlinux] 0.28% 215.3K 0.05% 369 [traps.c:623 -> traps.c:628] [kernel.vmlinux] 0.23% 178.1K 0.04% 278 [entry_64.S:271 -> entry_64.S:275] [kernel.vmlinux] 0.20% 152.6K 0.09% 706 [paravirt.c:177 -> paravirt.c:179] [kernel.vmlinux] 0.20% 155.8K 0.05% 373 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux] 0.18% 136.6K 0.03% 222 [msr.h:105 -> msr.h:166] [kernel.vmlinux] 0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux] 0.16% 118.3K 0.01% 44 [entry_64.S:632 -> entry_64.S:657] [kernel.vmlinux] 0.14% 104.5K 0.00% 28 [rwsem.c:1541 -> rwsem.c:1544] [kernel.vmlinux] 0.13% 99.2K 0.01% 53 [spinlock.c:150 -> spinlock.c:152] [kernel.vmlinux] 0.13% 95.5K 0.00% 35 [swap.c:456 -> swap.c:471] [kernel.vmlinux] 0.12% 96.2K 0.05% 407 [copy_user_64.S:175 -> copy_user_64.S:209] [kernel.vmlinux] 0.11% 85.9K 0.00% 31 [swap.c:400 -> page-flags.h:188] [kernel.vmlinux] 0.10% 73.0K 0.01% 52 [paravirt.h:763 -> list.h:131] [kernel.vmlinux] 0.07% 56.2K 0.03% 214 [filemap.c:1524 -> filemap.c:1557] [kernel.vmlinux] 0.07% 54.2K 0.02% 145 [memory.c:1032 -> memory.c:1049] [kernel.vmlinux] 0.07% 50.3K 0.00% 39 [mmzone.c:49 -> mmzone.c:69] [kernel.vmlinux] 0.06% 48.3K 0.01% 40 [paravirt.h:768 -> page_alloc.c:3304] [kernel.vmlinux] 0.06% 46.7K 0.02% 155 [memory.c:1032 -> memory.c:1056] [kernel.vmlinux] 0.06% 46.9K 0.01% 103 [swap.c:867 -> swap.c:902] [kernel.vmlinux] 0.06% 47.8K 0.00% 34 [entry_64.S:1201 -> entry_64.S:1202] [kernel.vmlinux] ----------------------------------------------------------- v7: --- Use use_browser in report__browse_block_hists for supporting stdio and potential tui mode. v6: --- Create report__browse_block_hists in block-info.c (codes are moved from builtin-report.c). It's called from perf_evlist__tty_browse_hists. v5: --- 1. Move all block functions to block-info.c 2. Move the code of setting ms in block hist_entry to other patch. v4: --- 1. Use new option '--total-cycles' to replace '-s total_cycles' in v3. 2. Move block info collection out of block info printing. v3: --- 1. Use common function block_info__process_sym to process the blocks per symbol. 2. Remove the nasty hack for skipping calculation of column length 3. Some minor cleanup Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
b65a7d372b |
perf hist: Support block formats with compare/sort/display
This patch provides helper routines to support new columns for block info output. The new columns are: Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object v5: --- 1. Move more block related functions from builtin-report.c to block-info.c 2. Set ms (map+sym) in block hist_entry. Because this info is needed for reporting the block range (i.e. source line) Committer notes: Remove unused set_fmt() function, some build were not completing with: util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function] static inline void set_fmt(struct block_fmt *block_fmt, ^ 1 error generated. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
7841f40aed |
perf hist: Count the total cycles of all samples
We can get the per sample cycles by hist__account_cycles(). It's also useful to know the total cycles of all samples in order to get the cycles coverage for a single program block in further. For example: coverage = per block sampled cycles / total sampled cycles This patch creates a new argument 'total_cycles' in hist__account_cycles(), which will be added with the cycles of each sample. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
6041441870 |
perf block: Cleanup and refactor block info functions
We have already implemented some block-info related functions. Now it's time to do some cleanup, refactoring and move the functions and structures to new block-info.h/block-info.c. v4: --- Move code for skipping column length calculation to patch: 'perf diff: Don't use hack to skip column length calculation' v3: --- 1. Rename the patch title 2. Rename from block.h/block.c to block-info.h/block-info.c 3. Move more common part to block-info, such as block_info__process_sym. 4. Remove the nasty hack for skipping calculation of column length Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
0bdf181fe0 |
perf diff: Don't use hack to skip column length calculation
Previously we use a nasty hack to skip the hists__calc_col_len for block since this function is not very suitable for block column length calculation. This patch removes the hack code and add a check at the entry of hists__calc_col_len to skip for block case. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
af8490eb2b |
perf tests: Fix out of bounds memory access
The test case 'Read backward ring buffer' failed on 32-bit architectures
which were found by LKFT perf testing. The test failed on arm32 x15
device, qemu_arm32, qemu_i386, and found intermittent failure on i386;
the failure log is as below:
50: Read backward ring buffer :
--- start ---
test child forked, pid 510
Using CPUID GenuineIntel-6-9E-9
mmap size 1052672B
mmap size 8192B
Finished reading overwrite ring buffer: rewind
free(): invalid next size (fast)
test child interrupted
---- end ----
Read backward ring buffer: FAILED!
The log hints there have issue for memory usage, thus free() reports
error 'invalid next size' and directly exit for the case. Finally, this
issue is root caused as out of bounds memory access for the data array
'evsel->id'.
The backward ring buffer test invokes do_test() twice. 'evsel->id' is
allocated at the first call with the flow:
test__backward_ring_buffer()
`-> do_test()
`-> evlist__mmap()
`-> evlist__mmap_ex()
`-> perf_evsel__alloc_id()
So 'evsel->id' is allocated with one item, and it will be used in
function perf_evlist__id_add():
evsel->id[0] = id
evsel->ids = 1
At the second call for do_test(), it skips to initialize 'evsel->id'
and reuses the array which is allocated in the first call. But
'evsel->ids' contains the stale value. Thus:
evsel->id[1] = id -> out of bound access
evsel->ids = 2
To fix this issue, we will use evlist__open() and evlist__close() pair
functions to prepare and cleanup context for evlist; so 'evsel->id' and
'evsel->ids' can be initialized properly when invoke do_test() and avoid
the out of bounds memory access.
Fixes:
|
||
Jiwei Sun
|
6d57581659 |
perf record: Add support for limit perf output file size
The patch adds a new option to limit the output file size, then based on it, we can create a wrapper of the perf command that uses the option to avoid exhausting the disk space by the unconscious user. In order to make the perf.data parsable, we just limit the sample data size, since the perf.data consists of many headers and sample data and other data, the actual size of the recorded file will bigger than the setting value. Testing it: # ./perf record -a -g --max-size=10M Couldn't synthesize bpf events. [ perf record: perf size limit reached (10249 KB), stopping session ] [ perf record: Woken up 32 times to write data ] [ perf record: Captured and wrote 10.133 MB perf.data (71964 samples) ] # ls -lh perf.data -rw------- 1 root root 11M Oct 22 14:32 perf.data # ./perf record -a -g --max-size=10K [ perf record: perf size limit reached (10 KB), stopping session ] Couldn't synthesize bpf events. [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.546 MB perf.data (69 samples) ] # ls -l perf.data -rw------- 1 root root 1626952 Oct 22 14:36 perf.data Committer notes: Fixed the build in multiple distros by using PRIu64 to print u64 struct members, fixing this: builtin-record.c: In function 'record__write': builtin-record.c:150:5: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'u64' [-Werror=format=] rec->bytes_written >> 10); ^ CC /tmp/build/pe Signed-off-by: Jiwei Sun <jiwei.sun@windriver.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Danter <richard.danter@windriver.com> Link: http://lore.kernel.org/lkml/20191022080901.3841-1-jiwei.sun@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Masami Hiramatsu
|
dee36a2abb |
perf probe: Skip overlapped location on searching variables
Since debuginfo__find_probes() callback function can be called with the location which already passed, the callback function must filter out such overlapped locations. add_probe_trace_event() has already done it by commit |
||
Masami Hiramatsu
|
86c0bf8539 |
perf probe: Fix to show calling lines of inlined functions
Fix to show calling lines of inlined functions (where an inline function
is called).
die_walk_lines() filtered out the lines inside inlined functions based
on the address. However this also filtered out the lines which call
those inlined functions from the target function.
To solve this issue, check the call_file and call_line attributes and do
not filter out if it matches to the line information.
Without this fix, perf probe -L doesn't show some lines correctly.
(don't see the lines after 17)
# perf probe -L vfs_read
<vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
1 {
2 ssize_t ret;
4 if (!(file->f_mode & FMODE_READ))
return -EBADF;
6 if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
8 if (unlikely(!access_ok(buf, count)))
return -EFAULT;
11 ret = rw_verify_area(READ, file, pos, count);
12 if (!ret) {
13 if (count > MAX_RW_COUNT)
count = MAX_RW_COUNT;
15 ret = __vfs_read(file, buf, count, pos);
16 if (ret > 0) {
fsnotify_access(file);
add_rchar(current, ret);
}
With this fix:
# perf probe -L vfs_read
<vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
1 {
2 ssize_t ret;
4 if (!(file->f_mode & FMODE_READ))
return -EBADF;
6 if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
8 if (unlikely(!access_ok(buf, count)))
return -EFAULT;
11 ret = rw_verify_area(READ, file, pos, count);
12 if (!ret) {
13 if (count > MAX_RW_COUNT)
count = MAX_RW_COUNT;
15 ret = __vfs_read(file, buf, count, pos);
16 if (ret > 0) {
17 fsnotify_access(file);
18 add_rchar(current, ret);
}
20 inc_syscr(current);
}
Fixes:
|
||
Masami Hiramatsu
|
da6cb952a8 |
perf probe: Filter out instances except for inlined subroutine and subprogram
Filter out instances except for inlined_subroutine and subprogram DIE in
die_walk_instances() and die_is_func_instance().
This fixes an issue that perf probe sets some probes on calling address
instead of a target function itself.
When perf probe walks on instances of an abstruct origin (a kind of
function prototype of inlined function), die_walk_instances() can also
pass a GNU_call_site (a GNU extension for call site) to callback. Since
it is not an inlined instance of target function, we have to filter out
when searching a probe point.
Without this patch, perf probe sets probes on call site address too.This
can happen on some function which is marked "inlined", but has actual
symbol. (I'm not sure why GCC mark it "inlined"):
# perf probe -D vfs_read
p:probe/vfs_read _text+2500017
p:probe/vfs_read_1 _text+2499468
p:probe/vfs_read_2 _text+2499563
p:probe/vfs_read_3 _text+2498876
p:probe/vfs_read_4 _text+2498512
p:probe/vfs_read_5 _text+2498627
With this patch:
Slightly different results, similar tho:
# perf probe -D vfs_read
p:probe/vfs_read _text+2498512
Committer testing:
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Before:
# perf probe -D vfs_read
p:probe/vfs_read _text+3131557
p:probe/vfs_read_1 _text+3130975
p:probe/vfs_read_2 _text+3131047
p:probe/vfs_read_3 _text+3130380
p:probe/vfs_read_4 _text+3130000
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
#
After:
# perf probe -D vfs_read
p:probe/vfs_read _text+3130000
#
Fixes:
|
||
Masami Hiramatsu
|
f4d99bdfd1 |
perf probe: Skip end-of-sequence and non statement lines
Skip end-of-sequence and non-statement lines while walking through lines
list.
The "end-of-sequence" line information means:
"the current address is that of the first byte after the
end of a sequence of target machine instructions."
(DWARF version 4 spec 6.2.2)
This actually means out of scope and we can not probe on it.
On the other hand, the statement lines (is_stmt) means:
"the current instruction is a recommended breakpoint location.
A recommended breakpoint location is intended to “represent”
a line, a statement and/or a semantically distinct subpart
of a statement."
(DWARF version 4 spec 6.2.2)
So, non-statement line info also should be skipped.
These can reduce unneeded probe points and also avoid an error.
E.g. without this patch:
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new events:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_3 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_4 (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask_4 -aR sleep 1
#
This puts 5 probes on one line, but acutally it's not inlined function.
This is because there are many non statement instructions at the
function prologue.
With this patch:
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
#
Now perf-probe skips unneeded addresses.
Committer testing:
Slightly different results, but similar:
Before:
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
#
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new events:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask_2 -aR sleep 1
#
After:
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
#
Fixes:
|
||
Masami Hiramatsu
|
c701636aee |
perf probe: Return a better scope DIE if there is no best scope
Make find_best_scope() returns innermost DIE at given address if there is no best matched scope DIE. Since Gcc sometimes generates intuitively strange line info which is out of inlined function address range, we need this fixup. Without this, sometimes perf probe failed to probe on a line inside an inlined function: # perf probe -D ksys_open:3 Failed to find scope of probe point. Error: Failed to add events. With this fix, 'perf probe' can probe it: # perf probe -D ksys_open:3 p:probe/ksys_open _text+25707308 p:probe/ksys_open_1 _text+25710596 p:probe/ksys_open_2 _text+25711114 p:probe/ksys_open_3 _text+25711343 p:probe/ksys_open_4 _text+25714058 p:probe/ksys_open_5 _text+2819653 p:probe/ksys_open_6 _text+2819701 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157291300887.19771.14936015360963292236.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
5c65b1c084 |
perf annotate: Fix heap overflow
Fix expand_tabs that copies the source lines '\0' and then appends another '\0' at a potentially out of bounds address. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20191026035644.217548-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
93730f85eb |
perf machine: Add kernel_dso() method
To reduce boilerplate in some places. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-9s1bgoxxhlnu037e1nqx0tw3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
b0c76fc4cf |
perf symbols: Remove needless checks for map->groups->machine
Its sufficient to check if map->groups is NULL before using it to get ->machine value. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-utiepyiv8b1tf8f79ok9d6j8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
1dc925568f |
perf parse: Add a deep delete for parse event terms
Add a parse_events_term deep delete function so that owned strings and arrays are freed. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
38f2c4226e |
perf parse: If pmu configuration fails free terms
Avoid a memory leak when the configuration fails. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
cabbf26821 |
perf parse: Before yyabort-ing free components
Yyabort doesn't destruct inputs and so this must be done manually before using yyabort. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
f2a8ecd8b1 |
perf parse: Add destructors for parse event terms
If parsing fails then destructors are ran to clean the up the stack. Rename the head union member to make the term and evlist use cases more distinct, this simplifies matching the correct destructor. Committer notes: Jiri: "Nice did not know about this.. looks like it's been in bison for some time, right?" Ian: "Looks like it wasn't in Bison 1 but in Bison 2, we're at Bison 3 and Bison 2 is > 14 years old: https://web.archive.org/web/20050924004158/http://www.gnu.org/software/bison/manual/html_mono/bison.html#Destructor-Decl" Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
b6645a7235 |
perf parse: Ensure config and str in terms are unique
Make it easier to release memory associated with parse event terms by duplicating the string for the config name and ensuring the val string is a duplicate. Currently the parser may memory leak terms and this is addressed in a later patch. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
448d732cef |
perf parse: Add parse events handle error
Parse event error handling may overwrite one error string with another creating memory leaks. Introduce a helper routine that warns about multiple error messages as well as avoiding the memory leak. A reproduction of this problem can be seen with: perf stat -e c/c/ After this change this produces: WARNING: multiple event parsing errors event syntax error: 'c/c/' \___ unknown term valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Adrian Hunter
|
ef5502a1d9 |
perf inject: Make --strip keep evsels
create_gcov (refer to the autofdo example in tools/perf/Documentation/intel-pt.txt) now needs the evsels to read the perf.data file. So don't strip them. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191105100057.21465-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
John Garry
|
71f699078b |
perf tools: Fix cross compile for ARM64
Currently when cross compiling perf tool for ARM64 on my x86 machine I get this error: arch/arm64/util/sym-handling.c:9:10: fatal error: gelf.h: No such file or directory #include <gelf.h> For the build, libelf is reported off: Auto-detecting system features: ... ... libelf: [ OFF ] Indeed, test-libelf is not built successfully: more ./build/feature/test-libelf.make.output test-libelf.c:2:10: fatal error: libelf.h: No such file or directory #include <libelf.h> ^~~~~~~~~~ compilation terminated. I have no such problems natively compiling on ARM64, and I did not previously have this issue for cross compiling. Fix by relocating the gelf.h include. Signed-off-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/1573045254-39833-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
86895b480a |
perf stat: Add --per-node agregation support
Adding new --per-node option to aggregate counts per NUMA nodes for system-wide mode measurements. You can specify --per-node in live mode: # perf stat -a -I 1000 -e cycles --per-node # time node cpus counts unit events 1.000542550 N0 20 6,202,097 cycles 1.000542550 N1 20 639,559 cycles 2.002040063 N0 20 7,412,495 cycles 2.002040063 N1 20 2,185,577 cycles 3.003451699 N0 20 6,508,917 cycles 3.003451699 N1 20 765,607 cycles ... Or in the record/report stat session: # perf stat record -a -I 1000 -e cycles # time counts unit events 1.000536937 10,008,468 cycles 2.002090152 9,578,539 cycles 3.003625233 7,647,869 cycles 4.005135036 7,032,086 cycles ^C 4.340902364 3,923,893 cycles # perf stat report --per-node # time node cpus counts unit events 1.000536937 N0 20 9,355,086 cycles 1.000536937 N1 20 653,382 cycles 2.002090152 N0 20 7,712,838 cycles 2.002090152 N1 20 1,865,701 cycles 3.003625233 N0 20 6,604,441 cycles 3.003625233 N1 20 1,043,428 cycles 4.005135036 N0 20 6,350,522 cycles 4.005135036 N1 20 681,564 cycles 4.340902364 N0 20 3,403,188 cycles 4.340902364 N1 20 520,705 cycles Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
389799a7a1 |
perf env: Add perf_env__numa_node()
To speed up cpu to node lookup, add perf_env__numa_node(), that creates cpu array on the first lookup, that holds numa nodes for each stored cpu. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20190904073415.723-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Haiyan Song
|
61ec07f591 |
perf vendor events intel: Update all the Intel JSON metrics from TMAM 3.6.
New Metrics: - DSB_Switches: fraction of cycles CPU was stalled due to switches from DSB to MITE pipeline [all] - L2_Evictions_{Silent|NonSilent}_PKI: L2 {silent|non silent} ecivtions rate per Kilo instruction [SKX+] - IpFarBranch - Instructions per Far Branch Other Enhancements & fixes: - KBLR/CFL & CLX move to separate columns (no column sharing via if #model) - Re-organized/renamed Metric Group Signed-off-by: Haiyan Song <haiyanx.song@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Link: http://lore.kernel.org/lkml/20191030082308.10919-1-haiyanx.song@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Haiyan Song
|
7fcf1b89c8 |
perf vendor events intel: Update CascadelakeX events to v1.05
Update CascadelakeX events to v1.05. Other changes: remove duplicated and without description events. Signed-off-by: Haiyan Song <haiyanx.song@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Link: http://lore.kernel.org/lkml/20191030082308.10919-1-haiyanx.song@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
8e8714c3d1 |
perf tools: Splice events onto evlist even on error
If event parsing fails the event list is leaked, instead splice the list onto the out result and let the caller cleanup. An example input for parse_events found by libFuzzer that reproduces this memory leak is 'm{'. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191025180827.191916-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
22bd8f1b5a |
libsubcmd: Use -O0 with DEBUG=1
When a 'make DEBUG=1' build is done, the command parser is still built with -O6 and is hard to step through, fix it making it use -O0 in that case. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: nd <nd@arm.com> Link: http://lore.kernel.org/lkml/20191028113340.4282-1-james.clark@arm.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
d894967fca |
libsubcmd: Move EXTRA_FLAGS to the end to allow overriding existing flags
Move EXTRA_WARNINGS and EXTRA_FLAGS to the end of the compilation line, otherwise they cannot be used to override the default values. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: nd <nd@arm.com> Link: http://lore.kernel.org/lkml/20191028113340.4282-1-james.clark@arm.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
50481461cf |
perf map_groups: Introduce for_each_entry() and for_each_entry_safe() iterators
To reduce boilerplate, providing a more compact form to iterate over the maps in a map_group. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-gc3go6fmdn30twusg91t2q56@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
8efc4f0568 |
perf maps: Add for_each_entry()/_safe() iterators
To reduce boilerplate, provide a more compact form using an idiom present in other trees of data structures. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-59gmq4kg1r68ou1wknyjl78x@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
20419d3a5b |
perf map: Allow map__next() to receive a NULL arg
Just like free(), return NULL in that case, will simplify the for_each_entry_safe() iterators. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-pbde2ucn49khnrebclys9pny@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
ee2555b612 |
perf map: Check if the map still has some refcounts on exit
We were checking just if it was still on some rb tree, but that is not the only way that this map can still have references, map->refcnt is there exactly for this, use it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-hany65tbeavsax7n3xvwl9pc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Adrian Hunter
|
b86a9d918a |
perf dso: Add dso__data_write_cache_addr()
Add functions to write into the dso file data cache, but not change the file itself. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191025130000.13032-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Adrian Hunter
|
366df72657 |
perf dso: Refactor dso_cache__read()
Refactor dso_cache__read() to separate populating the cache from copying data from it. This is preparation for adding a cache "write" that will update the data in the cache. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191025130000.13032-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Adrian Hunter
|
fd62c1097a |
perf auxtrace: Add auxtrace_cache__remove()
Add auxtrace_cache__remove(). Intel PT uses an auxtrace_cache to store the results of code-walking, so that the same block of instructions does not have to be decoded repeatedly. However, when there are text poke events, the associated cache entries need to be removed. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191025130000.13032-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Masami Hiramatsu
|
af04dd2f8e |
perf probe: Fix to show ranges of variables in functions without entry_pc
Fix to show ranges of variables (--range and --vars option) in functions
which DIE has only ranges but no entry_pc attribute.
Without this fix:
# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
(No matched variables)
With this fix:
# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
[VAL] int cpu @<clear_tasks_mm_cpumask+[0-35,317-317,2052-2059]>
Committer testing:
Before:
[root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
(No matched variables)
[root@quaco ~]#
After:
[root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
[VAL] int cpu @<clear_tasks_mm_cpumask+[0-23,23-105,105-106,106-106,1843-1850,1850-1862]>
[root@quaco ~]#
Using it:
[root@quaco ~]# perf probe clear_tasks_mm_cpumask cpu
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask with cpu)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
[root@quaco ~]# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c with cpu)
[root@quaco ~]#
[root@quaco ~]# perf trace -e probe:*cpumask
^C[root@quaco ~]#
Fixes:
|
||
Masami Hiramatsu
|
18e21eb671 |
perf probe: Fix to show inlined function callsite without entry_pc
Fix 'perf probe --line' option to show inlined function callsite lines
even if the function DIE has only ranges.
Without this:
# perf probe -L amd_put_event_constraints
...
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
__amd_put_nb_event_constraints(cpuc, event);
5 }
With this patch:
# perf probe -L amd_put_event_constraints
...
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
4 __amd_put_nb_event_constraints(cpuc, event);
5 }
Committer testing:
Before:
[root@quaco ~]# perf probe -L amd_put_event_constraints
<amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0>
0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
struct perf_event *event)
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
__amd_put_nb_event_constraints(cpuc, event);
5 }
PMU_FORMAT_ATTR(event, "config:0-7,32-35");
PMU_FORMAT_ATTR(umask, "config:8-15" );
[root@quaco ~]#
After:
[root@quaco ~]# perf probe -L amd_put_event_constraints
<amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0>
0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
struct perf_event *event)
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
4 __amd_put_nb_event_constraints(cpuc, event);
5 }
PMU_FORMAT_ATTR(event, "config:0-7,32-35");
PMU_FORMAT_ATTR(umask, "config:8-15" );
[root@quaco ~]# perf probe amd_put_event_constraints:4
Added new event:
probe:amd_put_event_constraints (on amd_put_event_constraints:4)
You can now use it in all perf tools, such as:
perf record -e probe:amd_put_event_constraints -aR sleep 1
[root@quaco ~]#
[root@quaco ~]# perf probe -l
probe:amd_put_event_constraints (on amd_put_event_constraints:4@arch/x86/events/amd/core.c)
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
[root@quaco ~]#
Using it:
[root@quaco ~]# perf trace -e probe:*
^C[root@quaco ~]#
Ok, Intel system here... :-)
Fixes:
|
||
Masami Hiramatsu
|
3895534dd7 |
perf probe: Fix to list probe event with correct line number
Since debuginfo__find_probe_point() uses dwarf_entrypc() for finding the
entry address of the function on which a probe is, it will fail when the
function DIE has only ranges attribute.
To fix this issue, use die_entrypc() instead of dwarf_entrypc().
Without this fix, perf probe -l shows incorrect offset:
# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579263632@work/linux/linux/kernel/cpu.c)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask+18446744071579263752@work/linux/linux/kernel/cpu.c)
With this:
# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@work/linux/linux/kernel/cpu.c)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:21@work/linux/linux/kernel/cpu.c)
Committer testing:
Before:
[root@quaco ~]# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579765152@kernel/cpu.c)
[root@quaco ~]#
After:
[root@quaco ~]# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
[root@quaco ~]#
Fixes:
|
||
Masami Hiramatsu
|
eb6933b29d |
perf probe: Fix to probe an inline function which has no entry pc
Fix perf probe to probe an inlne function which has no entry pc
or low pc but only has ranges attribute.
This seems very rare case, but I could find a few examples, as
same as probe_point_search_cb(), use die_entrypc() to get the
entry address in probe_point_inline_cb() too.
Without this patch:
# perf probe -D __amd_put_nb_event_constraints
Failed to get entry address of __amd_put_nb_event_constraints.
Probe point '__amd_put_nb_event_constraints' not found.
Error: Failed to add events.
With this patch:
# perf probe -D __amd_put_nb_event_constraints
p:probe/__amd_put_nb_event_constraints amd_put_event_constraints+43
Committer testing:
Before:
[root@quaco ~]# perf probe -D __amd_put_nb_event_constraints
Failed to get entry address of __amd_put_nb_event_constraints.
Probe point '__amd_put_nb_event_constraints' not found.
Error: Failed to add events.
[root@quaco ~]#
After:
[root@quaco ~]# perf probe -D __amd_put_nb_event_constraints
p:probe/__amd_put_nb_event_constraints _text+33789
[root@quaco ~]#
Fixes:
|
||
Masami Hiramatsu
|
5d16dbcc31 |
perf probe: Fix to probe a function which has no entry pc
Fix 'perf probe' to probe a function which has no entry pc or low pc but
only has ranges attribute.
probe_point_search_cb() uses dwarf_entrypc() to get the probe address,
but that doesn't work for the function DIE which has only ranges
attribute. Use die_entrypc() instead.
Without this fix:
# perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0
Probe point 'clear_tasks_mm_cpumask' not found.
Error: Failed to add events.
With this:
# perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0
p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0
Committer testing:
Before:
[root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
Probe point 'clear_tasks_mm_cpumask' not found.
Error: Failed to add events.
[root@quaco ~]#
After:
[root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
[root@quaco ~]#
Using it with 'perf trace':
[root@quaco ~]# perf trace -e probe:clear_tasks_mm_cpumask
Doesn't seem to be used in x86_64:
$ find . -name "*.c" | xargs grep clear_tasks_mm_cpumask
./kernel/cpu.c: * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU
./kernel/cpu.c:void clear_tasks_mm_cpumask(int cpu)
./arch/xtensa/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/csky/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/sh/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/arm/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/powerpc/mm/nohash/mmu_context.c: clear_tasks_mm_cpumask(cpu);
$ find . -name "*.h" | xargs grep clear_tasks_mm_cpumask
./include/linux/cpu.h:void clear_tasks_mm_cpumask(int cpu);
$ find . -name "*.S" | xargs grep clear_tasks_mm_cpumask
$
Fixes:
|
||
Masami Hiramatsu
|
07d3698578 |
perf probe: Fix wrong address verification
Since there are some DIE which has only ranges instead of the
combination of entrypc/highpc, address verification must use
dwarf_haspc() instead of dwarf_entrypc/dwarf_highpc.
Also, the ranges only DIE will have a partial code in different section
(e.g. unlikely code will be in text.unlikely as "FUNC.cold" symbol). In
that case, we can not use dwarf_entrypc() or die_entrypc(), because the
offset from original DIE can be a minus value.
Instead, this simply gets the symbol and offset from symtab.
Without this patch;
# perf probe -D clear_tasks_mm_cpumask:1
Failed to get entry address of clear_tasks_mm_cpumask
Error: Failed to add events.
And with this patch:
# perf probe -D clear_tasks_mm_cpumask:1
p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0
p:probe/clear_tasks_mm_cpumask_1 clear_tasks_mm_cpumask+5
p:probe/clear_tasks_mm_cpumask_2 clear_tasks_mm_cpumask+8
p:probe/clear_tasks_mm_cpumask_3 clear_tasks_mm_cpumask+16
p:probe/clear_tasks_mm_cpumask_4 clear_tasks_mm_cpumask+82
Committer testing:
I managed to reproduce the above:
[root@quaco ~]# perf probe -D clear_tasks_mm_cpumask:1
p:probe/clear_tasks_mm_cpumask _text+919968
p:probe/clear_tasks_mm_cpumask_1 _text+919973
p:probe/clear_tasks_mm_cpumask_2 _text+919976
[root@quaco ~]#
But then when trying to actually put the probe in place, it fails if I
use :0 as the offset:
[root@quaco ~]# perf probe -L clear_tasks_mm_cpumask | head -5
<clear_tasks_mm_cpumask@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/kernel/cpu.c:0>
0 void clear_tasks_mm_cpumask(int cpu)
1 {
2 struct task_struct *p;
[root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
Probe point 'clear_tasks_mm_cpumask' not found.
Error: Failed to add events.
[root@quaco
The next patch is needed to fix this case.
Fixes:
|
||
Yunfeng Ye
|
1785fbb738 |
perf jevents: Fix resource leak in process_mapfile() and main()
There are memory leaks and file descriptor resource leaks in process_mapfile() and main(). Fix this by adding free(), fclose() and free_arch_std_events() on the error paths. Fixes: |
||
Masami Hiramatsu
|
91e2f539ee |
perf probe: Fix to show function entry line as probe-able
Fix die_walk_lines() to list the function entry line correctly. Since
the dwarf_entrypc() does not return the entry pc if the DIE has only
range attribute, __die_walk_funclines() fails to list the declaration
line (entry line) in that case.
To solve this issue, this introduces die_entrypc() which correctly
returns the entry PC (the first address range) even if the DIE has only
range attribute. With this fix die_walk_lines() shows the function entry
line is able to probe correctly.
Fixes:
|
||
Masami Hiramatsu
|
acb6a7047a |
perf probe: Walk function lines in lexical blocks
Since some inlined functions are in lexical blocks of given function, we
have to recursively walk through the DIE tree. Without this fix,
perf-probe -L can miss the inlined functions which is in a lexical block
(like if (..) { func() } case.)
However, even though, to walk the lines in a given function, we don't
need to follow the children DIE of inlined functions because those do
not have any lines in the specified function.
We need to walk though whole trees only if we walk all lines in a given
file, because an inlined function can include another inlined function
in the same file.
Fixes:
|
||
Masami Hiramatsu
|
b77afa1f81 |
perf probe: Fix to find range-only function instance
Fix die_is_func_instance() to find range-only function instance.
In some case, a function instance can be made without any low PC or
entry PC, but only with address ranges by optimization. (e.g. cold text
partially in "text.unlikely" section) To find such function instance, we
have to check the range attribute too.
Fixes:
|
||
Igor Lubashev
|
4bfbcf3ee1 |
perf kvm: Use evlist layer api when possible
No need for layer violations when a proper evlist api is available. Signed-off-by: Igor Lubashev <ilubashe@akamai.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1571795693-23558-4-git-send-email-ilubashe@akamai.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |