linux/tools/perf
Alexey Budankov 470530bbb8 perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies minimal number of bytes
that is extracted from mmaped kernel buffer to store into a trace. The
default option value is 1 byte what means every time trace writing
thread finds some new data in the mmaped buffer the data is extracted,
possibly compressed and written to a trace.

  $ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
  $ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc

The option is independent from -z setting, doesn't vary with compression
level and can serve two purposes.

The first purpose is to increase the compression ratio of a trace data.
Larger data chunks are compressed more effectively so the implemented
option allows specifying data chunk size to compress. Also at some cases
executing more write syscalls with smaller data size can take longer
than executing less write syscalls with bigger data size due to syscall
overhead so extracting bigger data chunks specified by the option value
could additionally decrease runtime overhead.

The second purpose is to avoid self monitoring live-lock issue in system
wide (-a) profiling mode. Profiling in system wide mode with compression
(-a -z) can additionally induce data into the kernel buffers along with
the data from monitored processes. If performance data rate and volume
from the monitored processes is high then trace streaming and
compression activity in the tool is also high. High tool process
activity can lead to subtle live-lock effect when compression of single
new byte from some of mmaped kernel buffer leads to generation of the
next single byte at some mmaped buffer. So perf tool process ends up in
endless self monitoring.

Implemented synch parameter is the mean to force data move independently
from the specified flush threshold value. Despite the provided flush
value the tool needs capability to unconditionally drain memory buffers,
at least in the end of the collection.

Committer testing:

Running with the default value, i.e. as soon as there is something to
read go on consuming, we first write the synthesized events, small
chunks of about 128 bytes:

  # perf trace -m 2048 --call-graph dwarf -e write -- perf record
  <SNIP>
     101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
                                         __libc_write (/usr/lib64/libpthread-2.28.so)
                                         ion (/home/acme/bin/perf)
                                         record__write (inlined)
                                         process_synthesized_event (/home/acme/bin/perf)
                                         perf_tool__process_synth_event (inlined)
                                         perf_event__synthesize_mmap_events (/home/acme/bin/perf)

Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:

     107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
                                         __libc_write (/usr/lib64/libpthread-2.28.so)
                                         ion (/home/acme/bin/perf)
                                         record__write (inlined)
                                         record__pushfn (/home/acme/bin/perf)
                                         perf_mmap__push (/home/acme/bin/perf)
                                         record__mmap_read_evlist (inlined)
                                         record__mmap_read_all (inlined)
                                         __cmd_record (inlined)
                                         cmd_record (/home/acme/bin/perf)
     12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
  <SNIP same backtrace as in the 107.561 timestamp>
     12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
  <SNIP same backtrace as in the 107.561 timestamp>
     12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
  <SNIP same backtrace as in the 107.561 timestamp>

If we limit it to write only when more than 16MB are available for
reading, it throttles that to a quarter of the --mmap-pages set for
'perf record', which by default get to 528384 bytes, found out using
'record -v':

  mmap flush: 132096
  mmap size 528384B

With that in place all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers setup by the
kernel perf infrastructure were at least 132096 bytes long.

Trying with a bigger mmap size:

   perf trace -e write perf record -v -m 2048 --mmap-flush 16M
   74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
   74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
   74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
   74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
   74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
   74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
   74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760

Was again limited to a quarter of the mmap size:

  mmap flush: 2098176
  mmap size 8392704B

A warning about that would be good to have but can be added later,
something like:

  "max flush is a quarter of the mmap size, if wanting to bump the mmap
   flush further, bump the mmap size as well using -m/--mmap-pages"

Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:

  cc1: warnings being treated as errors
  builtin-record.c: In function 'record__mmap_read_evlist':
  builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
  /usr/include/unistd.h:933: warning: shadowed declaration is here
  builtin-record.c: In function 'record__mmap_read_all':
  builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
  /usr/include/unistd.h:933: warning: shadowed declaration is here

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01 15:18:10 -03:00
..
arch tools headers: Update x86's syscall_64.tbl and uapi/asm-generic/unistd 2019-03-28 14:41:11 -03:00
bench perf/core improvements and fixes: 2019-03-22 22:51:21 +01:00
Documentation perf record: Implement --mmap-flush=<number> option 2019-04-01 15:18:10 -03:00
examples/bpf perf augmented_raw_syscalls: Use a PERCPU_ARRAY map to copy more string bytes 2019-04-01 14:49:24 -03:00
include/bpf perf bpf: Automatically add BTF ELF markers 2019-03-06 09:45:37 -03:00
jvmti perf jvmti: Separate jvmti cmlr check 2018-11-21 22:39:58 -03:00
pmu-events perf list: Fix s390 counter long description for L1D_RO_EXCL_WRITES 2019-04-01 14:49:24 -03:00
python
scripts perf scripts python: exported-sql-viewer.py: Fix python3 support 2019-03-28 15:53:16 -03:00
tests perf tools: Save bpf_prog_info and BTF of new BPF programs 2019-03-21 11:27:04 -03:00
trace perf trace beauty renameat: No need to include linux/fs.h 2019-04-01 14:49:24 -03:00
ui perf tools report: Add custom scripts to script menu 2019-03-11 16:33:20 -03:00
util perf record: Implement --mmap-flush=<number> option 2019-04-01 15:18:10 -03:00
.gitignore
Build perf tools: Rename build libperf to perf 2019-02-14 15:18:08 -03:00
builtin-annotate.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-bench.c perf bench: Add epoll_ctl(2) benchmark 2018-11-21 22:39:55 -03:00
builtin-buildid-cache.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-buildid-list.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-c2c.c perf c2c: Fix c2c report for empty numa node 2019-03-06 18:15:24 -03:00
builtin-config.c perf config: Show the configuration when no arguments are provided 2018-12-18 12:24:00 -03:00
builtin-data.c
builtin-diff.c perf diff: Support --pid/--tid filter options 2019-03-06 18:06:16 -03:00
builtin-evlist.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-ftrace.c
builtin-help.c perf help: Remove needless use of strncpy() 2018-12-17 14:59:18 -03:00
builtin-inject.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-kallsyms.c pref tools: Add missing map.h includes 2019-02-06 10:00:38 -03:00
builtin-kmem.c perf tools, tools lib traceevent: Rename "pevent" member of struct tep_event to "tep" 2019-04-01 15:18:10 -03:00
builtin-kvm.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-list.c perf list: Output tool events 2019-04-01 14:49:25 -03:00
builtin-lock.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-mem.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-probe.c perf namespaces: Remove namespaces.h from .h headers 2019-01-25 15:12:09 +01:00
builtin-record.c perf record: Implement --mmap-flush=<number> option 2019-04-01 15:18:10 -03:00
builtin-report.c perf report: Show all sort keys in help output 2019-03-19 16:15:42 -03:00
builtin-sched.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-script.c perf script: Support relative time 2019-03-19 16:52:03 -03:00
builtin-stat.c perf stat: Implement duration_time as a proper event 2019-04-01 14:49:24 -03:00
builtin-timechart.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-top.c perf tools: Save bpf_prog_info and BTF of new BPF programs 2019-03-21 11:27:04 -03:00
builtin-trace.c perf data: Add global path holder 2019-02-22 16:52:07 -03:00
builtin-version.c tools build: Implement libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines 2019-04-01 15:18:10 -03:00
builtin.h perf script: Add array bound checking to list_scripts 2019-03-11 16:33:19 -03:00
check-headers.sh tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h 2019-03-28 14:31:56 -03:00
command-list.txt perf help: Add missing subcommand version 2018-09-19 14:53:36 -03:00
CREDITS
design.txt perf/doc: Update design.txt for exclude_{host|guest} flags 2019-01-21 11:01:18 +01:00
Makefile perf tools: Disable parallelism for 'make clean' 2018-08-20 08:54:58 -03:00
Makefile.config tools build: Implement libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines 2019-04-01 15:18:10 -03:00
Makefile.perf tools build: Implement libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines 2019-04-01 15:18:10 -03:00
MANIFEST
perf-archive.sh
perf-completion.sh
perf-read-vdso.c perf tools: Make find_vdso_map() more modular 2019-01-08 13:28:13 -03:00
perf-sys.h
perf-with-kcore.sh
perf.c perf bpf: Save bpf_prog_info in a rbtree in perf_env 2019-03-19 16:52:06 -03:00
perf.h perf record: Implement --mmap-flush=<number> option 2019-04-01 15:18:10 -03:00