Peter noticed that we have 3 ways of referring to the idle thread:
[idle]:0
swapper:0
swapper-0
Standardize on 'swapper:0'.
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use 'perf sched latency' to track the current task based on
context-switch events, and flag the cases where there's some
impossible transition: such as a PID being switched out that
was not switched in.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Output such lost event and state machine weirdness stats:
TOTAL: | 14974.910 ms | 46384 |
---------------------------------------------------
INFO: 8.865% lost events (19132 out of 215819, in 8 chunks)
INFO: 0.198% state machine bugs (49 out of 24708) (due to lost events?)
And increase buffering to -m 1024 (4 MB) by default. Since we
use output multiplexing that kind of space is needed.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This allows more precise 'perf sched latency' output:
---------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
---------------------------------------------------------------------------------------
ksoftirqd/0-4 | 0.010 ms | 2 | avg: 2.476 ms | max: 2.977 ms |
perf-12328 | 15.844 ms | 66 | avg: 1.118 ms | max: 9.979 ms |
bdi-default-235 | 0.009 ms | 1 | avg: 0.998 ms | max: 0.998 ms |
events/1-8 | 0.020 ms | 2 | avg: 0.998 ms | max: 0.998 ms |
events/0-7 | 0.018 ms | 2 | avg: 0.992 ms | max: 0.996 ms |
sleep-12329 | 0.742 ms | 3 | avg: 0.906 ms | max: 2.289 ms |
sshd-12122 | 0.163 ms | 2 | avg: 0.283 ms | max: 0.562 ms |
loop-getpid-lon-12322 | 1023.636 ms | 69 | avg: 0.208 ms | max: 5.996 ms |
loop-getpid-lon-12321 | 1038.638 ms | 5 | avg: 0.073 ms | max: 0.171 ms |
migration/1-5 | 0.000 ms | 1 | avg: 0.006 ms | max: 0.006 ms |
---------------------------------------------------------------------------------------
TOTAL: | 2079.078 ms | 153 |
-------------------------------------------------
Also, streamline the code a bit more, add asserts for various state
machine failures (they should be debugged if they occur) and fix
a few odd ends.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Often it's useful to know the PID of the task as well - print it
out too.
( While at it, reformat the output to be a bit more
paste-into-commit-logs friendly. )
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Before:
-----------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------
perf |4853313.251 ms | 10 | avg: 0.046 ms | max: 0.337 ms |
flush-8:0 |2426659.202 ms | 5 | avg: 0.015 ms | max: 0.016 ms |
sleep |485331.966 ms | 1 | avg: 0.012 ms | max: 0.012 ms |
ksoftirqd/1 |485331.320 ms | 1 | avg: 0.005 ms | max: 0.005 ms |
-----------------------------------------------------------------------------------
TOTAL: |8250635.739 ms | 17 |
---------------------------------------------
After:
-----------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------
perf | 0.206 ms | 10 | avg: 0.046 ms | max: 0.337 ms |
flush-8:0 | 2.680 ms | 5 | avg: 0.015 ms | max: 0.016 ms |
sleep | 0.662 ms | 1 | avg: 0.012 ms | max: 0.012 ms |
ksoftirqd/1 | 0.015 ms | 1 | avg: 0.005 ms | max: 0.005 ms |
-----------------------------------------------------------------------------------
TOTAL: | 3.563 ms | 17 |
---------------------------------------------
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Finish the -M/--multiplex option implementation:
- separate it out from group_fd
- correctly set it via the ioctl and dont mmap counters that
are multiplexed
- modify the perf record event loop to deal with buffer-less
counters.
- remove the -g option from perf sched record
- account for unordered events in perf sched latency
- (add -f to perf sched record to ease measurements)
- skip idle threads (pid==0) in latency output
The result is better latency output by 'perf sched latency':
-----------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------
ksoftirqd/8 | 0.071 ms | 2 | avg: 0.458 ms | max: 0.913 ms |
at-spi-registry | 0.609 ms | 19 | avg: 0.013 ms | max: 0.023 ms |
perf | 3.316 ms | 16 | avg: 0.013 ms | max: 0.054 ms |
Xorg | 0.392 ms | 19 | avg: 0.011 ms | max: 0.018 ms |
sleep | 0.537 ms | 2 | avg: 0.009 ms | max: 0.009 ms |
-----------------------------------------------------------------------------------
TOTAL: | 4.925 ms | 58 |
---------------------------------------------
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently it's possible to meet such too high latency results
with 'perf sched latency'.
-----------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------
xfce4-panel | 0.222 ms | 2 | avg: 4718.345 ms | max: 9436.493 ms |
scsi_eh_3 | 3.962 ms | 36 | avg: 55.957 ms | max: 1977.829 ms |
The origin is on traces that are sometimes badly serialized across cpus.
For example the raw traces that raised such results for xfce4-panel:
(1) [init]-0 [000] 1494.663899990: sched_switch: task swapper:0 [140] (R) ==> xfce4-panel:4569 [120]
(2) xfce4-panel-4569 [000] 1494.663928373: sched_switch: task xfce4-panel:4569 [120] (S) ==> swapper:0 [140]
(3) Xorg-4276 [001] 1494.663860125: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(4) Xorg-4276 [001] 1504.098252756: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(5) perf-5219 [000] 1504.100353302: sched_switch: task perf:5219 [120] (S) ==> xfce4-panel:4569 [120]
The traces are processed in the order they arrive. Then in (2),
xfce4-panel sleeps, it is first waken up in (3) and eventually
scheduled in (5).
The latency reported is then 1504 - 1495 = 9 secs, as reported by perf
sched. But this is wrong, we are confident in the fact the traces are
nicely serialized while we should actually more trust the timestamps.
If we reorder by timestamps we get:
(1) Xorg-4276 [001] 1494.663860125: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(2) [init]-0 [000] 1494.663899990: sched_switch: task swapper:0 [140] (R) ==> xfce4-panel:4569 [120]
(3) xfce4-panel-4569 [000] 1494.663928373: sched_switch: task xfce4-panel:4569 [120] (S) ==> swapper:0 [140]
(4) Xorg-4276 [001] 1504.098252756: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(5) perf-5219 [000] 1504.100353302: sched_switch: task perf:5219 [120] (S) ==> xfce4-panel:4569 [120]
Now the trace make more sense, xfce4-panel is sleeping. Then it is
woken up in (1), scheduled in (2)
It goes to sleep in (3), woken up in (4) and scheduled in (5).
Now, latency captured between (1) and (2) is of 39 us.
And between (4) and (5) it is 2.1 ms.
Such pattern of bad serializing is the origin of the high latencies
reported by perf sched.
Basically, we need to check whether wake up time is higher than
schedule out time. If it's not the case, we need to tag the current
work atom as invalid.
Beside that, we may need to work later on a better ordering of the
traces given by the kernel.
After this patch:
xfce4-session | 0.221 ms | 1 | avg: 0.538 ms | max: 0.538 ms |
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add an option to multiplex counters output in the channel of
the group leader, ie: the first counter opened:
-M --multiplex
The effect is better serialized samples. This is especially
useful for tracepoint samples that need to be well serialized
for their post-processing.
Also make use of this option in 'perf sched'.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Alias 'perf sched trace' to 'perf trace', for workflow completeness.
Add a bit of documentation for perf sched.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implement the 'perf sched record' subcommand that adds a
default list of events, turns on raw sampling and system-wide
tracing and passes off the rest of the command to perf record.
This is more convenient than having to specify the events all
the time.
Before:
$ perf record -a -R -e sched:sched_switch:r -e sched:sched_stat_wait:r -e sched:sched_stat_sleep:r -e sched:sched_stat_iowait:r -e sched:sched_process_exit:r -e sched:sched_process_fork:r -e sched:sched_wakeup:r -e sched:sched_migrate_task:r -c 1 sleep 1
After:
$ perf sched record -f sleep 1
Also fix an assumption in the event string parser that assumed
that strings passed in can be modified. (In this case they wont
be as they come from a readonly constant section.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use a sort list for thread atoms insertion as well - instead of
hardcoded for PID.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Rename 'latency' field/variable names to the better 'atom' ones
- Reduce the number of #include lines and consolidate them
- Gather file scope variables at the top of the file
- Remove unused bits
No change in functionality.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Separate the option parsing cleanly and add two variants:
- 'perf sched latency' (can be abbreviated via 'perf sched lat')
- 'perf sched replay' (can be abbreviated via 'perf sched rep')
Also add a repeat count option to replay and add a separation
set of options for replay.
Do the sorting setup only in the latency sub-command.
Display separate help screens for 'perf sched' and
'perf sched replay -h' - i.e. further separation of the
sub-commands.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implement multidimensional sorting on perf sched so that
you can sort either by number of switches, latency average,
latency maximum, runtime.
perf sched -l -s avg,max (this is the default)
-----------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------
gnome-power-man | 0.113 ms | 1 | avg: 4998.531 ms | max: 4998.531 ms |
xfdesktop | 1.190 ms | 7 | avg: 136.475 ms | max: 940.933 ms |
xfce-mcs-manage | 2.194 ms | 22 | avg: 38.534 ms | max: 735.174 ms |
notification-da | 2.749 ms | 31 | avg: 27.436 ms | max: 731.791 ms |
xfce4-session | 3.343 ms | 28 | avg: 26.796 ms | max: 734.891 ms |
xfwm4 | 3.159 ms | 22 | avg: 12.406 ms | max: 241.333 ms |
xchat | 42.789 ms | 214 | avg: 11.886 ms | max: 100.349 ms |
xfce4-terminal | 5.386 ms | 22 | avg: 11.414 ms | max: 241.611 ms |
firefox | 151.992 ms | 123 | avg: 9.543 ms | max: 153.717 ms |
xfce4-panel | 24.324 ms | 47 | avg: 8.189 ms | max: 242.352 ms |
:5090 | 6.932 ms | 111 | avg: 8.131 ms | max: 102.665 ms |
events/0 | 0.758 ms | 12 | avg: 1.964 ms | max: 21.879 ms |
Xorg | 280.558 ms | 340 | avg: 1.864 ms | max: 99.526 ms |
geany | 63.391 ms | 295 | avg: 1.099 ms | max: 9.334 ms |
reiserfs/0 | 0.039 ms | 2 | avg: 0.854 ms | max: 1.487 ms |
kondemand/0 | 8.251 ms | 245 | avg: 0.691 ms | max: 34.372 ms |
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We are dividing a time in ns by 1e9. This is a nsec to sec
conversion. What we want is msecs. Fix it by dividing by 1e6.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a field in the thread atom list that keeps track of the
total and max latencies and also the total runtime. This makes
a faster output and also prepares for sorting.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently in perf sched, we are measuring the scheduler wakeup
latencies.
Now we also want measure the time a task wait to be scheduled
after it gets preempted.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To measures the latencies, we capture the sched atoms data into
a specific structure named struct lat_snapshot.
As this structure can be used for other purposes of scheduler
profiling and mirrors what happens in a thread work atom, lets
rename it to struct work_atom and propagate this renaming in
other functions and structures names to keep it coherent.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Extend the latency tracking structure with scheduling atom
runtime info - and sum it up during per task display.
(Also clean up a few details.)
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After:
-----------------------------------------------------------------------------------
Task | runtime ms | switches | average delay ms | maximum delay ms |
-----------------------------------------------------------------------------------
migration/0 | 0.000 ms | 1 | avg: 0.047 ms | max: 0.047 ms |
ksoftirqd/0 | 0.000 ms | 1 | avg: 0.039 ms | max: 0.039 ms |
migration/1 | 0.000 ms | 3 | avg: 0.013 ms | max: 0.016 ms |
migration/3 | 0.000 ms | 2 | avg: 0.003 ms | max: 0.004 ms |
migration/4 | 0.000 ms | 1 | avg: 0.022 ms | max: 0.022 ms |
distccd | 0.000 ms | 1 | avg: 0.004 ms | max: 0.004 ms |
distccd | 0.000 ms | 1 | avg: 0.014 ms | max: 0.014 ms |
distccd | 0.000 ms | 2 | avg: 0.000 ms | max: 0.000 ms |
distccd | 0.000 ms | 2 | avg: 0.012 ms | max: 0.019 ms |
distccd | 0.000 ms | 1 | avg: 0.002 ms | max: 0.002 ms |
as | 0.000 ms | 2 | avg: 0.019 ms | max: 0.019 ms |
as | 0.000 ms | 3 | avg: 0.015 ms | max: 0.017 ms |
as | 0.000 ms | 1 | avg: 0.009 ms | max: 0.009 ms |
perf | 0.000 ms | 1 | avg: 0.001 ms | max: 0.001 ms |
gcc | 0.000 ms | 1 | avg: 0.021 ms | max: 0.021 ms |
run-mozilla.sh | 0.000 ms | 2 | avg: 0.010 ms | max: 0.017 ms |
mozilla-plugin- | 0.000 ms | 1 | avg: 0.006 ms | max: 0.006 ms |
gcc | 0.000 ms | 2 | avg: 0.013 ms | max: 0.013 ms |
-----------------------------------------------------------------------------------
(The runtime ms column is not filled in yet.)
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Separate the latency and the replay commands more cleanly
- Use consistent naming
- Display help page on 'perf sched' outlining comments,
instead of aborting
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the -l --latency option that reports statistics about the
scheduler latencies.
For now, the latencies are measured in the following sequence
scope:
- task A is sleeping (D or S state)
- task B wakes up A
^
|
|
latency timeframe
|
|
v
- task A is scheduled in
Start by recording every scheduler events:
perf record -e sched:*
and then fetch the results:
perf sched -l
Tasks count total avg max
migration/0 2 39849 19924 28826
ksoftirqd/0 7 756383 108054 373014
migration/1 5 45391 9078 10452
ksoftirqd/1 2 399055 199527 359130
events/0 8 4780110 597513 4500250
events/1 9 6353057 705895 2986012
kblockd/0 42 37805097 900121 5077684
The snapshot are in nanoseconds.
- Count: number of snapshots taken for the given task
- Total: total latencies in nanosec
- Avg : average of latency between wake up and sched in
- Max : max snapshot latency
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Create a sched event structure of handlers in which various
sched events reader can plug their own callbacks.
This makes easier the addition of new perf sched sub commands.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf sched raises the following error when it meets a sched
switch event:
perf: builtin-sched.c:286: register_pid: Assertion `!(pid >= 65536)' failed.
Abandon
Currently in x86-64, the sched switch events have a hole in the
middle of the structure:
u16 common_type;
u8 common_flags;
u8 common_preempt_count;
u32 common_pid;
u32 common_tgid;
char prev_comm[16];
u32 prev_pid;
u32 prev_prio;
<--- there
u64 prev_state;
char next_comm[16];
u32 next_pid;
u32 next_prio;
Gcc inserts a 4 bytes hole there for prev_state to be u64
aligned. And the events are exported to userspace with this
hole.
But in userspace, from perf sched, we fetch it using a
structure that has a new field in the beginning: u32 size. This
is because our trace is exported with its size as a field. But
now that we have this new field, the hole in the middle
disappears because it makes prev_state becoming well aligned.
And since we are using a pointer to the raw trace using this
struct, instead of reading prev_state, we are reading the hole.
We could fix it by keeping the size seperate from the struct
but actually there a lot of other potential problems: some
fields may be saved as long in a 64 bits system and later read
as long in a 32 bits system. Also this direct cast doesn't care
about the endianness differences between the host traced
machine and the machine in which we do the post processing.
So instead of using such dangerous direct casts, fetch the
values using the trace parsing API that already takes care of
all these problems.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, when one wants to activate every tracepoint
counters of a subsystem from perf record, the current sequence
is needed:
perf record -e subsys:ev1 -e subsys:ev2 -e subsys:ev3
This may annoy the most patient of us.
Now we can just do:
perf record -e subsys:*
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Various small cleanups - removal of debug printks and dead
functions, etc.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Import the schedbench.c tool that i wrote some time ago to
simulate scheduler behavior but never finished. It's a good
basis for perf sched nevertheless.
Most of its guts are not hooked up to the perf event loop
yet - that will be done in the patches to come.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This turn-key tool allows scheduler measurements to be
conducted and the results be displayed numerically.
First baby step towards that goal: clone the new command off of
perf trace.
Fix a few other details along the way:
- add (minimal) perf trace documentation
- reorder a few places
- list perf trace in the mainporcelain list as well
as it's a very useful utility.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits)
ring-buffer: only enable ring_buffer_swap_cpu when needed
ring-buffer: check for swapped buffers in start of committing
tracing: report error in trace if we fail to swap latency buffer
tracing: add trace_array_printk for internal tracers to use
tracing: pass around ring buffer instead of tracer
tracing: make tracing_reset safe for external use
tracing: use timestamp to determine start of latency traces
tracing: Remove mentioning of legacy latency_trace file from documentation
tracing/filters: Defer pred allocation, fix memory leak
tracing: remove users of tracing_reset
tracing: disable buffers and synchronize_sched before resetting
tracing: disable update max tracer while reading trace
tracing: print out start and stop in latency traces
ring-buffer: disable all cpu buffers when one finds a problem
ring-buffer: do not count discarded events
ring-buffer: remove ring_buffer_event_discard
ring-buffer: fix ring_buffer_read crossing pages
ring-buffer: remove unnecessary cpu_relax
ring-buffer: do not swap buffers during a commit
ring-buffer: do not reset while in a commit
...
This patch improves some (common) inefficiencies in the
handling of directory lookups:
- not using the d_type information returned by the kernel
- constructing (absolute) paths for file operation even though
directory-relative operations using the *at functions is
possible
There are more places to fix but this is a start.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20090904193951.GB6186@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove some, now useless, global storage.
Don't calculate the stddev when not needed.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use the more advanced single pass variance algorithm outlined
on the wikipedia page. This is numerically more stable for
larger sample sets.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we're computing the mean by sampling the distribution,
then the std dev of the mean is related to the std dev of the
sample set by:
stddev_mean = std_dev / sqrt(N)
Which is exactly what we want.
This results in the error on the mean decreasing with
increasing number of samples.
Also fix the scaled == -1, aka not counted case.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since we don't need all the individual samples to calculate the
error remove both the limit and the storage overhead associated
with that.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current noise computation does:
\Sum abs(n_i - avg(n)) * N^-1.5
Which is (afaik) not a regular noise function, and needs the
complete sample set available to post-process.
Change this to use a regular stddev computation which can be
done by keeping a two sums:
stddev = sqrt( 1/N (\Sum n_i^2) - avg(n)^2 )
For which we only need to keep \Sum n_i and \Sum n_i^2.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We did not account for the enclosing \0. Depending on what malloc()
gave us this resulted in corrupted version string printouts.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Print out more accurate timestamps - usecs does not cut it
anymore on fast enough boxes ;-)
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Leave the input fd at the data area.
It does not matter right now - but seeking at the end of it
certainly did not make sense.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We started parsing perf.data at head 0. This caused -D to
segfault and it could possibly also case incorrect trace
entries to be displayed.
Parse it at data_offset instead.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Older versions of GCC are rather stupid about strict aliasing:
util/trace-event-parse.c: In function 'parse_cmdlines':
util/trace-event-parse.c:93: warning: dereferencing type-punned pointer will break strict-aliasing rules
util/trace-event-parse.c: In function 'parse_proc_kallsyms':
util/trace-event-parse.c:155: warning: dereferencing type-punned pointer will break strict-aliasing rules
util/trace-event-parse.c:157: warning: dereferencing type-punned pointer will break strict-aliasing rules
util/trace-event-parse.c:158: warning: dereferencing type-punned pointer will break strict-aliasing rules
util/trace-event-parse.c: In function 'parse_ftrace_printk':
util/trace-event-parse.c:294: warning: dereferencing type-punned pointer will break strict-aliasing rules
util/trace-event-parse.c:295: warning: dereferencing type-punned pointer will break strict-aliasing rules
make: *** [util/trace-event-parse.o] Error 1
Make it clear to GCC that we intend with those pointers, by passing
them through via an explicit (void *) cast.
We might want to add -fno-strict-aliasing as well, like the kernel
itself does.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make it easier to turn warnings on/off by using a separate
line for each warning added.
Some of the warnings have too much of a nuisance factor and
we might want to turn them off in the future.
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In perf tools, we hardcode the pid 0 cmdline resolving to
"idle" because the init task is not included in the COMM
events.
But the idle tasks secondary cpus are resolved into their
"init" name through the COMM events.
We have then such strange result in perf report (ditto with
trace):
19.66% init [kernel] [k] acpi_idle_enter_c1
17.32% [idle] [kernel] [k] acpi_idle_enter_c1
It's then better to unify the swapper tasks into a single init
name.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1251693921-6579-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
The cmd-trace tool used the cmdline file and resolved the idle
thread using a hardcoded check for the 0 task pid.
Now we have a centralized way to do that from perf using
register_idle_thread() API.
Before:
:0-0 [000] 0.000000: irq_handler_entry: irq=0 handler=name
:0-0 [000] 0.000000: irq_handler_entry: irq=0 handler=name
After:
[idle]-0 [000] 0.000000: irq_handler_entry: irq=0 handler=name
[idle]-0 [000] 0.000000: irq_handler_entry: irq=0 handler=name
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1251693921-6579-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>