Coresight - HW Assisted Tracing on ARM
======================================

Author:   Mathieu Poirier <mathieu.poirier@linaro.org>
Date:     September 11th, 2014

Introduction
------------

Coresight is an umbrella of technologies allowing for the debugging of ARM
based SoCs. It includes solutions for JTAG and HW assisted tracing. This
document is concerned with the latter.

HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines. ARM has
developed a HW assisted tracing solution by means of different components, each
being added to a design at synthesis time to cater to specific tracing needs.
Components are generally categorised as sources, links and sinks and are
(usually) discovered using the AMBA bus.

"Sources" generate a compressed stream representing the processor instruction
path based on tracing scenarios as configured by users. From there the stream
flows through the coresight system (via the ATB bus) using links that connect
the emanating source to one or more sinks. Sinks serve as endpoints to the
coresight implementation, either storing the compressed stream in a memory
buffer or creating an interface to the outside world where data can be
transferred to a host without fear of filling up the onboard coresight memory
buffer.

A typical coresight system would look like this:

  *****************************************************************
 **************************** AMBA AXI ****************************===||
  *****************************************************************    ||
        ^                    ^                            |            ||
        |                    |                            *            **
     0000000    :::::     0000000    :::::    :::::    @@@@@@@    ||||||||||||
     0 CPU 0<-->: C :     0 CPU 0<-->: C :    : C :    @ STM @    || System ||
  |->0000000    : T :  |->0000000    : T :    : T :<--->@@@@@     || Memory ||
  |  #######<-->: I :  |  #######<-->: I :    : I :      @@@<-|   ||||||||||||
  |  # ETM #    :::::  |  # PTM #    :::::    :::::       @   |
  |   #####      ^ ^   |   #####      ^ !      ^ !        .   |   |||||||||
  | |->###       | !   | |->###       | !      | !        .   |   || DAP ||
  | |   #        | !   | |   #        | !      | !        .   |   |||||||||
  | |   .        | !   | |   .        | !      | !        .   |      |  |
  | |   .        | !   | |   .        | !      | !        .   |      |  *
  | |   .        | !   | |   .        | !      | !        .   |      | SWD/
  | |   .        | !   | |   .        | !      | !        .   |      | JTAG
  *****************************************************************<-|
 *************************** AMBA Debug APB ************************
  *****************************************************************
    |   .          !         .          !        !        .   |
    |   .          *         .          *        *        .   |
  *****************************************************************
 ******************** Cross Trigger Matrix (CTM) *******************
  *****************************************************************
    |   .     ^              .                            .   |
    |   *     !              *                            *   |
  *****************************************************************
 ****************** AMBA Advanced Trace Bus (ATB) ******************
  *****************************************************************
    |         !                       ===============         |
    |         *                        ===== F =====<---------|
    |   :::::::::                       ==== U ====
    |-->:: CTI ::<!!                     === N ===
    |   :::::::::  !                      == N ==
    |    ^         *                      == E ==
    |    !  &&&&&&&&&       IIIIIII       == L ==
    |------>&& ETB &&<......II    I       =======
    |    !  &&&&&&&&&       II    I          .
    |    !                  I     I          .
    |    !                  I REP I<..........
    |    !                  I     I
    |    !!>&&&&&&&&&       II    I           *Source: ARM ltd.
    |------>& TPIU  &<......II    I           DAP = Debug Access Port
            &&&&&&&&&       IIIIIII           ETM = Embedded Trace Macrocell
                ;                             PTM = Program Trace Macrocell
                ;                             CTI = Cross Trigger Interface
                *                             ETB = Embedded Trace Buffer
          To trace port                       TPIU= Trace Port Interface Unit
                                              SWD = Serial Wire Debug

While on-target configuration of the components is done via the APB bus,
all trace data are carried out-of-band on the ATB bus. The CTM provides
a way to aggregate and distribute signals between CoreSight components.

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. This first implementation centers on
the basic tracing functionality, enabling components such as ETM/PTM, funnel,
replicator, TMC, TPIU and ETB. Future work will enable more
intricate IP blocks such as STM and CTI.

Acronyms and Classification
---------------------------

Acronyms:

PTM:     Program Trace Macrocell
ETM:     Embedded Trace Macrocell
STM:     System Trace Macrocell
ETB:     Embedded Trace Buffer
ITM:     Instrumentation Trace Macrocell
TPIU:    Trace Port Interface Unit
TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router
TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO
CTI:     Cross Trigger Interface

Classification:

Sources:
   ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
Links:
   Funnel, replicator (intelligent or not), TMC-ETF
Sinks:
   ETBv1.0, ETBv1.1, TPIU, TMC-ETR, TMC-ETF
Misc:
   CTI

Device Tree Bindings
--------------------

See Documentation/devicetree/bindings/arm/coresight.txt for details.

As of this writing, drivers for ITM, STMs and CTIs are not provided but are
expected to be added as the solution matures.

Framework and implementation
----------------------------

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. Any coresight compliant device can
register with the framework as long as it uses the right APIs:

    struct coresight_device *coresight_register(struct coresight_desc *desc);
    void coresight_unregister(struct coresight_device *csdev);

The registering function takes a "struct coresight_desc *desc" and registers
the device with the core framework. The unregister function takes a
reference to the "struct coresight_device *csdev" obtained at registration
time.

If everything goes well during the registration process the new devices will
show up under /sys/bus/coresight/devices, as shown here for a TC2 platform:

    root:~# ls /sys/bus/coresight/devices/
    replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
    root:~#

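The device names in the listing above end in a suffix that hints at the
component type. As a rough sketch only, the classification given earlier can
be applied to such a listing. The helper names cs_class and cs_list_classes
are hypothetical, and the mapping assumes the suffix naming convention seen on
the TC2 listing, which depends on the platform's device tree:

```shell
# cs_class <device-name>
# Print the class suggested by the name suffix (assumed naming convention).
cs_class() {
    case "${1##*.}" in
        etm|ptm)            echo source ;;
        funnel|replicator)  echo link ;;
        etb|tpiu)           echo sink ;;
        *)                  echo unknown ;;
    esac
}

# cs_list_classes <directory>
# Print "<device> <class>" for every entry found in the directory.
cs_list_classes() {
    for cs_d in "$1"/*; do
        [ -e "$cs_d" ] || continue      # skip the unexpanded glob
        cs_n=${cs_d##*/}
        printf '%s %s\n' "$cs_n" "$(cs_class "$cs_n")"
    done
}
```

On a target, "cs_list_classes /sys/bus/coresight/devices" would be the likely
invocation. The framework itself does not rely on names: the class comes from
the "coresight_dev_type" supplied at registration.
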
The registering function takes a "struct coresight_desc", which looks like
this:

    struct coresight_desc {
            enum coresight_dev_type type;
            struct coresight_dev_subtype subtype;
            const struct coresight_ops *ops;
            struct coresight_platform_data *pdata;
            struct device *dev;
            const struct attribute_group **groups;
    };

The "coresight_dev_type" identifies what the device is, i.e. source, link or
sink, while the "coresight_dev_subtype" will characterise that type further.

The "struct coresight_ops" is mandatory and will tell the framework how to
perform base operations related to the components, each component having
a different set of requirements. For that, "struct coresight_ops_sink",
"struct coresight_ops_link" and "struct coresight_ops_source" have been
provided.

The next field, "struct coresight_platform_data *pdata", is acquired by calling
"of_get_coresight_platform_data()" as part of the driver's _probe routine, and
"struct device *dev" gets the device reference embedded in the "amba_device":

    static int etm_probe(struct amba_device *adev, const struct amba_id *id)
    {
            ...
            ...
            drvdata->dev = &adev->dev;
            ...
    }

Specific classes of device (source, link or sink) have generic operations
that can be performed on them (see "struct coresight_ops"). The
"**groups" field is a list of sysfs entries pertaining to operations
specific to that component only. "Implementation defined" customisations are
expected to be accessed and controlled using those entries.

Last but not least, "struct module *owner" is expected to be set to reflect
the information carried in "THIS_MODULE".

How to use the tracer modules
-----------------------------

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the number of sinks (nor sources) that can be enabled at
any given moment. As a generic operation, all devices pertaining to the sink
class will have an "enable_sink" entry in sysfs:

    root:/sys/bus/coresight/devices# ls
    replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
    root:/sys/bus/coresight/devices# ls 20010000.etb
    enable_sink  status  trigger_cntr
    root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
    root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
    1
    root:/sys/bus/coresight/devices#

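Since every sink exposes "enable_sink", the available sinks and their current
state can be found by looking for that entry. The following is a minimal
sketch; cs_list_sinks is a hypothetical helper name, and the directory is
passed as a parameter so the function can be tried against any tree that
follows the sysfs layout shown above:

```shell
# cs_list_sinks <directory>
# Print "<device> <enable_sink value>" for every device that has an
# enable_sink entry, i.e. for every sink.
cs_list_sinks() {
    for cs_attr in "$1"/*/enable_sink; do
        [ -e "$cs_attr" ] || continue   # no sink found: glob left unexpanded
        cs_dev=${cs_attr%/enable_sink}
        printf '%s %s\n' "${cs_dev##*/}" "$(cat "$cs_attr")"
    done
}
```

On the TC2 platform above, "cs_list_sinks /sys/bus/coresight/devices" would be
expected to report the ETB and the TPIU.
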
At boot time the current etm3x driver will configure the first address
comparator with "_stext" and "_etext", essentially tracing any instruction
that falls within that range. As such, "enabling" a source will immediately
trigger a trace capture:

    root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
    1
    root:/sys/bus/coresight/devices# cat 20010000.etb/status
    Depth:          0x2000
    Status:         0x1
    RAM read ptr:   0x0
    RAM wrt ptr:    0x19d3   <----- The write pointer is moving
    Trigger cnt:    0x0
    Control:        0x1
    Flush status:   0x0
    Flush ctrl:     0x2001
    root:/sys/bus/coresight/devices#

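The write pointer can also be checked from a script instead of by eye. The
sketch below assumes the "status" text layout shown above; the helper names
cs_etb_wrptr and cs_etb_has_data are hypothetical. A non-zero write pointer is
only a hint that data has been written, since the buffer is circular and the
pointer may wrap:

```shell
# cs_etb_wrptr <status-file>
# Print the hexadecimal value found on the "RAM wrt ptr:" line.
cs_etb_wrptr() {
    sed -n 's/^RAM wrt ptr:[[:space:]]*\(0x[0-9a-fA-F]*\).*/\1/p' "$1"
}

# cs_etb_has_data <status-file>
# Succeed when a write pointer was found and it is not zero.
cs_etb_has_data() {
    cs_ptr=$(cs_etb_wrptr "$1")
    [ -n "$cs_ptr" ] && [ "$(( $cs_ptr ))" -ne 0 ]
}
```

Reading the value twice a short time apart and comparing the two results
shows whether the pointer is still moving.
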
Trace collection is stopped the same way:

    root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices#

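The enable and disable steps can be combined to capture a fixed time window.
This is a sketch under the assumption that the sysfs entries behave as in the
sessions above; cs_set and cs_trace_window are hypothetical helper names, and
the sink is deliberately left enabled so that its buffer can still be read:

```shell
# cs_set <directory> <device> <attribute> <value>
# Write a value to a sysfs attribute, failing clearly if it is not writable.
cs_set() {
    cs_attr="$1/$2/$3"
    if [ ! -w "$cs_attr" ]; then
        echo "cannot write $cs_attr" >&2
        return 1
    fi
    echo "$4" > "$cs_attr"
}

# cs_trace_window <directory> <sink> <source> <seconds>
# Enable the sink, trace from the source for the given time, stop the source.
cs_trace_window() {
    cs_set "$1" "$2" enable_sink 1 || return 1
    cs_set "$1" "$3" enable_source 1 || return 1
    sleep "$4"
    cs_set "$1" "$3" enable_source 0
}
```

For the TC2 example this would be called as
"cs_trace_window /sys/bus/coresight/devices 20010000.etb 2201c000.ptm 1".
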
The content of the ETB buffer can be harvested directly from /dev:

    root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
    of=~/cstrace.bin

    64+0 records in
    64+0 records out
    32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
    root:/sys/bus/coresight/devices#

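The size reported by dd is consistent with the "Depth" value from the status
output, on the assumption (not stated in this document) that the ETB depth is
expressed in 32-bit words: 0x2000 words of 4 bytes give 32768 bytes, which is
also 64 records of dd's default 512-byte block size. The helper name
cs_etb_bytes is hypothetical:

```shell
# cs_etb_bytes <depth>
# Convert an ETB depth (assumed to be in 32-bit words) to bytes.
# The depth may be given in hexadecimal, as printed in the status entry.
cs_etb_bytes() {
    echo $(( $1 * 4 ))
}
```
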
The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.

The following is DS-5 output from an experimental loop that increments a
variable up to a certain value. The example is simple and yet provides a
glimpse of the wealth of possibilities that coresight provides.

    Info                        Tracing enabled
    Instruction     106378866   0x8026B53C  E52DE004  false  PUSH   {lr}
    Instruction     0           0x8026B540  E24DD00C  false  SUB    sp,sp,#0xc
    Instruction     0           0x8026B544  E3A03000  false  MOV    r3,#0
    Instruction     0           0x8026B548  E58D3004  false  STR    r3,[sp,#4]
    Instruction     0           0x8026B54C  E59D3004  false  LDR    r3,[sp,#4]
    Instruction     0           0x8026B550  E3530004  false  CMP    r3,#4
    Instruction     0           0x8026B554  E2833001  false  ADD    r3,r3,#1
    Instruction     0           0x8026B558  E58D3004  false  STR    r3,[sp,#4]
    Instruction     0           0x8026B55C  DAFFFFFA  true   BLE    {pc}-0x10 ; 0x8026b54c
    Timestamp                   Timestamp: 17106715833
    Instruction     319         0x8026B54C  E59D3004  false  LDR    r3,[sp,#4]
    Instruction     0           0x8026B550  E3530004  false  CMP    r3,#4
    Instruction     0           0x8026B554  E2833001  false  ADD    r3,r3,#1
    Instruction     0           0x8026B558  E58D3004  false  STR    r3,[sp,#4]
    Instruction     0           0x8026B55C  DAFFFFFA  true   BLE    {pc}-0x10 ; 0x8026b54c
    Instruction     9           0x8026B54C  E59D3004  false  LDR    r3,[sp,#4]
    Instruction     0           0x8026B550  E3530004  false  CMP    r3,#4
    Instruction     0           0x8026B554  E2833001  false  ADD    r3,r3,#1
    Instruction     0           0x8026B558  E58D3004  false  STR    r3,[sp,#4]
    Instruction     0           0x8026B55C  DAFFFFFA  true   BLE    {pc}-0x10 ; 0x8026b54c
    Instruction     7           0x8026B54C  E59D3004  false  LDR    r3,[sp,#4]
    Instruction     0           0x8026B550  E3530004  false  CMP    r3,#4
    Instruction     0           0x8026B554  E2833001  false  ADD    r3,r3,#1
    Instruction     0           0x8026B558  E58D3004  false  STR    r3,[sp,#4]
    Instruction     0           0x8026B55C  DAFFFFFA  true   BLE    {pc}-0x10 ; 0x8026b54c
    Instruction     7           0x8026B54C  E59D3004  false  LDR    r3,[sp,#4]
    Instruction     0           0x8026B550  E3530004  false  CMP    r3,#4
    Instruction     0           0x8026B554  E2833001  false  ADD    r3,r3,#1
    Instruction     0           0x8026B558  E58D3004  false  STR    r3,[sp,#4]
    Instruction     0           0x8026B55C  DAFFFFFA  true   BLE    {pc}-0x10 ; 0x8026b54c
    Instruction     10          0x8026B54C  E59D3004  false  LDR    r3,[sp,#4]
    Instruction     0           0x8026B550  E3530004  false  CMP    r3,#4
    Instruction     0           0x8026B554  E2833001  false  ADD    r3,r3,#1
    Instruction     0           0x8026B558  E58D3004  false  STR    r3,[sp,#4]
    Instruction     0           0x8026B55C  DAFFFFFA  true   BLE    {pc}-0x10 ; 0x8026b54c
    Instruction     6           0x8026B560  EE1D3F30  false  MRC    p15,#0x0,r3,c13,c0,#1
    Instruction     0           0x8026B564  E1A0100D  false  MOV    r1,sp
    Instruction     0           0x8026B568  E3C12D7F  false  BIC    r2,r1,#0x1fc0
    Instruction     0           0x8026B56C  E3C2203F  false  BIC    r2,r2,#0x3f
    Instruction     0           0x8026B570  E59D1004  false  LDR    r1,[sp,#4]
    Instruction     0           0x8026B574  E59F0010  false  LDR    r0,[pc,#16] ; [0x8026B58C] = 0x80550368
    Instruction     0           0x8026B578  E592200C  false  LDR    r2,[r2,#0xc]
    Instruction     0           0x8026B57C  E59221D0  false  LDR    r2,[r2,#0x1d0]
    Instruction     0           0x8026B580  EB07A4CF  true   BL     {pc}+0x1e9344 ; 0x804548c4
    Info                        Tracing enabled
    Instruction     13570831    0x8026B584  E28DD00C  false  ADD    sp,sp,#0xc
    Instruction     0           0x8026B588  E8BD8000  true   LDM    sp!,{pc}
    Timestamp                   Timestamp: 17107041535

How to use the STM module
-------------------------

Using the System Trace Macrocell module is the same as using the tracers; the
only difference is that clients are driving the trace capture rather
than the program flow through the code.

As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs, with more information on each entry being found in [1]:

    root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
    enable_source   hwevent_select  port_enable     subsystem       uevent
    hwevent_enable  mgmt            port_select     traceid
    root@genericarmv8:~#

Like any other source, a sink needs to be identified and the STM enabled before
being used:

    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source

From there user space applications can request and use channels using the devfs
interface provided for that purpose by the generic STM API:

    root@genericarmv8:~# ls -l /dev/20100000.stm
    crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/20100000.stm
    root@genericarmv8:~#

Details on how to use the generic STM API can be found here [2].

[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
[2]. Documentation/trace/stm.txt

Using perf tools
----------------

perf can be used to record and analyze traces of programs.

Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g.:

    perf record -e cs_etm/@20070000.etr/u --per-thread

The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
'perf inject' can be used to replace the trace data with the synthesized events.
The --itrace option controls the type and frequency of synthesized events
(see perf documentation).

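The event string passed to 'perf record' embeds the sink name, so scripts that
target several boards may want to build it from a variable. A small sketch,
where cs_etm_event is a hypothetical helper and the second argument is the
event modifier (defaulting to the "u" used in the example above):

```shell
# cs_etm_event <sink-name> [modifier]
# Print a cs_etm event string of the form cs_etm/@<sink>/<modifier>.
cs_etm_event() {
    printf 'cs_etm/@%s/%s\n' "$1" "${2:-u}"
}

# Possible use on a target (not executed here):
#   perf record -e "$(cs_etm_event 20070000.etr)" --per-thread ./sort
```

The sink name must match a device listed under /sys/bus/coresight/devices on
the platform being traced.
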
Note that only 64-bit programs are currently supported; further work is
required to support instruction decode of 32-bit Arm programs.

Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

'perf inject' accepts the --itrace option, in which case tracing data is
removed and replaced with the synthesized events, e.g.:

    perf inject --itrace --strip -i perf.data -o perf.data.new

Below is an example of using ARM ETM for AutoFDO. It requires autofdo
(https://github.com/google/autofdo) and gcc version 5. The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).

    $ gcc-5 -O3 sort.c -o sort
    $ taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    5910 ms

    $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    12543 ms
    [ perf record: Woken up 35 times to write data ]
    [ perf record: Captured and wrote 69.640 MB perf.data ]

    $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
    $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
    $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
    $ taskset -c 2 ./sort_autofdo
    Bubble sorting array of 30000 elements
    5806 ms

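The timings in this session can be put into perspective with a little
arithmetic. The sketch below uses awk for the floating point maths; the
function names pct_faster and slowdown are hypothetical, and the figures are
simply the three run times printed above:

```shell
# pct_faster <baseline-ms> <new-ms>
# Print by how many percent the new run time is shorter than the baseline.
pct_faster() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) * 100 / a }'
}

# slowdown <baseline-ms> <traced-ms>
# Print the ratio of the traced run time to the baseline run time.
slowdown() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}
```

With the numbers above, "pct_faster 5910 5806" prints 1.76 and
"slowdown 5910 12543" prints 2.12: the AutoFDO build was about 1.8% faster in
this single run, while recording the trace roughly doubled the run time.
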