Merge branch 'doc/4.9' into docs-next
commit e349b1b700
@ -1,10 +1,18 @@
|
|||||||
Copyright 2010 Nicolas Palix <npalix@diku.dk>
|
.. Copyright 2010 Nicolas Palix <npalix@diku.dk>
|
||||||
Copyright 2010 Julia Lawall <julia@diku.dk>
|
.. Copyright 2010 Julia Lawall <julia@diku.dk>
|
||||||
Copyright 2010 Gilles Muller <Gilles.Muller@lip6.fr>
|
.. Copyright 2010 Gilles Muller <Gilles.Muller@lip6.fr>
|
||||||
|
|
||||||
|
.. highlight:: none
|
||||||
|
|
||||||
Getting Coccinelle
|
Coccinelle
|
||||||
~~~~~~~~~~~~~~~~~~~~
|
==========
|
||||||
|
|
||||||
|
Coccinelle is a tool for pattern matching and text transformation that has
|
||||||
|
many uses in kernel development, including the application of complex,
|
||||||
|
tree-wide patches and detection of problematic programming patterns.
|
||||||
|
|
||||||
|
Getting Coccinelle
|
||||||
|
-------------------
|
||||||
|
|
||||||
The semantic patches included in the kernel use features and options
|
The semantic patches included in the kernel use features and options
|
||||||
which are provided by Coccinelle version 1.0.0-rc11 and above.
|
which are provided by Coccinelle version 1.0.0-rc11 and above.
|
||||||
@ -22,24 +30,23 @@ of many distributions, e.g. :
|
|||||||
- NetBSD
|
- NetBSD
|
||||||
- FreeBSD
|
- FreeBSD
|
||||||
|
|
||||||
|
|
||||||
You can get the latest version released from the Coccinelle homepage at
|
You can get the latest version released from the Coccinelle homepage at
|
||||||
http://coccinelle.lip6.fr/
|
http://coccinelle.lip6.fr/
|
||||||
|
|
||||||
Information and tips about Coccinelle are also provided on the wiki
|
Information and tips about Coccinelle are also provided on the wiki
|
||||||
pages at http://cocci.ekstranet.diku.dk/wiki/doku.php
|
pages at http://cocci.ekstranet.diku.dk/wiki/doku.php
|
||||||
|
|
||||||
Once you have it, run the following command:
|
Once you have it, run the following command::
|
||||||
|
|
||||||
./configure
|
./configure
|
||||||
make
|
make
|
||||||
|
|
||||||
as a regular user, and install it with
|
as a regular user, and install it with::
|
||||||
|
|
||||||
sudo make install
|
sudo make install
|
||||||
|
|
||||||
Supplemental documentation
|
Supplemental documentation
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------------
|
||||||
|
|
||||||
For supplemental documentation refer to the wiki:
|
For supplemental documentation refer to the wiki:
|
||||||
|
|
||||||
@ -47,49 +54,52 @@ https://bottest.wiki.kernel.org/coccicheck
|
|||||||
|
|
||||||
The wiki documentation always refers to the linux-next version of the script.
|
The wiki documentation always refers to the linux-next version of the script.
|
||||||
|
|
||||||
Using Coccinelle on the Linux kernel
|
Using Coccinelle on the Linux kernel
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
------------------------------------
|
||||||
|
|
||||||
A Coccinelle-specific target is defined in the top level
|
A Coccinelle-specific target is defined in the top level
|
||||||
Makefile. This target is named 'coccicheck' and calls the 'coccicheck'
|
Makefile. This target is named ``coccicheck`` and calls the ``coccicheck``
|
||||||
front-end in the 'scripts' directory.
|
front-end in the ``scripts`` directory.
|
||||||
|
|
||||||
Four basic modes are defined: patch, report, context, and org. The mode to
|
Four basic modes are defined: ``patch``, ``report``, ``context``, and
|
||||||
use is specified by setting the MODE variable with 'MODE=<mode>'.
|
``org``. The mode to use is specified by setting the MODE variable with
|
||||||
|
``MODE=<mode>``.
|
||||||
|
|
||||||
'patch' proposes a fix, when possible.
|
- ``patch`` proposes a fix, when possible.
|
||||||
|
|
||||||
'report' generates a list in the following format:
|
- ``report`` generates a list in the following format:
|
||||||
file:line:column-column: message
|
file:line:column-column: message
|
||||||
|
|
||||||
'context' highlights lines of interest and their context in a
|
- ``context`` highlights lines of interest and their context in a
|
||||||
diff-like style. Lines of interest are indicated with '-'.
|
diff-like style. Lines of interest are indicated with ``-``.
|
||||||
|
|
||||||
'org' generates a report in the Org mode format of Emacs.
|
- ``org`` generates a report in the Org mode format of Emacs.
|
||||||
|
|
||||||
Note that not all semantic patches implement all modes. For easy use
|
Note that not all semantic patches implement all modes. For easy use
|
||||||
of Coccinelle, the default mode is "report".
|
of Coccinelle, the default mode is "report".
|
||||||
|
|
||||||
Two other modes provide some common combinations of these modes.
|
Two other modes provide some common combinations of these modes.
|
||||||
|
|
||||||
'chain' tries the previous modes in the order above until one succeeds.
|
- ``chain`` tries the previous modes in the order above until one succeeds.
|
||||||
|
|
||||||
'rep+ctxt' runs successively the report mode and the context mode.
|
- ``rep+ctxt`` runs successively the report mode and the context mode.
|
||||||
It should be used with the C option (described later)
|
It should be used with the C option (described later)
|
||||||
which checks the code on a file basis.
|
which checks the code on a file basis.
|
||||||
|
|
||||||
Examples:
|
Examples
|
||||||
To make a report for every semantic patch, run the following command:
|
~~~~~~~~
|
||||||
|
|
||||||
|
To make a report for every semantic patch, run the following command::
|
||||||
|
|
||||||
make coccicheck MODE=report
|
make coccicheck MODE=report
|
||||||
|
|
||||||
To produce patches, run:
|
To produce patches, run::
|
||||||
|
|
||||||
make coccicheck MODE=patch
|
make coccicheck MODE=patch
|
||||||
|
|
||||||
|
|
||||||
The coccicheck target applies every semantic patch available in the
|
The coccicheck target applies every semantic patch available in the
|
||||||
sub-directories of 'scripts/coccinelle' to the entire Linux kernel.
|
sub-directories of ``scripts/coccinelle`` to the entire Linux kernel.
|
||||||
|
|
||||||
For each semantic patch, a commit message is proposed. It gives a
|
For each semantic patch, a commit message is proposed. It gives a
|
||||||
description of the problem being checked by the semantic patch, and
|
description of the problem being checked by the semantic patch, and
|
||||||
@ -99,15 +109,15 @@ As any static code analyzer, Coccinelle produces false
|
|||||||
positives. Thus, reports must be carefully checked, and patches
|
positives. Thus, reports must be carefully checked, and patches
|
||||||
reviewed.
|
reviewed.
|
||||||
|
|
||||||
To enable verbose messages set the V= variable, for example:
|
To enable verbose messages set the V= variable, for example::
|
||||||
|
|
||||||
make coccicheck MODE=report V=1
|
make coccicheck MODE=report V=1
|
||||||
|
|
||||||
Coccinelle parallelization
|
Coccinelle parallelization
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------------
|
||||||
|
|
||||||
By default, coccicheck tries to run as parallel as possible. To change
|
By default, coccicheck tries to run as parallel as possible. To change
|
||||||
the parallelism, set the J= variable. For example, to run across 4 CPUs:
|
the parallelism, set the J= variable. For example, to run across 4 CPUs::
|
||||||
|
|
||||||
make coccicheck MODE=report J=4
|
make coccicheck MODE=report J=4
|
||||||
|
|
||||||
@ -115,44 +125,47 @@ As of Coccinelle 1.0.2 Coccinelle uses Ocaml parmap for parallelization,
|
|||||||
if support for this is detected you will benefit from parmap parallelization.
|
if support for this is detected you will benefit from parmap parallelization.
|
||||||
|
|
||||||
When parmap is enabled coccicheck will enable dynamic load balancing by using
|
When parmap is enabled coccicheck will enable dynamic load balancing by using
|
||||||
'--chunksize 1' argument, this ensures we keep feeding threads with work
|
``--chunksize 1`` argument, this ensures we keep feeding threads with work
|
||||||
one by one, so that we avoid the situation where most work gets done by only
|
one by one, so that we avoid the situation where most work gets done by only
|
||||||
a few threads. With dynamic load balancing, if a thread finishes early we keep
|
a few threads. With dynamic load balancing, if a thread finishes early we keep
|
||||||
feeding it more work.
|
feeding it more work.
|
||||||
|
|
||||||
When parmap is enabled, if an error occurs in Coccinelle, this error
|
When parmap is enabled, if an error occurs in Coccinelle, this error
|
||||||
value is propagated back, the return value of the 'make coccicheck'
|
value is propagated back, the return value of the ``make coccicheck``
|
||||||
captures this return value.
|
captures this return value.
|
||||||
|
|
||||||
Using Coccinelle with a single semantic patch
|
Using Coccinelle with a single semantic patch
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------------------------------
|
||||||
|
|
||||||
The optional make variable COCCI can be used to check a single
|
The optional make variable COCCI can be used to check a single
|
||||||
semantic patch. In that case, the variable must be initialized with
|
semantic patch. In that case, the variable must be initialized with
|
||||||
the name of the semantic patch to apply.
|
the name of the semantic patch to apply.
|
||||||
|
|
||||||
For instance:
|
For instance::
|
||||||
|
|
||||||
make coccicheck COCCI=<my_SP.cocci> MODE=patch
|
make coccicheck COCCI=<my_SP.cocci> MODE=patch
|
||||||
or
|
|
||||||
|
or::
|
||||||
|
|
||||||
make coccicheck COCCI=<my_SP.cocci> MODE=report
|
make coccicheck COCCI=<my_SP.cocci> MODE=report
|
||||||
|
|
||||||
|
|
||||||
Controlling Which Files are Processed by Coccinelle
|
Controlling Which Files are Processed by Coccinelle
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------------------------------------
|
||||||
|
|
||||||
By default the entire kernel source tree is checked.
|
By default the entire kernel source tree is checked.
|
||||||
|
|
||||||
To apply Coccinelle to a specific directory, M= can be used.
|
To apply Coccinelle to a specific directory, ``M=`` can be used.
|
||||||
For example, to check drivers/net/wireless/ one may write:
|
For example, to check drivers/net/wireless/ one may write::
|
||||||
|
|
||||||
make coccicheck M=drivers/net/wireless/
|
make coccicheck M=drivers/net/wireless/
|
||||||
|
|
||||||
To apply Coccinelle on a file basis, instead of a directory basis, the
|
To apply Coccinelle on a file basis, instead of a directory basis, the
|
||||||
following command may be used:
|
following command may be used::
|
||||||
|
|
||||||
make C=1 CHECK="scripts/coccicheck"
|
make C=1 CHECK="scripts/coccicheck"
|
||||||
|
|
||||||
To check only newly edited code, use the value 2 for the C flag, i.e.
|
To check only newly edited code, use the value 2 for the C flag, i.e.::
|
||||||
|
|
||||||
make C=2 CHECK="scripts/coccicheck"
|
make C=2 CHECK="scripts/coccicheck"
|
||||||
|
|
||||||
@ -166,8 +179,8 @@ semantic patch as shown in the previous section.
|
|||||||
The "report" mode is the default. You can select another one with the
|
The "report" mode is the default. You can select another one with the
|
||||||
MODE variable explained above.
|
MODE variable explained above.
|
||||||
|
|
||||||
Debugging Coccinelle SmPL patches
|
Debugging Coccinelle SmPL patches
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------------------
|
||||||
|
|
||||||
Using coccicheck is best as it provides in the spatch command line
|
Using coccicheck is best as it provides in the spatch command line
|
||||||
include options matching the options used when we compile the kernel.
|
include options matching the options used when we compile the kernel.
|
||||||
@ -177,8 +190,8 @@ manually run Coccinelle with debug options added.
|
|||||||
Alternatively you can debug running Coccinelle against SmPL patches
|
Alternatively you can debug running Coccinelle against SmPL patches
|
||||||
by asking for stderr to be redirected to a file, by default stderr
|
by asking for stderr to be redirected to a file, by default stderr
|
||||||
is redirected to /dev/null, if you'd like to capture stderr you
|
is redirected to /dev/null, if you'd like to capture stderr you
|
||||||
can specify the DEBUG_FILE="file.txt" option to coccicheck. For
|
can specify the ``DEBUG_FILE="file.txt"`` option to coccicheck. For
|
||||||
instance:
|
instance::
|
||||||
|
|
||||||
rm -f cocci.err
|
rm -f cocci.err
|
||||||
make coccicheck COCCI=scripts/coccinelle/free/kfree.cocci MODE=report DEBUG_FILE=cocci.err
|
make coccicheck COCCI=scripts/coccinelle/free/kfree.cocci MODE=report DEBUG_FILE=cocci.err
|
||||||
@ -186,7 +199,7 @@ instance:
|
|||||||
|
|
||||||
You can use SPFLAGS to add debugging flags, for instance you may want to
|
You can use SPFLAGS to add debugging flags, for instance you may want to
|
||||||
add both --profile --show-trying to SPFLAGS when debugging. For instance
|
add both --profile --show-trying to SPFLAGS when debugging. For instance
|
||||||
you may want to use:
|
you may want to use::
|
||||||
|
|
||||||
rm -f err.log
|
rm -f err.log
|
||||||
export COCCI=scripts/coccinelle/misc/irqf_oneshot.cocci
|
export COCCI=scripts/coccinelle/misc/irqf_oneshot.cocci
|
||||||
@ -198,24 +211,24 @@ work.
|
|||||||
|
|
||||||
DEBUG_FILE is only supported when using coccinelle >= 1.2.
|
DEBUG_FILE is only supported when using coccinelle >= 1.2.
|
||||||
|
|
||||||
.cocciconfig support
|
.cocciconfig support
|
||||||
~~~~~~~~~~~~~~~~~~~~~~
|
--------------------
|
||||||
|
|
||||||
Coccinelle supports reading .cocciconfig for default Coccinelle options that
|
Coccinelle supports reading .cocciconfig for default Coccinelle options that
|
||||||
should be used every time spatch is spawned, the order of precedence for
|
should be used every time spatch is spawned, the order of precedence for
|
||||||
variables for .cocciconfig is as follows:
|
variables for .cocciconfig is as follows:
|
||||||
|
|
||||||
o Your current user's home directory is processed first
|
- Your current user's home directory is processed first
|
||||||
o Your directory from which spatch is called is processed next
|
- Your directory from which spatch is called is processed next
|
||||||
o The directory provided with the --dir option is processed last, if used
|
- The directory provided with the --dir option is processed last, if used
|
||||||
|
|
||||||
Since coccicheck runs through make, it naturally runs from the kernel
|
Since coccicheck runs through make, it naturally runs from the kernel
|
||||||
proper dir, as such the second rule above would be implied for picking up a
|
proper dir, as such the second rule above would be implied for picking up a
|
||||||
.cocciconfig when using 'make coccicheck'.
|
.cocciconfig when using ``make coccicheck``.
|
||||||
|
|
||||||
'make coccicheck' also supports using M= targets. If you do not supply
|
``make coccicheck`` also supports using M= targets. If you do not supply
|
||||||
any M= target, it is assumed you want to target the entire kernel.
|
any M= target, it is assumed you want to target the entire kernel.
|
||||||
The kernel coccicheck script has:
|
The kernel coccicheck script has::
|
||||||
|
|
||||||
if [ "$KBUILD_EXTMOD" = "" ] ; then
|
if [ "$KBUILD_EXTMOD" = "" ] ; then
|
||||||
OPTIONS="--dir $srctree $COCCIINCLUDE"
|
OPTIONS="--dir $srctree $COCCIINCLUDE"
|
||||||
@ -235,12 +248,12 @@ override any of the kernel's .coccicheck's settings using SPFLAGS.
|
|||||||
|
|
||||||
We help Coccinelle when used against Linux with a set of sensible default
|
We help Coccinelle when used against Linux with a set of sensible default
|
||||||
options for Linux with our own Linux .cocciconfig. This hints to coccinelle
|
options for Linux with our own Linux .cocciconfig. This hints to coccinelle
|
||||||
git can be used for 'git grep' queries over coccigrep. A timeout of 200
|
git can be used for ``git grep`` queries over coccigrep. A timeout of 200
|
||||||
seconds should suffice for now.
|
seconds should suffice for now.
|
||||||
|
|
||||||
The options picked up by coccinelle when reading a .cocciconfig do not appear
|
The options picked up by coccinelle when reading a .cocciconfig do not appear
|
||||||
as arguments to spatch processes running on your system, to confirm what
|
as arguments to spatch processes running on your system, to confirm what
|
||||||
options will be used by Coccinelle run:
|
options will be used by Coccinelle run::
|
||||||
|
|
||||||
spatch --print-options-only
|
spatch --print-options-only
|
||||||
|
|
||||||
@ -252,219 +265,227 @@ carries its own .cocciconfig, you will need to use SPFLAGS to use idutils if
|
|||||||
desired. See below section "Additional flags" for more details on how to use
|
desired. See below section "Additional flags" for more details on how to use
|
||||||
idutils.
|
idutils.
|
||||||
|
|
||||||
Additional flags
|
Additional flags
|
||||||
~~~~~~~~~~~~~~~~~~
|
----------------
|
||||||
|
|
||||||
Additional flags can be passed to spatch through the SPFLAGS
|
Additional flags can be passed to spatch through the SPFLAGS
|
||||||
variable. This works as Coccinelle respects the last flags
|
variable. This works as Coccinelle respects the last flags
|
||||||
given to it when options are in conflict.
|
given to it when options are in conflict. ::
|
||||||
|
|
||||||
make SPFLAGS=--use-glimpse coccicheck
|
make SPFLAGS=--use-glimpse coccicheck
|
||||||
|
|
||||||
Coccinelle supports idutils as well but requires coccinelle >= 1.0.6.
|
Coccinelle supports idutils as well but requires coccinelle >= 1.0.6.
|
||||||
When no ID file is specified coccinelle assumes your ID database file
|
When no ID file is specified coccinelle assumes your ID database file
|
||||||
is in the file .id-utils.index on the top level of the kernel, coccinelle
|
is in the file .id-utils.index on the top level of the kernel, coccinelle
|
||||||
carries a script scripts/idutils_index.sh which creates the database with
|
carries a script scripts/idutils_index.sh which creates the database with::
|
||||||
|
|
||||||
mkid -i C --output .id-utils.index
|
mkid -i C --output .id-utils.index
|
||||||
|
|
||||||
If you have another database filename you can also just symlink with this
|
If you have another database filename you can also just symlink with this
|
||||||
name.
|
name. ::
|
||||||
|
|
||||||
make SPFLAGS=--use-idutils coccicheck
|
make SPFLAGS=--use-idutils coccicheck
|
||||||
|
|
||||||
Alternatively you can specify the database filename explicitly, for
|
Alternatively you can specify the database filename explicitly, for
|
||||||
instance:
|
instance::
|
||||||
|
|
||||||
make SPFLAGS="--use-idutils /full-path/to/ID" coccicheck
|
make SPFLAGS="--use-idutils /full-path/to/ID" coccicheck
|
||||||
|
|
||||||
See spatch --help to learn more about spatch options.
|
See ``spatch --help`` to learn more about spatch options.
|
||||||
|
|
||||||
Note that the '--use-glimpse' and '--use-idutils' options
|
Note that the ``--use-glimpse`` and ``--use-idutils`` options
|
||||||
require external tools for indexing the code. None of them is
|
require external tools for indexing the code. None of them is
|
||||||
thus active by default. However, by indexing the code with
|
thus active by default. However, by indexing the code with
|
||||||
one of these tools, and according to the cocci file used,
|
one of these tools, and according to the cocci file used,
|
||||||
spatch could process the entire code base more quickly.
|
spatch could process the entire code base more quickly.
|
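The make variables described so far (``MODE``, ``COCCI``, ``M``, ``J``,
``V``, ``SPFLAGS`` and ``DEBUG_FILE``) can be combined on one command line.
The following is a minimal sketch of a helper that assembles such a command.
The function name ``coccicheck_command`` and its parameters are hypothetical
and are not part of the kernel tree; only the variable names come from the
text above.

```python
import shlex

# Modes named in this document: four basic modes plus two combinations.
VALID_MODES = {"patch", "report", "context", "org", "chain", "rep+ctxt"}


def coccicheck_command(mode="report", cocci=None, directory=None,
                       jobs=None, verbose=False, spflags=None,
                       debug_file=None):
    """Return a `make coccicheck` invocation as an argument list.

    Hypothetical helper: it only formats the make variables described
    in the documentation and does not run anything.
    """
    if mode not in VALID_MODES:
        raise ValueError("unknown MODE: %s" % mode)

    argv = ["make", "coccicheck", "MODE=%s" % mode]
    if cocci:
        argv.append("COCCI=%s" % cocci)          # single semantic patch
    if directory:
        argv.append("M=%s" % directory)          # restrict to a directory
    if jobs:
        argv.append("J=%d" % jobs)               # parallelism
    if verbose:
        argv.append("V=1")                       # verbose messages
    if spflags:
        argv.append("SPFLAGS=%s" % " ".join(spflags))
    if debug_file:
        argv.append("DEBUG_FILE=%s" % debug_file)
    return argv


if __name__ == "__main__":
    cmd = coccicheck_command(
        mode="report",
        cocci="scripts/coccinelle/free/kfree.cocci",
        jobs=4,
        spflags=["--profile", "--show-trying"],
        debug_file="cocci.err",
    )
    # shlex.join quotes the SPFLAGS argument, which contains a space.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the ``SPFLAGS`` value as one
argument, so it can be handed to ``subprocess.run`` without extra quoting.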
||||||
|
|
||||||
SmPL patch specific options
|
SmPL patch specific options
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------------
|
||||||
|
|
||||||
SmPL patches can have their own requirements for options passed
|
SmPL patches can have their own requirements for options passed
|
||||||
to Coccinelle. SmPL patch specific options can be provided by
|
to Coccinelle. SmPL patch specific options can be provided by
|
||||||
providing them at the top of the SmPL patch, for instance:
|
providing them at the top of the SmPL patch, for instance::
|
||||||
|
|
||||||
// Options: --no-includes --include-headers
|
// Options: --no-includes --include-headers
|
||||||
|
|
||||||
SmPL patch Coccinelle requirements
|
SmPL patch Coccinelle requirements
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
----------------------------------
|
||||||
|
|
||||||
As Coccinelle features get added some more advanced SmPL patches
|
As Coccinelle features get added some more advanced SmPL patches
|
||||||
may require newer versions of Coccinelle. If an SmPL patch requires
|
may require newer versions of Coccinelle. If an SmPL patch requires
|
||||||
at least a version of Coccinelle, this can be specified as follows,
|
at least a version of Coccinelle, this can be specified as follows,
|
||||||
as an example if requiring at least Coccinelle >= 1.0.5:
|
as an example if requiring at least Coccinelle >= 1.0.5::
|
||||||
|
|
||||||
// Requires: 1.0.5
|
// Requires: 1.0.5
|
||||||
|
|
||||||
Proposing new semantic patches
|
Proposing new semantic patches
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
-------------------------------
|
||||||
|
|
||||||
New semantic patches can be proposed and submitted by kernel
|
New semantic patches can be proposed and submitted by kernel
|
||||||
developers. For sake of clarity, they should be organized in the
|
developers. For sake of clarity, they should be organized in the
|
||||||
sub-directories of 'scripts/coccinelle/'.
|
sub-directories of ``scripts/coccinelle/``.
|
||||||
|
|
||||||
|
|
||||||
Detailed description of the 'report' mode
|
Detailed description of the ``report`` mode
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
-------------------------------------------
|
||||||
|
|
||||||
|
``report`` generates a list in the following format::
|
||||||
|
|
||||||
'report' generates a list in the following format:
|
|
||||||
file:line:column-column: message
|
file:line:column-column: message
|
||||||
|
|
||||||
Example:
|
Example
|
||||||
|
~~~~~~~
|
||||||
|
|
||||||
Running
|
Running::
|
||||||
|
|
||||||
make coccicheck MODE=report COCCI=scripts/coccinelle/api/err_cast.cocci
|
make coccicheck MODE=report COCCI=scripts/coccinelle/api/err_cast.cocci
|
||||||
|
|
||||||
will execute the following part of the SmPL script.
|
will execute the following part of the SmPL script::
|
||||||
|
|
||||||
<smpl>
|
<smpl>
|
||||||
@r depends on !context && !patch && (org || report)@
|
@r depends on !context && !patch && (org || report)@
|
||||||
expression x;
|
expression x;
|
||||||
position p;
|
position p;
|
||||||
@@
|
@@
|
||||||
|
|
||||||
ERR_PTR@p(PTR_ERR(x))
|
ERR_PTR@p(PTR_ERR(x))
|
||||||
|
|
||||||
@script:python depends on report@
|
@script:python depends on report@
|
||||||
p << r.p;
|
p << r.p;
|
||||||
x << r.x;
|
x << r.x;
|
||||||
@@
|
@@
|
||||||
|
|
||||||
msg="ERR_CAST can be used with %s" % (x)
|
msg="ERR_CAST can be used with %s" % (x)
|
||||||
coccilib.report.print_report(p[0], msg)
|
coccilib.report.print_report(p[0], msg)
|
||||||
</smpl>
|
</smpl>
|
||||||
|
|
||||||
This SmPL excerpt generates entries on the standard output, as
|
This SmPL excerpt generates entries on the standard output, as
|
||||||
illustrated below:
|
illustrated below::
|
||||||
|
|
||||||
/home/user/linux/crypto/ctr.c:188:9-16: ERR_CAST can be used with alg
|
/home/user/linux/crypto/ctr.c:188:9-16: ERR_CAST can be used with alg
|
||||||
/home/user/linux/crypto/authenc.c:619:9-16: ERR_CAST can be used with auth
|
/home/user/linux/crypto/authenc.c:619:9-16: ERR_CAST can be used with auth
|
||||||
/home/user/linux/crypto/xts.c:227:9-16: ERR_CAST can be used with alg
|
/home/user/linux/crypto/xts.c:227:9-16: ERR_CAST can be used with alg
|
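Because every ``report`` entry follows the ``file:line:column-column: message``
layout, the output is easy to post-process. The sketch below parses one such
line into its fields. The names ``ReportEntry`` and ``parse_report_line`` are
hypothetical, and the regular expression assumes that the file path contains
no ``:<digits>:<digits>-<digits>:`` sequence of its own.

```python
import re
from collections import namedtuple

# Hypothetical record type for one line of report-mode output.
ReportEntry = namedtuple("ReportEntry", "path line col_start col_end message")

# file:line:column-column: message
REPORT_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<cs>\d+)-(?P<ce>\d+): (?P<msg>.*)$"
)


def parse_report_line(text):
    """Parse one report line; return None if it does not match the format."""
    match = REPORT_LINE.match(text.strip())
    if match is None:
        return None
    return ReportEntry(
        path=match.group("path"),
        line=int(match.group("line")),
        col_start=int(match.group("cs")),
        col_end=int(match.group("ce")),
        message=match.group("msg"),
    )


if __name__ == "__main__":
    sample = ("/home/user/linux/crypto/ctr.c:188:9-16: "
              "ERR_CAST can be used with alg")
    print(parse_report_line(sample))
```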
||||||
|
|
||||||
|
|
||||||
Detailed description of the 'patch' mode
|
Detailed description of the ``patch`` mode
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
------------------------------------------
|
||||||
|
|
||||||
When the 'patch' mode is available, it proposes a fix for each problem
|
When the ``patch`` mode is available, it proposes a fix for each problem
|
||||||
identified.
|
identified.
|
||||||
|
|
||||||
Example:
|
Example
|
||||||
|
~~~~~~~
|
||||||
|
|
||||||
|
Running::
|
||||||
|
|
||||||
Running
|
|
||||||
make coccicheck MODE=patch COCCI=scripts/coccinelle/api/err_cast.cocci
|
make coccicheck MODE=patch COCCI=scripts/coccinelle/api/err_cast.cocci
|
||||||
|
|
||||||
will execute the following part of the SmPL script.
|
will execute the following part of the SmPL script::
|
||||||
|
|
||||||
<smpl>
|
<smpl>
|
||||||
@ depends on !context && patch && !org && !report @
|
@ depends on !context && patch && !org && !report @
|
||||||
expression x;
|
expression x;
|
||||||
@@
|
@@
|
||||||
|
|
||||||
- ERR_PTR(PTR_ERR(x))
|
- ERR_PTR(PTR_ERR(x))
|
||||||
+ ERR_CAST(x)
|
+ ERR_CAST(x)
|
||||||
</smpl>
|
</smpl>
|
||||||
|
|
||||||
This SmPL excerpt generates patch hunks on the standard output, as
|
This SmPL excerpt generates patch hunks on the standard output, as
|
||||||
illustrated below:
|
illustrated below::
|
||||||
|
|
||||||
diff -u -p a/crypto/ctr.c b/crypto/ctr.c
|
diff -u -p a/crypto/ctr.c b/crypto/ctr.c
|
||||||
--- a/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200
|
--- a/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200
|
||||||
+++ b/crypto/ctr.c 2010-06-03 23:44:49.000000000 +0200
|
+++ b/crypto/ctr.c 2010-06-03 23:44:49.000000000 +0200
|
||||||
@@ -185,7 +185,7 @@ static struct crypto_instance *crypto_ct
|
@@ -185,7 +185,7 @@ static struct crypto_instance *crypto_ct
|
||||||
alg = crypto_attr_alg(tb[1], CRYPTO_ALG_TYPE_CIPHER,
|
alg = crypto_attr_alg(tb[1], CRYPTO_ALG_TYPE_CIPHER,
|
||||||
CRYPTO_ALG_TYPE_MASK);
|
CRYPTO_ALG_TYPE_MASK);
|
||||||
if (IS_ERR(alg))
|
if (IS_ERR(alg))
|
||||||
- return ERR_PTR(PTR_ERR(alg));
|
- return ERR_PTR(PTR_ERR(alg));
|
||||||
+ return ERR_CAST(alg);
|
+ return ERR_CAST(alg);
|
||||||
|
|
||||||
/* Block size must be >= 4 bytes. */
|
/* Block size must be >= 4 bytes. */
|
||||||
err = -EINVAL;
|
err = -EINVAL;
|
||||||
|
|
||||||
Detailed description of the 'context' mode
|
Detailed description of the ``context`` mode
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
--------------------------------------------
|
||||||
|
|
||||||
'context' highlights lines of interest and their context
|
``context`` highlights lines of interest and their context
|
||||||
in a diff-like style.
|
in a diff-like style.
|
||||||
|
|
||||||
NOTE: The diff-like output generated is NOT an applicable patch. The
|
**NOTE**: The diff-like output generated is NOT an applicable patch. The
|
||||||
intent of the 'context' mode is to highlight the important lines
|
intent of the ``context`` mode is to highlight the important lines
|
||||||
(annotated with minus, '-') and gives some surrounding context
|
(annotated with minus, ``-``) and gives some surrounding context
|
||||||
lines around. This output can be used with the diff mode of
|
lines around. This output can be used with the diff mode of
|
||||||
Emacs to review the code.
|
Emacs to review the code.
|
||||||
|
|
||||||
Example:
|
Example
|
||||||
|
~~~~~~~
|
||||||
|
|
||||||
|
Running::
|
||||||
|
|
||||||
Running
|
|
||||||
make coccicheck MODE=context COCCI=scripts/coccinelle/api/err_cast.cocci
|
make coccicheck MODE=context COCCI=scripts/coccinelle/api/err_cast.cocci
|
||||||
|
|
||||||
will execute the following part of the SmPL script.
|
will execute the following part of the SmPL script::
|
||||||
|
|
||||||
<smpl>
|
<smpl>
|
||||||
@ depends on context && !patch && !org && !report@
|
@ depends on context && !patch && !org && !report@
|
||||||
expression x;
|
expression x;
|
||||||
@@
|
@@
|
||||||
|
|
||||||
* ERR_PTR(PTR_ERR(x))
|
* ERR_PTR(PTR_ERR(x))
|
||||||
</smpl>
|
</smpl>
|
||||||
|
|
||||||
This SmPL excerpt generates diff hunks on the standard output, as
|
This SmPL excerpt generates diff hunks on the standard output, as
|
||||||
illustrated below:
|
illustrated below::
|
||||||
|
|
||||||
diff -u -p /home/user/linux/crypto/ctr.c /tmp/nothing
|
diff -u -p /home/user/linux/crypto/ctr.c /tmp/nothing
|
||||||
--- /home/user/linux/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200
|
--- /home/user/linux/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200
|
||||||
+++ /tmp/nothing
|
+++ /tmp/nothing
|
||||||
@@ -185,7 +185,6 @@ static struct crypto_instance *crypto_ct
|
@@ -185,7 +185,6 @@ static struct crypto_instance *crypto_ct
|
||||||
alg = crypto_attr_alg(tb[1], CRYPTO_ALG_TYPE_CIPHER,
|
alg = crypto_attr_alg(tb[1], CRYPTO_ALG_TYPE_CIPHER,
|
||||||
CRYPTO_ALG_TYPE_MASK);
|
CRYPTO_ALG_TYPE_MASK);
|
||||||
if (IS_ERR(alg))
|
if (IS_ERR(alg))
|
||||||
- return ERR_PTR(PTR_ERR(alg));
|
- return ERR_PTR(PTR_ERR(alg));
|
||||||
|
|
||||||
/* Block size must be >= 4 bytes. */
|
/* Block size must be >= 4 bytes. */
|
||||||
err = -EINVAL;
|
err = -EINVAL;
|
||||||
|
|
||||||
Detailed description of the 'org' mode
|
Detailed description of the ``org`` mode
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
----------------------------------------
|
||||||
|
|
||||||
'org' generates a report in the Org mode format of Emacs.
|
``org`` generates a report in the Org mode format of Emacs.
|
||||||
|
|
||||||
Example:
|
Example
|
||||||
|
~~~~~~~
|
||||||
|
|
||||||
|
Running::
|
||||||
|
|
||||||
Running
|
|
||||||
make coccicheck MODE=org COCCI=scripts/coccinelle/api/err_cast.cocci
|
make coccicheck MODE=org COCCI=scripts/coccinelle/api/err_cast.cocci
|
||||||
|
|
||||||
will execute the following part of the SmPL script.
|
will execute the following part of the SmPL script::
|
||||||
|
|
||||||
<smpl>
|
<smpl>
|
||||||
@r depends on !context && !patch && (org || report)@
|
@r depends on !context && !patch && (org || report)@
|
||||||
expression x;
|
expression x;
|
||||||
position p;
|
position p;
|
||||||
@@
|
@@
|
||||||
|
|
||||||
ERR_PTR@p(PTR_ERR(x))
|
ERR_PTR@p(PTR_ERR(x))
|
||||||
|
|
||||||
@script:python depends on org@
|
@script:python depends on org@
|
||||||
p << r.p;
|
p << r.p;
|
||||||
x << r.x;
|
x << r.x;
|
||||||
@@
|
@@
|
||||||
|
|
||||||
msg="ERR_CAST can be used with %s" % (x)
|
msg="ERR_CAST can be used with %s" % (x)
|
||||||
msg_safe=msg.replace("[","@(").replace("]",")")
|
msg_safe=msg.replace("[","@(").replace("]",")")
|
||||||
coccilib.org.print_todo(p[0], msg_safe)
|
coccilib.org.print_todo(p[0], msg_safe)
|
||||||
</smpl>
|
</smpl>
|
||||||
|
|
||||||
This SmPL excerpt generates Org entries on the standard output, as
|
This SmPL excerpt generates Org entries on the standard output, as
|
||||||
illustrated below:
|
illustrated below::
|
||||||
|
|
||||||
* TODO [[view:/home/user/linux/crypto/ctr.c::face=ovl-face1::linb=188::colb=9::cole=16][ERR_CAST can be used with alg]]
|
* TODO [[view:/home/user/linux/crypto/ctr.c::face=ovl-face1::linb=188::colb=9::cole=16][ERR_CAST can be used with alg]]
|
||||||
* TODO [[view:/home/user/linux/crypto/authenc.c::face=ovl-face1::linb=619::colb=9::cole=16][ERR_CAST can be used with auth]]
|
* TODO [[view:/home/user/linux/crypto/authenc.c::face=ovl-face1::linb=619::colb=9::cole=16][ERR_CAST can be used with auth]]
|
||||||
* TODO [[view:/home/user/linux/crypto/xts.c::face=ovl-face1::linb=227::colb=9::cole=16][ERR_CAST can be used with alg]]
|
* TODO [[view:/home/user/linux/crypto/xts.c::face=ovl-face1::linb=227::colb=9::cole=16][ERR_CAST can be used with alg]]
|
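The Org entries shown above carry the same information as a ``report`` line
(file, line, column range and message), wrapped in an Org link. As a rough
sketch, assuming the entries look exactly like the three samples above, one
entry can be converted back to the ``file:line:column-column: message``
layout as follows. The function name ``org_to_report`` is hypothetical.

```python
import re

# Matches entries of the form shown in the sample output:
# * TODO [[view:PATH::face=FACE::linb=N::colb=N::cole=N][MESSAGE]]
ORG_TODO = re.compile(
    r"^\* TODO \[\[view:(?P<path>.+?)::face=(?P<face>[^:]+)"
    r"::linb=(?P<line>\d+)::colb=(?P<cs>\d+)::cole=(?P<ce>\d+)\]"
    r"\[(?P<msg>.*)\]\]$"
)


def org_to_report(entry):
    """Convert one Org TODO entry to a report-style line, or return None."""
    match = ORG_TODO.match(entry.strip())
    if match is None:
        return None
    return "%s:%s:%s-%s: %s" % (
        match.group("path"),
        match.group("line"),
        match.group("cs"),
        match.group("ce"),
        match.group("msg"),
    )


if __name__ == "__main__":
    sample = ("* TODO [[view:/home/user/linux/crypto/ctr.c::face=ovl-face1"
              "::linb=188::colb=9::cole=16][ERR_CAST can be used with alg]]")
    print(org_to_report(sample))
```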
256
Documentation/dev-tools/gcov.rst
Normal file
@ -0,0 +1,256 @@
|
|||||||
|
Using gcov with the Linux kernel
|
||||||
|
================================
|
||||||
|
|
||||||
|
gcov profiling kernel support enables the use of GCC's coverage testing
|
||||||
|
tool gcov_ with the Linux kernel. Coverage data of a running kernel
|
||||||
|
is exported in gcov-compatible format via the "gcov" debugfs directory.
|
||||||
|
To get coverage data for a specific file, change to the kernel build
|
||||||
|
directory and use gcov with the ``-o`` option as follows (requires root)::
|
||||||
|
|
||||||
|
# cd /tmp/linux-out
|
||||||
|
# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c
|
||||||
|
|
||||||
|
This will create source code files annotated with execution counts
|
||||||
|
in the current directory. In addition, graphical gcov front-ends such
|
||||||
|
as lcov_ can be used to automate the process of collecting data
|
||||||
|
for the entire kernel and provide coverage overviews in HTML format.
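The ``-o`` argument in the command above is the gcov debugfs directory followed by the absolute path of the build directory and the subdirectory of the object file. A minimal Python sketch of that path construction follows; the helper name ``gcov_object_dir`` is an assumption made for illustration, not part of any tool.

```python
import posixpath

# Location of the gcov directory once debugfs is mounted.
DEBUGFS_GCOV_ROOT = "/sys/kernel/debug/gcov"


def gcov_object_dir(build_dir, subdir, root=DEBUGFS_GCOV_ROOT):
    """Return the directory to pass to ``gcov -o`` (hypothetical helper).

    build_dir is the absolute kernel build directory, subdir the directory
    of the object file relative to it.
    """
    # Drop the leading slash so that join() does not discard the root.
    return posixpath.join(root, build_dir.lstrip("/"), subdir)


if __name__ == "__main__":
    print(gcov_object_dir("/tmp/linux-out", "kernel"))
```

The same construction applies when the data files have been copied elsewhere: pass the copy location as ``root``.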
|
||||||
|
|
||||||
|
Possible uses:
|
||||||
|
|
||||||
|
* debugging (has this line been reached at all?)
|
||||||
|
* test improvement (how do I change my test to cover these lines?)
|
||||||
|
* minimizing kernel configurations (do I need this option if the
|
||||||
|
associated code is never run?)
|
||||||
|
|
||||||
|
.. _gcov: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
|
||||||
|
.. _lcov: http://ltp.sourceforge.net/coverage/lcov.php
|
||||||
|
|
||||||
|
|
||||||
|
Preparation
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Configure the kernel with::
|
||||||
|
|
||||||
|
CONFIG_DEBUG_FS=y
|
||||||
|
CONFIG_GCOV_KERNEL=y
|
||||||
|
|
||||||
|
select the gcov format that matches your gcc (the default is to autodetect it from the gcc version)::
|
||||||
|
|
||||||
|
CONFIG_GCOV_FORMAT_AUTODETECT=y
|
||||||
|
|
||||||
|
and to get coverage data for the entire kernel::
|
||||||
|
|
||||||
|
CONFIG_GCOV_PROFILE_ALL=y
|
||||||
|
|
||||||
|
Note that kernels compiled with profiling flags will be significantly
|
||||||
|
larger and run slower. Also CONFIG_GCOV_PROFILE_ALL may not be supported
|
||||||
|
on all architectures.
|
||||||
|
|
||||||
|
Profiling data will only become accessible once debugfs has been
|
||||||
|
mounted::
|
||||||
|
|
||||||
|
mount -t debugfs none /sys/kernel/debug
|
||||||
|
|
||||||
|
|
||||||
|
Customization
|
||||||
|
-------------
|
||||||
|
|
||||||
|
To enable profiling for specific files or directories, add a line
|
||||||
|
similar to the following to the respective kernel Makefile:
|
||||||
|
|
||||||
|
- For a single file (e.g. main.o)::
|
||||||
|
|
||||||
|
GCOV_PROFILE_main.o := y
|
||||||
|
|
||||||
|
- For all files in one directory::
|
||||||
|
|
||||||
|
GCOV_PROFILE := y
|
||||||
|
|
||||||
|
To exclude files from being profiled even when CONFIG_GCOV_PROFILE_ALL
|
||||||
|
is specified, use::
|
||||||
|
|
||||||
|
GCOV_PROFILE_main.o := n
|
||||||
|
|
||||||
|
and::
|
||||||
|
|
||||||
|
GCOV_PROFILE := n
|
||||||
|
|
||||||
|
Only files which are linked to the main kernel image or are compiled as
|
||||||
|
kernel modules are supported by this mechanism.
|
||||||
|
|
||||||
|
|
||||||
|
Files
|
||||||
|
-----
|
||||||
|
|
||||||
|
The gcov kernel support creates the following files in debugfs:
|
||||||
|
|
||||||
|
``/sys/kernel/debug/gcov``
|
||||||
|
Parent directory for all gcov-related files.
|
||||||
|
|
||||||
|
``/sys/kernel/debug/gcov/reset``
|
||||||
|
Global reset file: resets all coverage data to zero when
|
||||||
|
written to.
|
||||||
|
|
||||||
|
``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcda``
|
||||||
|
The actual gcov data file as understood by the gcov
|
||||||
|
tool. Resets file coverage data to zero when written to.
|
||||||
|
|
||||||
|
``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcno``
|
||||||
|
Symbolic link to a static data file required by the gcov
|
||||||
|
tool. This file is generated by gcc when compiling with
|
||||||
|
option ``-ftest-coverage``.
|
||||||
|
|
||||||
|
|
||||||
|
Modules
|
||||||
|
-------
|
||||||
|
|
||||||
|
Kernel modules may contain cleanup code which is only run during
|
||||||
|
module unload time. The gcov mechanism provides a means to collect
|
||||||
|
coverage data for such code by keeping a copy of the data associated
|
||||||
|
with the unloaded module. This data remains available through debugfs.
|
||||||
|
Once the module is loaded again, the associated coverage counters are
|
||||||
|
initialized with the data from its previous instantiation.
|
||||||
|
|
||||||
|
This behavior can be deactivated by specifying the gcov_persist kernel
|
||||||
|
parameter::
|
||||||
|
|
||||||
|
gcov_persist=0
|
||||||
|
|
||||||
|
At run-time, a user can also choose to discard data for an unloaded
|
||||||
|
module by writing to its data file or the global reset file.
|
||||||
|
|
||||||
|
|
||||||
|
Separated build and test machines
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
The gcov kernel profiling infrastructure is designed to work out-of-the
|
||||||
|
box for setups where kernels are built and run on the same machine. In
|
||||||
|
cases where the kernel runs on a separate machine, special preparations
|
||||||
|
must be made, depending on where the gcov tool is used:
|
||||||
|
|
||||||
|
a) gcov is run on the TEST machine
|
||||||
|
|
||||||
|
The gcov tool version on the test machine must be compatible with the
|
||||||
|
gcc version used for kernel build. Also the following files need to be
|
||||||
|
copied from build to test machine:
|
||||||
|
|
||||||
|
from the source tree:
|
||||||
|
- all C source files + headers
|
||||||
|
|
||||||
|
from the build tree:
|
||||||
|
- all C source files + headers
|
||||||
|
- all .gcda and .gcno files
|
||||||
|
- all links to directories
|
||||||
|
|
||||||
|
It is important to note that these files need to be placed into the
|
||||||
|
exact same file system location on the test machine as on the build
|
||||||
|
machine. If any of the path components is a symbolic link, the actual
|
||||||
|
directory needs to be used instead (due to make's CURDIR handling).
|
||||||
|
|
||||||
|
b) gcov is run on the BUILD machine
|
||||||
|
|
||||||
|
The following files need to be copied after each test case from test
|
||||||
|
to build machine:
|
||||||
|
|
||||||
|
from the gcov directory in debugfs:
|
||||||
|
- all .gcda files
|
||||||
|
- all links to .gcno files
|
||||||
|
|
||||||
|
These files can be copied to any location on the build machine. gcov
|
||||||
|
must then be called with the -o option pointing to that directory.
|
||||||
|
|
||||||
|
Example directory setup on the build machine::
|
||||||
|
|
||||||
|
/tmp/linux: kernel source tree
|
||||||
|
/tmp/out: kernel build directory as specified by make O=
|
||||||
|
/tmp/coverage: location of the files copied from the test machine
|
||||||
|
|
||||||
|
[user@build] cd /tmp/out
|
||||||
|
[user@build] gcov -o /tmp/coverage/tmp/out/init main.c
|
||||||
|
|
||||||
|
|
||||||
|
Troubleshooting
|
||||||
|
---------------
|
||||||
|
|
||||||
|
Problem
|
||||||
|
Compilation aborts during linker step.
|
||||||
|
|
||||||
|
Cause
|
||||||
|
Profiling flags are specified for source files which are not
|
||||||
|
linked to the main kernel or which are linked by a custom
|
||||||
|
linker procedure.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
Exclude affected source files from profiling by specifying
|
||||||
|
``GCOV_PROFILE := n`` or ``GCOV_PROFILE_basename.o := n`` in the
|
||||||
|
corresponding Makefile.
|
||||||
|
|
||||||
|
Problem
|
||||||
|
Files copied from debugfs appear empty or incomplete.
|
||||||
|
|
||||||
|
Cause
|
||||||
|
Due to the way seq_file works, some tools such as cp or tar
|
||||||
|
may not correctly copy files from debugfs.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
Use ``cat`` to read ``.gcda`` files and ``cp -d`` to copy links.
|
||||||
|
Alternatively use the mechanism shown in Appendix B.
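The ``cat`` approach can be sketched in Python as follows. This is a minimal illustration, assuming the copy fails because some tools trust the file size reported for these virtual files instead of reading until end-of-file. The function name ``copy_by_reading`` is hypothetical, and the demonstration runs on an ordinary temporary file rather than on debugfs.

```python
import os
import tempfile


def copy_by_reading(src, dst, chunk_size=65536):
    """Copy src to dst by reading until EOF, ignoring the reported size."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # an empty read means end-of-file
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    # Stand-in for a .gcda file; on a real system src would live under
    # /sys/kernel/debug/gcov and reading it requires root.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "file.gcda")
        dst = os.path.join(tmp, "copy.gcda")
        with open(src, "wb") as f:
            f.write(b"\x00" * 1000)
        print(copy_by_reading(src, dst), "bytes copied")
```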
|
||||||
|
|
||||||
|
|
||||||
|
Appendix A: gather_on_build.sh
|
||||||
|
------------------------------
|
||||||
|
|
||||||
|
Sample script to gather coverage meta files on the build machine
|
||||||
|
(see case a in "Separated build and test machines")::
|
||||||
|
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
KSRC=$1
|
||||||
|
KOBJ=$2
|
||||||
|
DEST=$3
|
||||||
|
|
||||||
|
if [ -z "$KSRC" ] || [ -z "$KOBJ" ] || [ -z "$DEST" ]; then
|
||||||
|
echo "Usage: $0 <ksrc directory> <kobj directory> <output.tar.gz>" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
KSRC=$(cd $KSRC; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
|
||||||
|
KOBJ=$(cd $KOBJ; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
|
||||||
|
|
||||||
|
find $KSRC $KOBJ \( -name '*.gcno' -o -name '*.[ch]' -o -type l \) -a \
|
||||||
|
-perm /u+r,g+r | tar cfz $DEST -P -T -
|
||||||
|
|
||||||
|
if [ $? -eq 0 ] ; then
|
||||||
|
echo "$DEST successfully created, copy to test system and unpack with:"
|
||||||
|
echo " tar xfz $DEST -P"
|
||||||
|
else
|
||||||
|
echo "Could not create file $DEST"
|
||||||
|
fi
|
||||||
|
|
||||||
|
|
||||||
|
Appendix B: gather_on_test.sh
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
|
Sample script to gather coverage data files on the test machine
|
||||||
|
(see case b in "Separated build and test machines")::
|
||||||
|
|
||||||
|
#!/bin/bash -e
|
||||||
|
|
||||||
|
DEST=$1
|
||||||
|
GCDA=/sys/kernel/debug/gcov
|
||||||
|
|
||||||
|
if [ -z "$DEST" ] ; then
|
||||||
|
echo "Usage: $0 <output.tar.gz>" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
TEMPDIR=$(mktemp -d)
|
||||||
|
echo Collecting data..
|
||||||
|
find $GCDA -type d -exec mkdir -p $TEMPDIR/\{\} \;
|
||||||
|
find $GCDA -name '*.gcda' -exec sh -c 'cat < $0 > '$TEMPDIR'/$0' {} \;
|
||||||
|
find $GCDA -name '*.gcno' -exec sh -c 'cp -d $0 '$TEMPDIR'/$0' {} \;
|
||||||
|
tar czf $DEST -C $TEMPDIR sys
|
||||||
|
rm -rf $TEMPDIR
|
||||||
|
|
||||||
|
echo "$DEST successfully created, copy to build system and unpack with:"
|
||||||
|
echo " tar xfz $DEST"
|
@ -1,3 +1,5 @@
|
|||||||
|
.. highlight:: none
|
||||||
|
|
||||||
Debugging kernel and modules via gdb
|
Debugging kernel and modules via gdb
|
||||||
====================================
|
====================================
|
||||||
|
|
||||||
@ -13,54 +15,58 @@ be transferred to the other gdb stubs as well.
|
|||||||
Requirements
|
Requirements
|
||||||
------------
|
------------
|
||||||
|
|
||||||
o gdb 7.2+ (recommended: 7.4+) with python support enabled (typically true
|
- gdb 7.2+ (recommended: 7.4+) with python support enabled (typically true
|
||||||
for distributions)
|
for distributions)
|
||||||
|
|
||||||
|
|
||||||
Setup
|
Setup
|
||||||
-----
|
-----
|
||||||
|
|
||||||
o Create a virtual Linux machine for QEMU/KVM (see www.linux-kvm.org and
|
- Create a virtual Linux machine for QEMU/KVM (see www.linux-kvm.org and
|
||||||
www.qemu.org for more details). For cross-development,
|
www.qemu.org for more details). For cross-development,
|
||||||
http://landley.net/aboriginal/bin keeps a pool of machine images and
|
http://landley.net/aboriginal/bin keeps a pool of machine images and
|
||||||
toolchains that can be helpful to start from.
|
toolchains that can be helpful to start from.
|
||||||
|
|
||||||
o Build the kernel with CONFIG_GDB_SCRIPTS enabled, but leave
|
- Build the kernel with CONFIG_GDB_SCRIPTS enabled, but leave
|
||||||
CONFIG_DEBUG_INFO_REDUCED off. If your architecture supports
|
CONFIG_DEBUG_INFO_REDUCED off. If your architecture supports
|
||||||
CONFIG_FRAME_POINTER, keep it enabled.
|
CONFIG_FRAME_POINTER, keep it enabled.
|
||||||
|
|
||||||
o Install that kernel on the guest.
|
- Install that kernel on the guest.
|
||||||
|
Alternatively, QEMU allows to boot the kernel directly using -kernel,
|
||||||
|
-append, -initrd command line switches. This is generally only useful if
|
||||||
|
you do not depend on modules. See QEMU documentation for more details on
|
||||||
|
this mode.
|
||||||
|
|
||||||
Alternatively, QEMU allows to boot the kernel directly using -kernel,
|
- Enable the gdb stub of QEMU/KVM, either
|
||||||
-append, -initrd command line switches. This is generally only useful if
|
|
||||||
you do not depend on modules. See QEMU documentation for more details on
|
|
||||||
this mode.
|
|
||||||
|
|
||||||
o Enable the gdb stub of QEMU/KVM, either
|
|
||||||
- at VM startup time by appending "-s" to the QEMU command line
|
- at VM startup time by appending "-s" to the QEMU command line
|
||||||
or
|
|
||||||
|
or
|
||||||
|
|
||||||
- during runtime by issuing "gdbserver" from the QEMU monitor
|
- during runtime by issuing "gdbserver" from the QEMU monitor
|
||||||
console
|
console
|
||||||
|
|
||||||
o cd /path/to/linux-build
|
- cd /path/to/linux-build
|
||||||
|
|
||||||
o Start gdb: gdb vmlinux
|
- Start gdb: gdb vmlinux
|
||||||
|
|
||||||
Note: Some distros may restrict auto-loading of gdb scripts to known safe
|
Note: Some distros may restrict auto-loading of gdb scripts to known safe
|
||||||
directories. In case gdb reports to refuse loading vmlinux-gdb.py, add
|
directories. In case gdb reports to refuse loading vmlinux-gdb.py, add::
|
||||||
|
|
||||||
add-auto-load-safe-path /path/to/linux-build
|
add-auto-load-safe-path /path/to/linux-build
|
||||||
|
|
||||||
to ~/.gdbinit. See gdb help for more details.
|
to ~/.gdbinit. See gdb help for more details.
|
||||||
|
|
||||||
|
- Attach to the booted guest::
|
||||||
|
|
||||||
o Attach to the booted guest:
|
|
||||||
(gdb) target remote :1234
|
(gdb) target remote :1234
|
||||||
|
|
||||||
|
|
||||||
Examples of using the Linux-provided gdb helpers
|
Examples of using the Linux-provided gdb helpers
|
||||||
------------------------------------------------
|
------------------------------------------------
|
||||||
|
|
||||||
o Load module (and main kernel) symbols:
|
- Load module (and main kernel) symbols::
|
||||||
|
|
||||||
(gdb) lx-symbols
|
(gdb) lx-symbols
|
||||||
loading vmlinux
|
loading vmlinux
|
||||||
scanning for modules in /home/user/linux/build
|
scanning for modules in /home/user/linux/build
|
||||||
@ -72,17 +78,20 @@ Examples of using the Linux-provided gdb helpers
|
|||||||
...
|
...
|
||||||
loading @0xffffffffa0000000: /home/user/linux/build/drivers/ata/ata_generic.ko
|
loading @0xffffffffa0000000: /home/user/linux/build/drivers/ata/ata_generic.ko
|
||||||
|
|
||||||
o Set a breakpoint on some not yet loaded module function, e.g.:
|
- Set a breakpoint on some not yet loaded module function, e.g.::
|
||||||
|
|
||||||
(gdb) b btrfs_init_sysfs
|
(gdb) b btrfs_init_sysfs
|
||||||
Function "btrfs_init_sysfs" not defined.
|
Function "btrfs_init_sysfs" not defined.
|
||||||
Make breakpoint pending on future shared library load? (y or [n]) y
|
Make breakpoint pending on future shared library load? (y or [n]) y
|
||||||
Breakpoint 1 (btrfs_init_sysfs) pending.
|
Breakpoint 1 (btrfs_init_sysfs) pending.
|
||||||
|
|
||||||
o Continue the target
|
- Continue the target::
|
||||||
|
|
||||||
(gdb) c
|
(gdb) c
|
||||||
|
|
||||||
o Load the module on the target and watch the symbols being loaded as well as
|
- Load the module on the target and watch the symbols being loaded as well as
|
||||||
the breakpoint hit:
|
the breakpoint hit::
|
||||||
|
|
||||||
loading @0xffffffffa0034000: /home/user/linux/build/lib/libcrc32c.ko
|
loading @0xffffffffa0034000: /home/user/linux/build/lib/libcrc32c.ko
|
||||||
loading @0xffffffffa0050000: /home/user/linux/build/lib/lzo/lzo_compress.ko
|
loading @0xffffffffa0050000: /home/user/linux/build/lib/lzo/lzo_compress.ko
|
||||||
loading @0xffffffffa006e000: /home/user/linux/build/lib/zlib_deflate/zlib_deflate.ko
|
loading @0xffffffffa006e000: /home/user/linux/build/lib/zlib_deflate/zlib_deflate.ko
|
||||||
@ -91,7 +100,8 @@ Examples of using the Linux-provided gdb helpers
|
|||||||
Breakpoint 1, btrfs_init_sysfs () at /home/user/linux/fs/btrfs/sysfs.c:36
|
Breakpoint 1, btrfs_init_sysfs () at /home/user/linux/fs/btrfs/sysfs.c:36
|
||||||
36 btrfs_kset = kset_create_and_add("btrfs", NULL, fs_kobj);
|
36 btrfs_kset = kset_create_and_add("btrfs", NULL, fs_kobj);
|
||||||
|
|
||||||
o Dump the log buffer of the target kernel:
|
- Dump the log buffer of the target kernel::
|
||||||
|
|
||||||
(gdb) lx-dmesg
|
(gdb) lx-dmesg
|
||||||
[ 0.000000] Initializing cgroup subsys cpuset
|
[ 0.000000] Initializing cgroup subsys cpuset
|
||||||
[ 0.000000] Initializing cgroup subsys cpu
|
[ 0.000000] Initializing cgroup subsys cpu
|
||||||
@ -102,19 +112,22 @@ Examples of using the Linux-provided gdb helpers
|
|||||||
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
|
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
|
||||||
....
|
....
|
||||||
|
|
||||||
o Examine fields of the current task struct:
|
- Examine fields of the current task struct::
|
||||||
|
|
||||||
(gdb) p $lx_current().pid
|
(gdb) p $lx_current().pid
|
||||||
$1 = 4998
|
$1 = 4998
|
||||||
(gdb) p $lx_current().comm
|
(gdb) p $lx_current().comm
|
||||||
$2 = "modprobe\000\000\000\000\000\000\000"
|
$2 = "modprobe\000\000\000\000\000\000\000"
|
||||||
|
|
||||||
o Make use of the per-cpu function for the current or a specified CPU:
|
- Make use of the per-cpu function for the current or a specified CPU::
|
||||||
|
|
||||||
(gdb) p $lx_per_cpu("runqueues").nr_running
|
(gdb) p $lx_per_cpu("runqueues").nr_running
|
||||||
$3 = 1
|
$3 = 1
|
||||||
(gdb) p $lx_per_cpu("runqueues", 2).nr_running
|
(gdb) p $lx_per_cpu("runqueues", 2).nr_running
|
||||||
$4 = 0
|
$4 = 0
|
||||||
|
|
||||||
o Dig into hrtimers using the container_of helper:
|
- Dig into hrtimers using the container_of helper::
|
||||||
|
|
||||||
(gdb) set $next = $lx_per_cpu("hrtimer_bases").clock_base[0].active.next
|
(gdb) set $next = $lx_per_cpu("hrtimer_bases").clock_base[0].active.next
|
||||||
(gdb) p *$container_of($next, "struct hrtimer", "node")
|
(gdb) p *$container_of($next, "struct hrtimer", "node")
|
||||||
$5 = {
|
$5 = {
|
||||||
@ -144,7 +157,7 @@ List of commands and functions
|
|||||||
------------------------------
|
------------------------------
|
||||||
|
|
||||||
The number of commands and convenience functions may evolve over the time,
|
The number of commands and convenience functions may evolve over the time,
|
||||||
this is just a snapshot of the initial version:
|
this is just a snapshot of the initial version::
|
||||||
|
|
||||||
(gdb) apropos lx
|
(gdb) apropos lx
|
||||||
function lx_current -- Return current task
|
function lx_current -- Return current task
|
173
Documentation/dev-tools/kasan.rst
Normal file
@ -0,0 +1,173 @@
|
|||||||
|
The Kernel Address Sanitizer (KASAN)
|
||||||
|
====================================
|
||||||
|
|
||||||
|
Overview
|
||||||
|
--------
|
||||||
|
|
||||||
|
KernelAddressSANitizer (KASAN) is a dynamic memory error detector. It provides
|
||||||
|
a fast and comprehensive solution for finding use-after-free and out-of-bounds
|
||||||
|
bugs.
|
||||||
|
|
||||||
|
KASAN uses compile-time instrumentation for checking every memory access,
|
||||||
|
therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
|
||||||
|
required for detection of out-of-bounds accesses to stack or global variables.
|
||||||
|
|
||||||
|
Currently KASAN is supported only for the x86_64 and arm64 architectures.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
-----
|
||||||
|
|
||||||
|
To enable KASAN configure kernel with::
|
||||||
|
|
||||||
|
CONFIG_KASAN = y
|
||||||
|
|
||||||
|
and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline and
|
||||||
|
inline are compiler instrumentation types. The former produces a smaller binary;
|
||||||
|
the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC
|
||||||
|
version 5.0 or later.
|
||||||
|
|
||||||
|
KASAN works with both SLUB and SLAB memory allocators.
|
||||||
|
For better bug detection and nicer reporting, enable CONFIG_STACKTRACE.
|
||||||
|
|
||||||
|
To disable instrumentation for specific files or directories, add a line
|
||||||
|
similar to the following to the respective kernel Makefile:
|
||||||
|
|
||||||
|
- For a single file (e.g. main.o)::
|
||||||
|
|
||||||
|
KASAN_SANITIZE_main.o := n
|
||||||
|
|
||||||
|
- For all files in one directory::
|
||||||
|
|
||||||
|
KASAN_SANITIZE := n
|
||||||
|
|
||||||
|
Error reports
|
||||||
|
~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
A typical out of bounds access report looks like this::
|
||||||
|
|
||||||
|
==================================================================
|
||||||
|
BUG: AddressSanitizer: out of bounds access in kmalloc_oob_right+0x65/0x75 [test_kasan] at addr ffff8800693bc5d3
|
||||||
|
Write of size 1 by task modprobe/1689
|
||||||
|
=============================================================================
|
||||||
|
BUG kmalloc-128 (Not tainted): kasan error
|
||||||
|
-----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
Disabling lock debugging due to kernel taint
|
||||||
|
INFO: Allocated in kmalloc_oob_right+0x3d/0x75 [test_kasan] age=0 cpu=0 pid=1689
|
||||||
|
__slab_alloc+0x4b4/0x4f0
|
||||||
|
kmem_cache_alloc_trace+0x10b/0x190
|
||||||
|
kmalloc_oob_right+0x3d/0x75 [test_kasan]
|
||||||
|
init_module+0x9/0x47 [test_kasan]
|
||||||
|
do_one_initcall+0x99/0x200
|
||||||
|
load_module+0x2cb3/0x3b20
|
||||||
|
SyS_finit_module+0x76/0x80
|
||||||
|
system_call_fastpath+0x12/0x17
|
||||||
|
INFO: Slab 0xffffea0001a4ef00 objects=17 used=7 fp=0xffff8800693bd728 flags=0x100000000004080
|
||||||
|
INFO: Object 0xffff8800693bc558 @offset=1368 fp=0xffff8800693bc720
|
||||||
|
|
||||||
|
Bytes b4 ffff8800693bc548: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
|
||||||
|
Object ffff8800693bc558: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||||
|
Object ffff8800693bc568: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||||
|
Object ffff8800693bc578: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||||
|
Object ffff8800693bc588: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||||
|
Object ffff8800693bc598: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||||
|
Object ffff8800693bc5a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||||
|
Object ffff8800693bc5b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||||
|
Object ffff8800693bc5c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
|
||||||
|
Redzone ffff8800693bc5d8: cc cc cc cc cc cc cc cc ........
|
||||||
|
Padding ffff8800693bc718: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
|
||||||
|
CPU: 0 PID: 1689 Comm: modprobe Tainted: G B 3.18.0-rc1-mm1+ #98
|
||||||
|
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
|
||||||
|
ffff8800693bc000 0000000000000000 ffff8800693bc558 ffff88006923bb78
|
||||||
|
ffffffff81cc68ae 00000000000000f3 ffff88006d407600 ffff88006923bba8
|
||||||
|
ffffffff811fd848 ffff88006d407600 ffffea0001a4ef00 ffff8800693bc558
|
||||||
|
Call Trace:
|
||||||
|
[<ffffffff81cc68ae>] dump_stack+0x46/0x58
|
||||||
|
[<ffffffff811fd848>] print_trailer+0xf8/0x160
|
||||||
|
[<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan]
|
||||||
|
[<ffffffff811ff0f5>] object_err+0x35/0x40
|
||||||
|
[<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan]
|
||||||
|
[<ffffffff8120b9fa>] kasan_report_error+0x38a/0x3f0
|
||||||
|
[<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40
|
||||||
|
[<ffffffff8120b344>] ? kasan_unpoison_shadow+0x14/0x40
|
||||||
|
[<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40
|
||||||
|
[<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan]
|
||||||
|
[<ffffffff8120a995>] __asan_store1+0x75/0xb0
|
||||||
|
[<ffffffffa0002601>] ? kmem_cache_oob+0x1d/0xc3 [test_kasan]
|
||||||
|
[<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan]
|
||||||
|
[<ffffffffa0002065>] kmalloc_oob_right+0x65/0x75 [test_kasan]
|
||||||
|
[<ffffffffa00026b0>] init_module+0x9/0x47 [test_kasan]
|
||||||
|
[<ffffffff810002d9>] do_one_initcall+0x99/0x200
|
||||||
|
[<ffffffff811e4e5c>] ? __vunmap+0xec/0x160
|
||||||
|
[<ffffffff81114f63>] load_module+0x2cb3/0x3b20
|
||||||
|
[<ffffffff8110fd70>] ? m_show+0x240/0x240
|
||||||
|
[<ffffffff81115f06>] SyS_finit_module+0x76/0x80
|
||||||
|
[<ffffffff81cd3129>] system_call_fastpath+0x12/0x17
|
||||||
|
Memory state around the buggy address:
|
||||||
|
ffff8800693bc300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||||
|
ffff8800693bc380: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
|
||||||
|
ffff8800693bc400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||||
|
ffff8800693bc480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||||
|
ffff8800693bc500: fc fc fc fc fc fc fc fc fc fc fc 00 00 00 00 00
|
||||||
|
>ffff8800693bc580: 00 00 00 00 00 00 00 00 00 00 03 fc fc fc fc fc
|
||||||
|
^
|
||||||
|
ffff8800693bc600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||||
|
ffff8800693bc680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||||
|
ffff8800693bc700: fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb fb
|
||||||
|
ffff8800693bc780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|
||||||
|
ffff8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|
||||||
|
==================================================================
|
||||||
|
|
||||||
|
The header of the report describes what kind of bug happened and what kind of
|
||||||
|
access caused it. It's followed by the description of the accessed slub object
|
||||||
|
(see 'SLUB Debug output' section in Documentation/vm/slub.txt for details) and
|
||||||
|
the description of the accessed memory page.
|
||||||
|
|
||||||
|
In the last section the report shows memory state around the accessed address.
|
||||||
|
Reading this part requires some understanding of how KASAN works.
|
||||||
|
|
||||||
|
The state of each 8 aligned bytes of memory is encoded in one shadow byte.
|
||||||
|
Those 8 bytes can be accessible, partially accessible, freed or be a redzone.
|
||||||
|
We use the following encoding for each shadow byte: 0 means that all 8 bytes
|
||||||
|
of the corresponding memory region are accessible; number N (1 <= N <= 7) means
|
||||||
|
that the first N bytes are accessible, and other (8 - N) bytes are not;
|
||||||
|
any negative value indicates that the entire 8-byte word is inaccessible.
|
||||||
|
We use different negative values to distinguish between different kinds of
|
||||||
|
inaccessible memory like redzones or freed memory (see mm/kasan/kasan.h).
|
||||||
|
|
||||||
|
In the report above the arrows point to the shadow byte 03, which means that
|
||||||
|
the accessed address is partially accessible.
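The encoding can be checked against the report above with a short Python sketch. It is an illustration of the rules just described, not the kernel's implementation; the function names are made up for this example, and accesses are assumed not to cross an 8-byte boundary.

```python
GRANULE = 8  # bytes of memory described by one shadow byte


def to_signed(shadow_byte):
    """Interpret a shadow byte (0..255) as a signed 8-bit value."""
    return shadow_byte - 256 if shadow_byte >= 0x80 else shadow_byte


def accessible_bytes(shadow_byte):
    """Number of leading bytes of the 8-byte region that may be accessed."""
    value = to_signed(shadow_byte)
    if value == 0:
        return GRANULE   # the whole region is accessible
    if 1 <= value <= 7:
        return value     # only the first N bytes are accessible
    return 0             # negative values (and anything else): inaccessible


def access_is_valid(addr, size, shadow_byte):
    """Check an access that stays within one 8-byte region."""
    last_offset = (addr % GRANULE) + size - 1
    assert last_offset < GRANULE, "access crosses an 8-byte boundary"
    return last_offset < accessible_bytes(shadow_byte)


if __name__ == "__main__":
    # The report shows a 1-byte write at ffff8800693bc5d3 with shadow byte 03.
    print(access_is_valid(0xffff8800693bc5d3, 1, 0x03))
    # 0xfc is one of the negative values, so nothing is accessible.
    print(accessible_bytes(0xfc))
```

The faulting address has offset 3 inside its 8-byte region, while shadow byte 03 only permits offsets 0 to 2, which is why the write is reported.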
|
||||||
|
|
||||||
|
|
||||||
|
Implementation details
|
||||||
|
----------------------
|
||||||
|
|
||||||
|
From a high level, our approach to memory error detection is similar to that
|
||||||
|
of kmemcheck: use shadow memory to record whether each byte of memory is safe
|
||||||
|
to access, and use compile-time instrumentation to check shadow memory on each
|
||||||
|
memory access.
|
||||||
|
|
||||||
|
AddressSanitizer dedicates 1/8 of kernel memory to its shadow memory
|
||||||
|
(e.g. 16TB to cover 128TB on x86_64) and uses direct mapping with a scale and
|
||||||
|
offset to translate a memory address to its corresponding shadow address.
|
||||||
|
|
||||||
|
Here is the function which translates an address to its corresponding shadow
|
||||||
|
address::
|
||||||
|
|
||||||
|
static inline void *kasan_mem_to_shadow(const void *addr)
|
||||||
|
{
|
||||||
|
return ((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
|
||||||
|
+ KASAN_SHADOW_OFFSET;
|
||||||
|
}
|
||||||
|
|
||||||
|
where ``KASAN_SHADOW_SCALE_SHIFT = 3``.
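The translation can be tried out with plain integers. The Python sketch below mirrors the function above; the value used for ``KASAN_SHADOW_OFFSET`` is an assumption chosen for illustration only, since the real offset depends on the architecture and kernel configuration.

```python
KASAN_SHADOW_SCALE_SHIFT = 3
# Assumed value for illustration; the real offset is set by the kernel
# configuration of the target architecture.
KASAN_SHADOW_OFFSET = 0xdffffc0000000000
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned arithmetic


def kasan_mem_to_shadow(addr):
    """Map a memory address to the address of its shadow byte."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


if __name__ == "__main__":
    base = 0xffff8800693bc5d0
    # All 8 bytes of one aligned region share a single shadow byte.
    print(kasan_mem_to_shadow(base) == kasan_mem_to_shadow(base + 7))
    # The next 8-byte region uses the next shadow byte.
    print(kasan_mem_to_shadow(base + 8) - kasan_mem_to_shadow(base))
    # Shadow size for 128TB of address space, in TB.
    print(((128 << 40) >> KASAN_SHADOW_SCALE_SHIFT) >> 40)
```

Shifting by 3 divides the address by 8, which is where the 1/8 memory overhead comes from.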
|
||||||
|
|
||||||
|
Compile-time instrumentation is used for checking memory accesses. The compiler inserts
|
||||||
|
function calls (__asan_load*(addr), __asan_store*(addr)) before each memory
|
||||||
|
access of size 1, 2, 4, 8 or 16. These functions check whether memory access is
|
||||||
|
valid or not by checking corresponding shadow memory.
|
||||||
|
|
||||||
|
GCC 5.0 can perform inline instrumentation. Instead of making
|
||||||
|
function calls GCC directly inserts the code to check the shadow memory.
|
||||||
|
This option significantly enlarges the kernel but it gives x1.1-x2 performance
|
||||||
|
boost over outline instrumented kernel.
|
@ -12,38 +12,38 @@ To achieve this goal it does not collect coverage in soft/hard interrupts
|
|||||||
and instrumentation of some inherently non-deterministic parts of kernel is
|
and instrumentation of some inherently non-deterministic parts of kernel is
|
||||||
disabled (e.g. scheduler, locking).
|
disabled (e.g. scheduler, locking).
|
||||||
|
|
||||||
Usage:
|
Usage
|
||||||
======
|
-----
|
||||||
|
|
||||||
Configure kernel with:
|
Configure the kernel with::
|
||||||
|
|
||||||
CONFIG_KCOV=y
|
CONFIG_KCOV=y
|
||||||
|
|
||||||
CONFIG_KCOV requires gcc built on revision 231296 or later.
|
CONFIG_KCOV requires gcc built on revision 231296 or later.
|
||||||
Profiling data will only become accessible once debugfs has been mounted:
|
Profiling data will only become accessible once debugfs has been mounted::
|
||||||
|
|
||||||
mount -t debugfs none /sys/kernel/debug
|
mount -t debugfs none /sys/kernel/debug
|
||||||
|
|
||||||
The following program demonstrates kcov usage from within a test program:
|
The following program demonstrates kcov usage from within a test program::
|
||||||
|
|
||||||
#include <stdio.h>
|
#include <stdio.h>
|
||||||
#include <stddef.h>
|
#include <stddef.h>
|
||||||
#include <stdint.h>
|
#include <stdint.h>
|
||||||
#include <stdlib.h>
|
#include <stdlib.h>
|
||||||
#include <sys/types.h>
|
#include <sys/types.h>
|
||||||
#include <sys/stat.h>
|
#include <sys/stat.h>
|
||||||
#include <sys/ioctl.h>
|
#include <sys/ioctl.h>
|
||||||
#include <sys/mman.h>
|
#include <sys/mman.h>
|
||||||
#include <unistd.h>
|
#include <unistd.h>
|
||||||
#include <fcntl.h>
|
#include <fcntl.h>
|
||||||
|
|
||||||
#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
|
#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
|
||||||
#define KCOV_ENABLE _IO('c', 100)
|
#define KCOV_ENABLE _IO('c', 100)
|
||||||
#define KCOV_DISABLE _IO('c', 101)
|
#define KCOV_DISABLE _IO('c', 101)
|
||||||
#define COVER_SIZE (64<<10)
|
#define COVER_SIZE (64<<10)
|
||||||
|
|
||||||
int main(int argc, char **argv)
|
int main(int argc, char **argv)
|
||||||
{
|
{
|
||||||
int fd;
|
int fd;
|
||||||
unsigned long *cover, n, i;
|
unsigned long *cover, n, i;
|
||||||
|
|
||||||
@ -83,24 +83,24 @@ int main(int argc, char **argv)
|
|||||||
if (close(fd))
|
if (close(fd))
|
||||||
perror("close"), exit(1);
|
perror("close"), exit(1);
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
After piping through addr2line output of the program looks as follows:
|
After piping through addr2line output of the program looks as follows::
|
||||||
|
|
||||||
SyS_read
|
SyS_read
|
||||||
fs/read_write.c:562
|
fs/read_write.c:562
|
||||||
__fdget_pos
|
__fdget_pos
|
||||||
fs/file.c:774
|
fs/file.c:774
|
||||||
__fget_light
|
__fget_light
|
||||||
fs/file.c:746
|
fs/file.c:746
|
||||||
__fget_light
|
__fget_light
|
||||||
fs/file.c:750
|
fs/file.c:750
|
||||||
__fget_light
|
__fget_light
|
||||||
fs/file.c:760
|
fs/file.c:760
|
||||||
__fdget_pos
|
__fdget_pos
|
||||||
fs/file.c:784
|
fs/file.c:784
|
||||||
SyS_read
|
SyS_read
|
||||||
fs/read_write.c:562
|
fs/read_write.c:562
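Output in this shape is easier to post-process once each function name is paired with its location. The Python sketch below does that, assuming addr2line was invoked so that it prints a function name line followed by a ``file:line`` line, as in the sample above. The function name ``pair_addr2line_output`` is made up for this example.

```python
def pair_addr2line_output(text):
    """Turn alternating 'function' / 'file:line' lines into tuples."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    pairs = []
    # Step through the lines two at a time; a trailing odd line is ignored.
    for i in range(0, len(lines) - 1, 2):
        function = lines[i]
        # Split on the last colon; the line number is kept as a string
        # because addr2line may print '?' for unknown locations.
        path, _, lineno = lines[i + 1].rpartition(":")
        pairs.append((function, path, lineno))
    return pairs


if __name__ == "__main__":
    sample = """SyS_read
fs/read_write.c:562
__fdget_pos
fs/file.c:774
"""
    for function, path, lineno in pair_addr2line_output(sample):
        print("%s (%s, line %s)" % (function, path, lineno))
```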
|
||||||
|
|
||||||
If a program needs to collect coverage from several threads (independently),
|
If a program needs to collect coverage from several threads (independently),
|
||||||
it needs to open /sys/kernel/debug/kcov in each thread separately.
|
it needs to open /sys/kernel/debug/kcov in each thread separately.
|
733
Documentation/dev-tools/kmemcheck.rst
Normal file
@ -0,0 +1,733 @@
|
|||||||
|
Getting started with kmemcheck
|
||||||
|
==============================
|
||||||
|
|
||||||
|
Vegard Nossum <vegardno@ifi.uio.no>
|
||||||
|
|
||||||
|
|
||||||
|
Introduction
|
||||||
|
------------
|
||||||
|
|
||||||
|
kmemcheck is a debugging feature for the Linux Kernel. More specifically, it
|
||||||
|
is a dynamic checker that detects and warns about some uses of uninitialized
|
||||||
|
memory.
|
||||||
|
|
||||||
|
Userspace programmers might be familiar with Valgrind's memcheck. The main
|
||||||
|
difference between memcheck and kmemcheck is that memcheck works for userspace
|
||||||
|
programs only, and kmemcheck works for the kernel only. The implementations
|
||||||
|
are of course vastly different. Because of this, kmemcheck is not as accurate
|
||||||
|
as memcheck, but it turns out to be good enough in practice to discover real
|
||||||
|
programmer errors that the compiler is not able to find through static
|
||||||
|
analysis.
|
||||||
|
|
||||||
|
Enabling kmemcheck on a kernel will probably slow it down to the extent that
|
||||||
|
the machine will not be usable for normal workloads such as an
|
||||||
|
interactive desktop. kmemcheck will also cause the kernel to use about twice
|
||||||
|
as much memory as normal. For this reason, kmemcheck is strictly a debugging
|
||||||
|
feature.
|
||||||
|
|
||||||
|
|
||||||
|
Downloading
|
||||||
|
-----------
|
||||||
|
|
||||||
|
As of version 2.6.31-rc1, kmemcheck is included in the mainline kernel.
|
||||||
|
|
||||||
|
|
||||||
|
Configuring and compiling
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of
|
||||||
|
configuration variables must have specific settings in order for the kmemcheck
|
||||||
|
menu to even appear in "menuconfig". These are:
|
||||||
|
|
||||||
|
- ``CONFIG_CC_OPTIMIZE_FOR_SIZE=n``
|
||||||
|
This option is located under "General setup" / "Optimize for size".
|
||||||
|
|
||||||
|
Without this, gcc will use certain optimizations that usually lead to
|
||||||
|
false positive warnings from kmemcheck. An example of this is a 16-bit
|
||||||
|
field in a struct, where gcc may load 32 bits, then discard the upper
|
||||||
|
16 bits. kmemcheck sees only the 32-bit load, and may trigger a
|
||||||
|
warning for the upper 16 bits (if they're uninitialized).
|
||||||
|
|
||||||
|
- ``CONFIG_SLAB=y`` or ``CONFIG_SLUB=y``
|
||||||
|
This option is located under "General setup" / "Choose SLAB
|
||||||
|
allocator".
|
||||||
|
|
||||||
|
- ``CONFIG_FUNCTION_TRACER=n``
|
||||||
|
This option is located under "Kernel hacking" / "Tracers" / "Kernel
|
||||||
|
Function Tracer"
|
||||||
|
|
||||||
|
When function tracing is compiled in, gcc emits a call to another
|
||||||
|
function at the beginning of every function. This means that when the
|
||||||
|
page fault handler is called, the ftrace framework will be called
|
||||||
|
before kmemcheck has had a chance to handle the fault. If ftrace then
|
||||||
|
modifies memory that was tracked by kmemcheck, the result is an
|
||||||
|
endless recursive page fault.
|
||||||
|
|
||||||
|
- ``CONFIG_DEBUG_PAGEALLOC=n``
|
||||||
|
This option is located under "Kernel hacking" / "Memory Debugging"
|
||||||
|
/ "Debug page memory allocations".
|
||||||
|
|
||||||
|
In addition, I highly recommend turning on ``CONFIG_DEBUG_INFO=y``. This is also
|
||||||
|
located under "Kernel hacking". With this, you will be able to get line number
|
||||||
|
information from the kmemcheck warnings, which is extremely valuable in
|
||||||
|
debugging a problem. This option is not mandatory, however, because it slows
|
||||||
|
down the compilation process and produces a much bigger kernel image.
|
||||||
|
|
||||||
|
Now the kmemcheck menu should be visible (under "Kernel hacking" / "Memory
|
||||||
|
Debugging" / "kmemcheck: trap use of uninitialized memory"). Here follows
|
||||||
|
a description of the kmemcheck configuration variables:
|
||||||
|
|
||||||
|
- ``CONFIG_KMEMCHECK``
|
||||||
|
This must be enabled in order to use kmemcheck at all...
|
||||||
|
|
||||||
|
- ``CONFIG_KMEMCHECK_``[``DISABLED`` | ``ENABLED`` | ``ONESHOT``]``_BY_DEFAULT``
|
||||||
|
This option controls the status of kmemcheck at boot-time. "Enabled"
|
||||||
|
will enable kmemcheck right from the start, "disabled" will boot the
|
||||||
|
kernel as normal (but with the kmemcheck code compiled in, so it can
|
||||||
|
be enabled at run-time after the kernel has booted), and "one-shot" is
|
||||||
|
a special mode which will turn kmemcheck off automatically after
|
||||||
|
detecting the first use of uninitialized memory.
|
||||||
|
|
||||||
|
If you are using kmemcheck to actively debug a problem, then you
|
||||||
|
probably want to choose "enabled" here.
|
||||||
|
|
||||||
|
The one-shot mode is mostly useful in automated test setups because it
|
||||||
|
can prevent floods of warnings and increase the chances of the machine
|
||||||
|
surviving in case something is really wrong. In other cases, the one-
|
||||||
|
shot mode could actually be counter-productive because it would turn
|
||||||
|
itself off at the very first error -- in the case of a false positive
|
||||||
|
too -- and this would get in the way of debugging the specific
|
||||||
|
problem you were interested in.
|
||||||
|
|
||||||
|
If you would like to use your kernel as normal, but with a chance to
|
||||||
|
enable kmemcheck in case of some problem, it might be a good idea to
|
||||||
|
choose "disabled" here. When kmemcheck is disabled, most of the run-
|
||||||
|
time overhead is not incurred, and the kernel will be almost as fast
|
||||||
|
as normal.
|
||||||
|
|
||||||
|
- ``CONFIG_KMEMCHECK_QUEUE_SIZE``
|
||||||
|
Select the maximum number of error reports to store in an internal
|
||||||
|
(fixed-size) buffer. Since errors can occur virtually anywhere and in
|
||||||
|
any context, we need a temporary storage area which is guaranteed not
|
||||||
|
to generate any other page faults when accessed. The queue will be
|
||||||
|
emptied as soon as a tasklet may be scheduled. If the queue is full,
|
||||||
|
new error reports will be lost.
|
||||||
|
|
||||||
|
The default value of 64 is probably fine. If some code produces more
|
||||||
|
than 64 errors within an irqs-off section, then the code is likely to
|
||||||
|
produce many, many more, too, and these additional reports seldom give
|
||||||
|
any more information (the first report is usually the most valuable
|
||||||
|
anyway).
|
||||||
|
|
||||||
|
This number might have to be adjusted if you are not using a serial
|
||||||
|
console or similar to capture the kernel log. If you are using the
|
||||||
|
"dmesg" command to save the log, then getting a lot of kmemcheck
|
||||||
|
warnings might overflow the kernel log itself, and the earlier reports
|
||||||
|
will get lost in that way instead. Try setting this to 10 or so on
|
||||||
|
such a setup.
|
||||||
|
|
||||||
|
- ``CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT``
|
||||||
|
Select the number of shadow bytes to save along with each entry of the
|
||||||
|
error-report queue. These bytes indicate what parts of an allocation
|
||||||
|
are initialized, uninitialized, etc. and will be displayed when an
|
||||||
|
error is detected to help the debugging of a particular problem.
|
||||||
|
|
||||||
|
The number entered here is actually the logarithm of the number of
|
||||||
|
bytes that will be saved. So if you pick for example 5 here, kmemcheck
|
||||||
|
will save 2^5 = 32 bytes.
|
||||||
|
|
||||||
|
The default value should be fine for debugging most problems. It also
|
||||||
|
fits nicely within 80 columns.
|
||||||
|
|
||||||
|
- ``CONFIG_KMEMCHECK_PARTIAL_OK``
|
||||||
|
This option (when enabled) works around certain GCC optimizations that
|
||||||
|
produce 32-bit reads from 16-bit variables where the upper 16 bits are
|
||||||
|
thrown away afterwards.
|
||||||
|
|
||||||
|
The default value (enabled) is recommended. This may of course hide
|
||||||
|
some real errors, but disabling it would probably produce a lot of
|
||||||
|
false positives.
|
||||||
|
|
||||||
|
- ``CONFIG_KMEMCHECK_BITOPS_OK``
|
||||||
|
This option silences warnings that would be generated for bit-field
|
||||||
|
accesses where not all the bits are initialized at the same time. This
|
||||||
|
may also hide some real bugs.
|
||||||
|
|
||||||
|
This option is probably obsolete, or it should be replaced with
|
||||||
|
the kmemcheck-/bitfield-annotations for the code in question. The
|
||||||
|
default value is therefore fine.
|
||||||
|
|
||||||
|
Now compile the kernel as usual.
|
||||||
|
|
||||||
|
|
||||||
|
How to use
|
||||||
|
----------
|
||||||
|
|
||||||
|
Booting
|
||||||
|
~~~~~~~
|
||||||
|
|
||||||
|
First some information about the command-line options. There is only one
|
||||||
|
option specific to kmemcheck, and this is called "kmemcheck". It can be used
|
||||||
|
to override the default mode as chosen by the ``CONFIG_KMEMCHECK_*_BY_DEFAULT``
|
||||||
|
option. Its possible settings are:
|
||||||
|
|
||||||
|
- ``kmemcheck=0`` (disabled)
|
||||||
|
- ``kmemcheck=1`` (enabled)
|
||||||
|
- ``kmemcheck=2`` (one-shot mode)
|
||||||
|
|
||||||
|
If SLUB debugging has been enabled in the kernel, it may take precedence over
|
||||||
|
kmemcheck in such a way that the slab caches which are under SLUB debugging
|
||||||
|
will not be tracked by kmemcheck. In order to ensure that this doesn't happen
|
||||||
|
(even though it shouldn't by default), use SLUB's boot option ``slub_debug``,
|
||||||
|
like this: ``slub_debug=-``
|
||||||
|
|
||||||
|
In fact, this option may also be used for fine-grained control over SLUB vs.
|
||||||
|
kmemcheck. For example, if the command line includes
|
||||||
|
``kmemcheck=1 slub_debug=,dentry``, then SLUB debugging will be used only
|
||||||
|
for the "dentry" slab cache, and with kmemcheck tracking all the other
|
||||||
|
caches. This is advanced usage, however, and is not generally recommended.
|
||||||
|
|
||||||
|
|
||||||
|
Run-time enable/disable
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
When the kernel has booted, it is possible to enable or disable kmemcheck at
|
||||||
|
run-time. WARNING: This feature is still experimental and may cause false
|
||||||
|
positive warnings to appear. Therefore, try not to use this. If you find that
|
||||||
|
it doesn't work properly (e.g. you see an unreasonable number of warnings), I
|
||||||
|
will be happy to take bug reports.
|
||||||
|
|
||||||
|
Use the file ``/proc/sys/kernel/kmemcheck`` for this purpose, e.g.::
|
||||||
|
|
||||||
|
$ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck
|
||||||
|
|
||||||
|
The numbers are the same as for the ``kmemcheck=`` command-line option.
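
The same write can be done from a program. The following is a minimal userspace sketch, not part of kmemcheck itself: the helper name ``kmemcheck_set_mode`` is hypothetical, and it assumes the sysctl file accepts the mode as a decimal number followed by a newline, as the ``echo`` example does.

```c
#include <stdio.h>

/*
 * Write a kmemcheck mode to a sysctl file such as
 * /proc/sys/kernel/kmemcheck.
 *
 * mode: 0 = disabled, 1 = enabled, 2 = one-shot (same numbers as the
 * kmemcheck= boot option).
 * Returns 0 on success, -1 on an invalid mode or an I/O failure.
 */
int kmemcheck_set_mode(const char *path, int mode)
{
    FILE *f;
    int ok;

    if (mode < 0 || mode > 2)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%d\n", mode) > 0;
    /* fclose() flushes the buffer, so a failed write can surface here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling ``kmemcheck_set_mode("/proc/sys/kernel/kmemcheck", 0)`` as root would then correspond to the ``echo 0`` command above.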
|
||||||
|
|
||||||
|
|
||||||
|
Debugging
|
||||||
|
~~~~~~~~~
|
||||||
|
|
||||||
|
A typical report will look something like this::
|
||||||
|
|
||||||
|
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
|
||||||
|
80000000000000000000000000000000000000000088ffff0000000000000000
|
||||||
|
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
||||||
|
^
|
||||||
|
|
||||||
|
Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A
|
||||||
|
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
|
||||||
|
RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002
|
||||||
|
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
|
||||||
|
RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84
|
||||||
|
RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000
|
||||||
|
R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e
|
||||||
|
R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8
|
||||||
|
FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000
|
||||||
|
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
|
||||||
|
CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0
|
||||||
|
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
|
||||||
|
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
|
||||||
|
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
|
||||||
|
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
|
||||||
|
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
|
||||||
|
[<ffffffff8100c7b5>] int_signal+0x12/0x17
|
||||||
|
[<ffffffffffffffff>] 0xffffffffffffffff
|
||||||
|
|
||||||
|
The single most valuable piece of information in this report is the RIP (or
|
||||||
|
EIP on 32-bit) value. It lets us pinpoint exactly which instruction caused
|
||||||
|
the warning.
|
||||||
|
|
||||||
|
If your kernel was compiled with ``CONFIG_DEBUG_INFO=y``, then all we have to do
|
||||||
|
is give this address to the addr2line program, like this::
|
||||||
|
|
||||||
|
$ addr2line -e vmlinux -i ffffffff8104ede8
|
||||||
|
arch/x86/include/asm/string_64.h:12
|
||||||
|
include/asm-generic/siginfo.h:287
|
||||||
|
kernel/signal.c:380
|
||||||
|
kernel/signal.c:410
|
||||||
|
|
||||||
|
The "``-e vmlinux``" tells addr2line which file to look in. **IMPORTANT:**
|
||||||
|
This must be the vmlinux of the kernel that produced the warning in the
|
||||||
|
first place! If not, the line number information will almost certainly be
|
||||||
|
wrong.
|
||||||
|
|
||||||
|
The "``-i``" tells addr2line to also print the line numbers of inlined
|
||||||
|
functions. In this case, the flag was very important, because otherwise,
|
||||||
|
it would only have printed the first line, which is just a call to
|
||||||
|
``memcpy()``, which could be called from a thousand places in the kernel, and
|
||||||
|
is therefore not very useful. These inlined functions would not show up in
|
||||||
|
the stack trace above, simply because the kernel doesn't load the extra
|
||||||
|
debugging information. This technique can of course be used with ordinary
|
||||||
|
kernel oopses as well.
|
||||||
|
|
||||||
|
In this case, it's the caller of ``memcpy()`` that is interesting, and it can be
|
||||||
|
found in ``include/asm-generic/siginfo.h``, line 287::
|
||||||
|
|
||||||
|
281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from)
|
||||||
|
282 {
|
||||||
|
283 if (from->si_code < 0)
|
||||||
|
284 memcpy(to, from, sizeof(*to));
|
||||||
|
285 else
|
||||||
|
286 /* _sigchld is currently the largest know union member */
|
||||||
|
287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld));
|
||||||
|
288 }
|
||||||
|
|
||||||
|
Since this was a read (kmemcheck usually warns about reads only, though it can
|
||||||
|
warn about writes to unallocated or freed memory as well), it was probably the
|
||||||
|
"from" argument which contained some uninitialized bytes. Following the chain
|
||||||
|
of calls, we move upwards to see where "from" was allocated or initialized,
|
||||||
|
``kernel/signal.c``, line 380::
|
||||||
|
|
||||||
|
359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info)
|
||||||
|
360 {
|
||||||
|
...
|
||||||
|
367 list_for_each_entry(q, &list->list, list) {
|
||||||
|
368 if (q->info.si_signo == sig) {
|
||||||
|
369 if (first)
|
||||||
|
370 goto still_pending;
|
||||||
|
371 first = q;
|
||||||
|
...
|
||||||
|
377 if (first) {
|
||||||
|
378 still_pending:
|
||||||
|
379 list_del_init(&first->list);
|
||||||
|
380 copy_siginfo(info, &first->info);
|
||||||
|
381 __sigqueue_free(first);
|
||||||
|
...
|
||||||
|
392 }
|
||||||
|
393 }
|
||||||
|
|
||||||
|
Here, it is ``&first->info`` that is being passed on to ``copy_siginfo()``. The
|
||||||
|
variable ``first`` was found on a list -- passed in as the second argument to
|
||||||
|
``collect_signal()``. We continue our journey through the stack, to figure out
|
||||||
|
where the item on "list" was allocated or initialized. We move to line 410::
|
||||||
|
|
||||||
|
395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
|
||||||
|
396 siginfo_t *info)
|
||||||
|
397 {
|
||||||
|
...
|
||||||
|
410 collect_signal(sig, pending, info);
|
||||||
|
...
|
||||||
|
414 }
|
||||||
|
|
||||||
|
Now we need to follow the ``pending`` pointer, since that is being passed on to
|
||||||
|
``collect_signal()`` as ``list``. At this point, we've run out of lines from the
|
||||||
|
"addr2line" output. Not to worry, we just paste the next addresses from the
|
||||||
|
kmemcheck stack dump, i.e.::
|
||||||
|
|
||||||
|
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
|
||||||
|
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
|
||||||
|
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
|
||||||
|
[<ffffffff8100c7b5>] int_signal+0x12/0x17
|
||||||
|
|
||||||
|
$ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \
|
||||||
|
ffffffff8100b87d ffffffff8100c7b5
|
||||||
|
kernel/signal.c:446
|
||||||
|
kernel/signal.c:1806
|
||||||
|
arch/x86/kernel/signal.c:805
|
||||||
|
arch/x86/kernel/signal.c:871
|
||||||
|
arch/x86/kernel/entry_64.S:694
|
||||||
|
|
||||||
|
Remember that since these addresses were found on the stack and not as the
|
||||||
|
RIP value, they actually point to the _next_ instruction (they are return
|
||||||
|
addresses). This becomes obvious when we look at the code for line 446::
|
||||||
|
|
||||||
|
422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
|
||||||
|
423 {
|
||||||
|
...
|
||||||
|
431 signr = __dequeue_signal(&tsk->signal->shared_pending,
|
||||||
|
432 mask, info);
|
||||||
|
433 /*
|
||||||
|
434 * itimer signal ?
|
||||||
|
435 *
|
||||||
|
436 * itimers are process shared and we restart periodic
|
||||||
|
437 * itimers in the signal delivery path to prevent DoS
|
||||||
|
438 * attacks in the high resolution timer case. This is
|
||||||
|
439 * compliant with the old way of self restarting
|
||||||
|
440 * itimers, as the SIGALRM is a legacy signal and only
|
||||||
|
441 * queued once. Changing the restart behaviour to
|
||||||
|
442 * restart the timer in the signal dequeue path is
|
||||||
|
443 * reducing the timer noise on heavy loaded !highres
|
||||||
|
444 * systems too.
|
||||||
|
445 */
|
||||||
|
446 if (unlikely(signr == SIGALRM)) {
|
||||||
|
...
|
||||||
|
489 }
|
||||||
|
|
||||||
|
So instead of looking at 446, we should be looking at 431, which is the line
|
||||||
|
that executes just before 446. Here we see that what we are looking for is
|
||||||
|
``&tsk->signal->shared_pending``.
|
||||||
|
|
||||||
|
Our next task is to figure out which function puts items on this
|
||||||
|
``shared_pending`` list. A crude but effective tool is ``git grep``::
|
||||||
|
|
||||||
|
$ git grep -n 'shared_pending' kernel/
|
||||||
|
...
|
||||||
|
kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending;
|
||||||
|
kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending;
|
||||||
|
...
|
||||||
|
|
||||||
|
There were more results, but none of them were related to list operations,
|
||||||
|
and these were the only assignments. We inspect the line numbers more closely
|
||||||
|
and find that this is indeed where items are being added to the list::
|
||||||
|
|
||||||
|
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
|
||||||
|
817 int group)
|
||||||
|
818 {
|
||||||
|
...
|
||||||
|
828 pending = group ? &t->signal->shared_pending : &t->pending;
|
||||||
|
...
|
||||||
|
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
|
||||||
|
852 (is_si_special(info) ||
|
||||||
|
853 info->si_code >= 0)));
|
||||||
|
854 if (q) {
|
||||||
|
855 list_add_tail(&q->list, &pending->list);
|
||||||
|
...
|
||||||
|
890 }
|
||||||
|
|
||||||
|
and::
|
||||||
|
|
||||||
|
1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
|
||||||
|
1310 {
|
||||||
|
....
|
||||||
|
1339 pending = group ? &t->signal->shared_pending : &t->pending;
|
||||||
|
1340 list_add_tail(&q->list, &pending->list);
|
||||||
|
....
|
||||||
|
1347 }
|
||||||
|
|
||||||
|
In the first case, the list element we are looking for, ``q``, is being
|
||||||
|
returned from the function ``__sigqueue_alloc()``, which looks like an
|
||||||
|
allocation function. Let's take a look at it::
|
||||||
|
|
||||||
|
187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
|
||||||
|
188 int override_rlimit)
|
||||||
|
189 {
|
||||||
|
190 struct sigqueue *q = NULL;
|
||||||
|
191 struct user_struct *user;
|
||||||
|
192
|
||||||
|
193 /*
|
||||||
|
194 * We won't get problems with the target's UID changing under us
|
||||||
|
195 * because changing it requires RCU be used, and if t != current, the
|
||||||
|
196 * caller must be holding the RCU readlock (by way of a spinlock) and
|
||||||
|
197 * we use RCU protection here
|
||||||
|
198 */
|
||||||
|
199 user = get_uid(__task_cred(t)->user);
|
||||||
|
200 atomic_inc(&user->sigpending);
|
||||||
|
201 if (override_rlimit ||
|
||||||
|
202 atomic_read(&user->sigpending) <=
|
||||||
|
203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
|
||||||
|
204 q = kmem_cache_alloc(sigqueue_cachep, flags);
|
||||||
|
205 if (unlikely(q == NULL)) {
|
||||||
|
206 atomic_dec(&user->sigpending);
|
||||||
|
207 free_uid(user);
|
||||||
|
208 } else {
|
||||||
|
209 INIT_LIST_HEAD(&q->list);
|
||||||
|
210 q->flags = 0;
|
||||||
|
211 q->user = user;
|
||||||
|
212 }
|
||||||
|
213
|
||||||
|
214 return q;
|
||||||
|
215 }
|
||||||
|
|
||||||
|
We see that this function initializes ``q->list``, ``q->flags``, and
|
||||||
|
``q->user``. It seems that now is the time to look at the definition of
|
||||||
|
``struct sigqueue``, e.g.::
|
||||||
|
|
||||||
|
14 struct sigqueue {
|
||||||
|
15 struct list_head list;
|
||||||
|
16 int flags;
|
||||||
|
17 siginfo_t info;
|
||||||
|
18 struct user_struct *user;
|
||||||
|
19 };
|
||||||
|
|
||||||
|
And, you might remember, it was a ``memcpy()`` on ``&first->info`` that
|
||||||
|
caused the warning, so this makes perfect sense. It also seems reasonable
|
||||||
|
to assume that it is the caller of ``__sigqueue_alloc()`` that has the
|
||||||
|
responsibility of filling out (initializing) this member.
|
||||||
|
|
||||||
|
But just which fields of the struct were uninitialized? Let's look at
|
||||||
|
kmemcheck's report again::
|
||||||
|
|
||||||
|
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
|
||||||
|
80000000000000000000000000000000000000000088ffff0000000000000000
|
||||||
|
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
||||||
|
^
|
||||||
|
|
||||||
|
These first two lines are the memory dump of the memory object itself, and
|
||||||
|
the shadow bytemap, respectively. The memory object itself is in this case
|
||||||
|
``&first->info``. Just beware that the start of this dump is NOT the start
|
||||||
|
of the object itself! The position of the caret (^) corresponds with the
|
||||||
|
address of the read (ffff88003e4a2024).
|
||||||
|
|
||||||
|
The shadow bytemap dump legend is as follows:
|
||||||
|
|
||||||
|
- i: initialized
|
||||||
|
- u: uninitialized
|
||||||
|
- a: unallocated (memory has been allocated by the slab layer, but has not
|
||||||
|
yet been handed off to anybody)
|
||||||
|
- f: freed (memory has been allocated by the slab layer, but has been freed
|
||||||
|
by the previous owner)
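
The legend can be applied mechanically to a shadow line from a report. The sketch below is an illustration only and is not kmemcheck code: the function name ``first_shadow_state`` is hypothetical, and it assumes the shadow line uses one letter per byte separated by spaces, as in the report above.

```c
/*
 * Return the index (in bytes, counted from the start of the dump) of
 * the first byte whose shadow state equals `state`, where `state` is
 * one of 'i', 'u', 'a' or 'f'. Returns -1 if no byte has that state.
 *
 * Spaces between the per-byte letters are skipped and not counted.
 */
long first_shadow_state(const char *shadow, char state)
{
    long byte_index = 0;
    const char *p;

    for (p = shadow; *p != '\0'; p++) {
        if (*p == ' ')
            continue;
        if (*p == state)
            return byte_index;
        byte_index++;
    }
    return -1;
}
```

Keep in mind that the result is relative to the start of the dump, which is not necessarily the start of the object.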
|
||||||
|
|
||||||
|
In order to figure out where (relative to the start of the object) the
|
||||||
|
uninitialized memory was located, we have to look at the disassembly. For
|
||||||
|
that, we'll need the RIP address again::
|
||||||
|
|
||||||
|
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
|
||||||
|
|
||||||
|
$ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8:
|
||||||
|
ffffffff8104edc8: mov %r8,0x8(%r8)
|
||||||
|
ffffffff8104edcc: test %r10d,%r10d
|
||||||
|
ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168>
|
||||||
|
ffffffff8104edd5: mov %rax,%rdx
|
||||||
|
ffffffff8104edd8: mov $0xc,%ecx
|
||||||
|
ffffffff8104eddd: mov %r13,%rdi
|
||||||
|
ffffffff8104ede0: mov $0x30,%eax
|
||||||
|
ffffffff8104ede5: mov %rdx,%rsi
|
||||||
|
ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi)
|
||||||
|
ffffffff8104edea: test $0x2,%al
|
||||||
|
ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0>
|
||||||
|
ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi)
|
||||||
|
ffffffff8104edf0: test $0x1,%al
|
||||||
|
ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5>
|
||||||
|
ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi)
|
||||||
|
ffffffff8104edf5: mov %r8,%rdi
|
||||||
|
ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free>
|
||||||
|
|
||||||
|
As expected, it's the "``rep movsl``" instruction from the ``memcpy()``
|
||||||
|
that causes the warning. We know that ``REP MOVSL`` uses the register
|
||||||
|
``RCX`` to count the number of remaining iterations. By taking a look at the
|
||||||
|
register dump again (from the kmemcheck report), we can figure out how many
|
||||||
|
bytes were left to copy::
|
||||||
|
|
||||||
|
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
|
||||||
|
|
||||||
|
By looking at the disassembly, we also see that ``%ecx`` is being loaded
|
||||||
|
with the value ``$0xc`` just before (ffffffff8104edd8), so we are very
|
||||||
|
lucky. Keep in mind that this is the number of iterations, not bytes. And
|
||||||
|
since this is a "long" operation, we need to multiply by 4 to get the
|
||||||
|
number of bytes. So this means that the uninitialized value was encountered
|
||||||
|
at 4 * (0xc - 0x9) = 12 bytes from the start of the object.
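
The arithmetic above can be written down as a small helper. This is only a sketch of the calculation; the function name ``rep_bytes_done`` and its parameters are hypothetical and do not come from the kernel.

```c
#include <assert.h>

/*
 * Number of bytes a REP-prefixed string instruction had already
 * processed when it was interrupted.
 *
 * initial_count:   iteration count loaded before the instruction
 * remaining_count: iteration count found in the register dump
 * element_size:    bytes moved per iteration (4 for movsl)
 */
unsigned long rep_bytes_done(unsigned long initial_count,
                             unsigned long remaining_count,
                             unsigned long element_size)
{
    /* The counter only ever decreases while the instruction runs. */
    assert(remaining_count <= initial_count);
    return (initial_count - remaining_count) * element_size;
}
```

With the values from the report, ``rep_bytes_done(0xc, 0x9, 4)`` gives 12, matching the offset computed by hand.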
|
||||||
|
|
||||||
|
We can now try to figure out which field of the ``struct siginfo`` was not
|
||||||
|
initialized. This is the beginning of the struct::
|
||||||
|
|
||||||
|
40 typedef struct siginfo {
|
||||||
|
41 int si_signo;
|
||||||
|
42 int si_errno;
|
||||||
|
43 int si_code;
|
||||||
|
44
|
||||||
|
45 union {
|
||||||
|
..
|
||||||
|
92 } _sifields;
|
||||||
|
93 } siginfo_t;
|
||||||
|
|
||||||
|
On 64-bit, the int is 4 bytes long, so it must be the union member that has
|
||||||
|
not been initialized. We can verify this using gdb::
|
||||||
|
|
||||||
|
$ gdb vmlinux
|
||||||
|
...
|
||||||
|
(gdb) p &((struct siginfo *) 0)->_sifields
|
||||||
|
$1 = (union {...} *) 0x10
|
||||||
|
|
||||||
|
Actually, it seems that the union member is located at offset 0x10 -- which
|
||||||
|
means that gcc has inserted 4 bytes of padding between the members ``si_code``
|
||||||
|
and ``_sifields``. We can now get a fuller picture of the memory dump::
|
||||||
|
|
||||||
|
_----------------------------=> si_code
|
||||||
|
/ _--------------------=> (padding)
|
||||||
|
| / _------------=> _sifields(._kill._pid)
|
||||||
|
| | / _----=> _sifields(._kill._uid)
|
||||||
|
| | | /
|
||||||
|
-------|-------|-------|-------|
|
||||||
|
80000000000000000000000000000000000000000088ffff0000000000000000
|
||||||
|
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
||||||
|
|
||||||
|
This allows us to realize another important fact: ``si_code`` contains the
|
||||||
|
value 0x80. Remember that x86 is little endian, so the first 4 bytes
|
||||||
|
"80000000" are really the number 0x00000080. With a bit of research, we
|
||||||
|
find that this is actually the constant ``SI_KERNEL`` defined in
|
||||||
|
``include/asm-generic/siginfo.h``::
|
||||||
|
|
||||||
|
144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */
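
The byte-order step used above, reading the dump bytes "80000000" as the number 0x00000080, can be sketched as follows. The helper name ``le32_from_bytes`` is hypothetical and is only meant to illustrate how a little-endian 32-bit value is assembled from the bytes in the order they appear in the dump.

```c
#include <stdint.h>

/*
 * Assemble a 32-bit value from four bytes stored in little-endian
 * order, i.e. with the least significant byte first in memory.
 */
uint32_t le32_from_bytes(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0] |
           (uint32_t)bytes[1] << 8 |
           (uint32_t)bytes[2] << 16 |
           (uint32_t)bytes[3] << 24;
}
```

For the four bytes ``80 00 00 00`` this yields ``0x80``, the value of ``SI_KERNEL``.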
|
||||||
|
|
||||||
|
This macro is used in exactly one place in the x86 kernel: In ``send_signal()``
|
||||||
|
in ``kernel/signal.c``::
|
||||||
|
|
||||||
|
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
|
||||||
|
817 int group)
|
||||||
|
818 {
|
||||||
|
...
|
||||||
|
828 pending = group ? &t->signal->shared_pending : &t->pending;
|
||||||
|
...
|
||||||
|
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
|
||||||
|
852 (is_si_special(info) ||
|
||||||
|
853 info->si_code >= 0)));
|
||||||
|
854 if (q) {
|
||||||
|
855 list_add_tail(&q->list, &pending->list);
|
||||||
|
856 switch ((unsigned long) info) {
|
||||||
|
...
|
||||||
|
865 case (unsigned long) SEND_SIG_PRIV:
|
||||||
|
866 q->info.si_signo = sig;
|
||||||
|
867 q->info.si_errno = 0;
|
||||||
|
868 q->info.si_code = SI_KERNEL;
|
||||||
|
869 q->info.si_pid = 0;
|
||||||
|
870 q->info.si_uid = 0;
|
||||||
|
871 break;
|
||||||
|
...
|
||||||
|
890 }
|
||||||
|
|
||||||
|
Not only does this match with the ``.si_code`` member, it also matches the place
|
||||||
|
we found earlier when looking for where siginfo_t objects are enqueued on the
|
||||||
|
``shared_pending`` list.
|
||||||
|
|
||||||
|
So to sum up: It seems that it is the padding introduced by the compiler
|
||||||
|
between two struct fields that is uninitialized, and this gets reported when
|
||||||
|
we do a ``memcpy()`` on the struct. This means that we have identified a false
|
||||||
|
positive warning.
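
The padding described above can be reproduced in userspace. The struct below is a hypothetical, simplified stand-in for the start of ``struct siginfo`` (three ints followed by a union that contains a pointer); it is not the kernel definition, and the exact padding depends on the ABI.

```c
#include <stddef.h>

/* Simplified stand-in for the beginning of struct siginfo. */
struct siginfo_like {
    int si_signo;
    int si_errno;
    int si_code;
    union {
        void *ptr;      /* pointer member gives the union pointer alignment */
        int words[4];
    } fields;
};

/*
 * Number of padding bytes the compiler placed between si_code and the
 * union. These bytes are never written by member assignments, but a
 * memcpy() of the whole struct still reads them.
 */
size_t padding_before_fields(void)
{
    return offsetof(struct siginfo_like, fields) - 3 * sizeof(int);
}
```

On a typical 64-bit x86 build, where ``int`` is 4 bytes and pointers are 8 bytes, this function is expected to return 4, which corresponds to the four "u" bytes that follow ``si_code`` in the dump.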
|
||||||
|
|
||||||
|
Normally, kmemcheck will not report uninitialized accesses in ``memcpy()`` calls
|
||||||
|
when both the source and destination addresses are tracked. (Instead, we copy
|
||||||
|
the shadow bytemap as well). In this case, the destination address clearly
|
||||||
|
was not tracked. We can dig a little deeper into the stack trace from above::
|
||||||
|
|
||||||
|
arch/x86/kernel/signal.c:805
|
||||||
|
arch/x86/kernel/signal.c:871
|
||||||
|
arch/x86/kernel/entry_64.S:694
|
||||||
|
|
||||||
|
And we clearly see that the destination siginfo object is located on the
|
||||||
|
stack::
|
||||||
|
|
||||||
|
782 static void do_signal(struct pt_regs *regs)
|
||||||
|
783 {
|
||||||
|
784 struct k_sigaction ka;
|
||||||
|
785 siginfo_t info;
|
||||||
|
...
|
||||||
|
804 signr = get_signal_to_deliver(&info, &ka, regs, NULL);
|
||||||
|
...
|
||||||
|
854 }
|
||||||
|
|
||||||
|
And this ``&info`` is what eventually gets passed to ``copy_siginfo()`` as the
|
||||||
|
destination argument.
|
||||||
|
|
||||||
|
Now, even though we didn't find an actual error here, the example is still a
|
||||||
|
good one, because it shows how one would go about finding out what the report
|
||||||
|
was all about.
|
||||||
|
|
||||||
|
|
||||||
|
Annotating false positives
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
There are a few different ways to make annotations in the source code that
|
||||||
|
will keep kmemcheck from checking and reporting certain allocations. Here
|
||||||
|
they are:
|
||||||
|
|
||||||
|
- ``__GFP_NOTRACK_FALSE_POSITIVE``
|
||||||
|
This flag can be passed to ``kmalloc()`` or ``kmem_cache_alloc()``
|
||||||
|
(therefore also to other functions that end up calling one of
|
||||||
|
these) to indicate that the allocation should not be tracked
|
||||||
|
because it would lead to a false positive report. This is a "big
|
||||||
|
hammer" way of silencing kmemcheck; after all, even if the false
|
||||||
|
positive pertains to a particular field in a struct, for example, we
|
||||||
|
will now lose the ability to find (real) errors in other parts of
|
||||||
|
the same struct.
|
||||||
|
|
||||||
|
Example::
|
||||||
|
|
||||||
|
/* No warnings will ever trigger on accessing any part of x */
|
||||||
|
x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE);
|
||||||
|
|
||||||
|
- ``kmemcheck_bitfield_begin(name)``/``kmemcheck_bitfield_end(name)`` and
|
||||||
|
``kmemcheck_annotate_bitfield(ptr, name)``
|
||||||
|
The first two of these three macros can be used inside struct
|
||||||
|
definitions to signal, respectively, the beginning and end of a
|
||||||
|
bitfield. Additionally, this will assign the bitfield a name, which
|
||||||
|
is given as an argument to the macros.
|
||||||
|
|
||||||
|
Having used these markers, one can later use
|
||||||
|
``kmemcheck_annotate_bitfield()`` at the point of allocation, to indicate
|
||||||
|
which parts of the allocation are part of a bitfield.
|
||||||
|
|
||||||
|
Example::
|
||||||
|
|
||||||
|
struct foo {
|
||||||
|
int x;
|
||||||
|
|
||||||
|
kmemcheck_bitfield_begin(flags);
|
||||||
|
int flag_a:1;
|
||||||
|
int flag_b:1;
|
||||||
|
kmemcheck_bitfield_end(flags);
|
||||||
|
|
||||||
|
int y;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct foo *x = kmalloc(sizeof *x, GFP_KERNEL);
|
||||||
|
|
||||||
|
/* No warnings will trigger on accessing the bitfield of x */
|
||||||
|
kmemcheck_annotate_bitfield(x, flags);
|
||||||
|
|
||||||
|
Note that ``kmemcheck_annotate_bitfield()`` can be used even before the
|
||||||
|
return value of ``kmalloc()`` is checked -- in other words, passing NULL
|
||||||
|
as the first argument is legal (and will do nothing).
|
||||||
|
|
||||||
|
|
||||||
|
Reporting errors
|
||||||
|
----------------
|
||||||
|
|
||||||
|
As we have seen, kmemcheck will produce false positive reports. Therefore, it
|
||||||
|
is not very wise to blindly post kmemcheck warnings to mailing lists and
|
||||||
|
maintainers. Instead, I encourage maintainers and developers to find errors
|
||||||
|
in their own code. If you get a warning, you can try to work around it, try
|
||||||
|
to figure out if it's a real error or not, or simply ignore it. Most
|
||||||
|
developers know their own code and will quickly and efficiently determine the
|
||||||
|
root cause of a kmemcheck report. This is therefore also the most efficient
|
||||||
|
way to work with kmemcheck.
|
||||||
|
|
||||||
|
That said, we (the kmemcheck maintainers) will always be on the lookout for
|
||||||
|
false positives that we can annotate and silence. So whatever you find,
|
||||||
|
please drop us a note privately! Kernel configs and steps to reproduce (if
|
||||||
|
available) are of course a great help too.
|
||||||
|
|
||||||
|
Happy hacking!
|
||||||
|
|
||||||
|
|
||||||
|
Technical description
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
kmemcheck works by marking memory pages non-present. This means that whenever
|
||||||
|
somebody attempts to access the page, a page fault is generated. The page
|
||||||
|
fault handler notices that the page was in fact only hidden, and so it calls
|
||||||
|
on the kmemcheck code to make further investigations.
|
||||||
|
|
||||||
|
When the investigations are completed, kmemcheck "shows" the page by marking
|
||||||
|
it present (as it would be under normal circumstances). This way, the
|
||||||
|
interrupted code can continue as usual.
|
||||||
|
|
||||||
|
But after the instruction has been executed, we should hide the page again, so
|
||||||
|
that we can catch the next access too! Now kmemcheck makes use of a debugging
|
||||||
|
feature of the processor, namely single-stepping. When the processor has
|
||||||
|
finished the one instruction that generated the memory access, a debug
|
||||||
|
exception is raised. From here, we simply hide the page again and continue
|
||||||
|
execution, this time with the single-stepping feature turned off.
|
||||||
|
|
||||||
|
kmemcheck requires some assistance from the memory allocator in order to work.
|
||||||
|
The memory allocator needs to
|
||||||
|
|
||||||
|
1. Tell kmemcheck about newly allocated pages and pages that are about to
|
||||||
|
be freed. This allows kmemcheck to set up and tear down the shadow memory
|
||||||
|
for the pages in question. The shadow memory stores the status of each
|
||||||
|
byte in the allocation proper, e.g. whether it is initialized or
|
||||||
|
uninitialized.
|
||||||
|
|
||||||
|
2. Tell kmemcheck which parts of memory should be marked uninitialized.
|
||||||
|
There are actually a few more states, such as "not yet allocated" and
|
||||||
|
"recently freed".
|
||||||
|
|
||||||
|
If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
|
||||||
|
memory that can take page faults because of kmemcheck.
|
||||||
|
|
||||||
|
If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
|
||||||
|
request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags.
|
||||||
|
This does not prevent the page faults from occurring; instead, it marks the
|
||||||
|
object in question as being initialized so that no warnings will ever be
|
||||||
|
produced for this object.
|
||||||
|
|
||||||
|
Currently, the SLAB and SLUB allocators are supported by kmemcheck.
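
The shadow-memory bookkeeping described in this section can be modelled in
user space. The following is only a sketch under stated assumptions: every
name in it (``tracked_obj``, ``tracked_alloc()``, ``tracked_read_ok()`` and so
on) is hypothetical and is not part of kmemcheck, and the page-fault and
single-stepping machinery is left out entirely. It only shows the idea of
keeping one status byte per data byte, and of a "notrack" style allocation
that starts out marked as initialized so that no warning is ever produced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 16	/* arbitrary object size for this model */

/* Per-byte states, loosely following the ones named in the text. */
enum shadow_state {
	SHADOW_UNALLOCATED,
	SHADOW_UNINITIALIZED,
	SHADOW_INITIALIZED,
	SHADOW_FREED,
};

/* Hypothetical tracked object: data plus one shadow byte per data byte. */
struct tracked_obj {
	unsigned char data[OBJ_SIZE];
	unsigned char shadow[OBJ_SIZE];
};

static void shadow_set(struct tracked_obj *obj, size_t off, size_t len,
		       enum shadow_state state)
{
	assert(off <= OBJ_SIZE && len <= OBJ_SIZE - off);
	memset(obj->shadow + off, state, len);
}

/* "notrack" models an allocation that must never produce a warning. */
static void tracked_alloc(struct tracked_obj *obj, bool notrack)
{
	shadow_set(obj, 0, OBJ_SIZE,
		   notrack ? SHADOW_INITIALIZED : SHADOW_UNINITIALIZED);
}

static void tracked_free(struct tracked_obj *obj)
{
	shadow_set(obj, 0, OBJ_SIZE, SHADOW_FREED);
}

/* A write defines the bytes it touches. */
static void tracked_write(struct tracked_obj *obj, size_t off,
			  const void *src, size_t len)
{
	assert(off <= OBJ_SIZE && len <= OBJ_SIZE - off);
	memcpy(obj->data + off, src, len);
	shadow_set(obj, off, len, SHADOW_INITIALIZED);
}

/* Returns false where a checker would warn about the read. */
static bool tracked_read_ok(const struct tracked_obj *obj, size_t off,
			    size_t len)
{
	assert(off <= OBJ_SIZE && len <= OBJ_SIZE - off);
	for (size_t i = 0; i < len; i++)
		if (obj->shadow[off + i] != SHADOW_INITIALIZED)
			return false;
	return true;
}
```

In this model, a read of a freshly allocated object is flagged until the
corresponding bytes have been written, and every read is flagged again once
the object has been freed. An object allocated with ``notrack`` set is never
flagged, which is also why such an annotation hides real errors in the same
object.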
|
@ -1,15 +1,12 @@
|
|||||||
Kernel Memory Leak Detector
|
Kernel Memory Leak Detector
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
Introduction
|
|
||||||
------------
|
|
||||||
|
|
||||||
Kmemleak provides a way of detecting possible kernel memory leaks in a
|
Kmemleak provides a way of detecting possible kernel memory leaks in a
|
||||||
way similar to a tracing garbage collector
|
way similar to a tracing garbage collector
|
||||||
(https://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
|
(https://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
|
||||||
with the difference that the orphan objects are not freed but only
|
with the difference that the orphan objects are not freed but only
|
||||||
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
|
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
|
||||||
Valgrind tool (memcheck --leak-check) to detect the memory leaks in
|
Valgrind tool (``memcheck --leak-check``) to detect the memory leaks in
|
||||||
user-space applications.
|
user-space applications.
|
||||||
Kmemleak is supported on x86, arm, powerpc, sparc, sh, microblaze, ppc, mips, s390, metag and tile.
|
Kmemleak is supported on x86, arm, powerpc, sparc, sh, microblaze, ppc, mips, s390, metag and tile.
|
||||||
|
|
||||||
@ -19,20 +16,20 @@ Usage
|
|||||||
CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
|
CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
|
||||||
thread scans the memory every 10 minutes (by default) and prints the
|
thread scans the memory every 10 minutes (by default) and prints the
|
||||||
number of new unreferenced objects found. To display the details of all
|
number of new unreferenced objects found. To display the details of all
|
||||||
the possible memory leaks:
|
the possible memory leaks::
|
||||||
|
|
||||||
# mount -t debugfs nodev /sys/kernel/debug/
|
# mount -t debugfs nodev /sys/kernel/debug/
|
||||||
# cat /sys/kernel/debug/kmemleak
|
# cat /sys/kernel/debug/kmemleak
|
||||||
|
|
||||||
To trigger an intermediate memory scan:
|
To trigger an intermediate memory scan::
|
||||||
|
|
||||||
# echo scan > /sys/kernel/debug/kmemleak
|
# echo scan > /sys/kernel/debug/kmemleak
|
||||||
|
|
||||||
To clear the list of all current possible memory leaks:
|
To clear the list of all current possible memory leaks::
|
||||||
|
|
||||||
# echo clear > /sys/kernel/debug/kmemleak
|
# echo clear > /sys/kernel/debug/kmemleak
|
||||||
|
|
||||||
New leaks will then come up upon reading /sys/kernel/debug/kmemleak
|
New leaks will then come up upon reading ``/sys/kernel/debug/kmemleak``
|
||||||
again.
|
again.
|
||||||
|
|
||||||
Note that the orphan objects are listed in the order they were allocated
|
Note that the orphan objects are listed in the order they were allocated
|
||||||
@ -40,22 +37,31 @@ and one object at the beginning of the list may cause other subsequent
|
|||||||
objects to be reported as orphan.
|
objects to be reported as orphan.
|
||||||
|
|
||||||
Memory scanning parameters can be modified at run-time by writing to the
|
Memory scanning parameters can be modified at run-time by writing to the
|
||||||
/sys/kernel/debug/kmemleak file. The following parameters are supported:
|
``/sys/kernel/debug/kmemleak`` file. The following parameters are supported:
|
||||||
|
|
||||||
off - disable kmemleak (irreversible)
|
- off
|
||||||
stack=on - enable the task stacks scanning (default)
|
disable kmemleak (irreversible)
|
||||||
stack=off - disable the tasks stacks scanning
|
- stack=on
|
||||||
scan=on - start the automatic memory scanning thread (default)
|
enable the task stacks scanning (default)
|
||||||
scan=off - stop the automatic memory scanning thread
|
- stack=off
|
||||||
scan=<secs> - set the automatic memory scanning period in seconds
|
disable the tasks stacks scanning
|
||||||
(default 600, 0 to stop the automatic scanning)
|
- scan=on
|
||||||
scan - trigger a memory scan
|
start the automatic memory scanning thread (default)
|
||||||
clear - clear list of current memory leak suspects, done by
|
- scan=off
|
||||||
marking all current reported unreferenced objects grey,
|
stop the automatic memory scanning thread
|
||||||
or free all kmemleak objects if kmemleak has been disabled.
|
- scan=<secs>
|
||||||
dump=<addr> - dump information about the object found at <addr>
|
set the automatic memory scanning period in seconds
|
||||||
|
(default 600, 0 to stop the automatic scanning)
|
||||||
|
- scan
|
||||||
|
trigger a memory scan
|
||||||
|
- clear
|
||||||
|
clear list of current memory leak suspects, done by
|
||||||
|
marking all current reported unreferenced objects grey,
|
||||||
|
or free all kmemleak objects if kmemleak has been disabled.
|
||||||
|
- dump=<addr>
|
||||||
|
dump information about the object found at <addr>
|
||||||
|
|
||||||
Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
|
Kmemleak can also be disabled at boot-time by passing ``kmemleak=off`` on
|
||||||
the kernel command line.
|
the kernel command line.
|
||||||
|
|
||||||
Memory may be allocated or freed before kmemleak is initialised and
|
Memory may be allocated or freed before kmemleak is initialised and
|
||||||
@ -63,13 +69,14 @@ these actions are stored in an early log buffer. The size of this buffer
|
|||||||
is configured via the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option.
|
is configured via the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option.
|
||||||
|
|
||||||
If CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF are enabled, the kmemleak is
|
If CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF are enabled, the kmemleak is
|
||||||
disabled by default. Passing "kmemleak=on" on the kernel command
|
disabled by default. Passing ``kmemleak=on`` on the kernel command
|
||||||
line enables the function.
|
line enables the function.
|
||||||
|
|
||||||
Basic Algorithm
|
Basic Algorithm
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and
|
The memory allocations via :c:func:`kmalloc`, :c:func:`vmalloc`,
|
||||||
|
:c:func:`kmem_cache_alloc` and
|
||||||
friends are traced and the pointers, together with additional
|
friends are traced and the pointers, together with additional
|
||||||
information like size and stack trace, are stored in a rbtree.
|
information like size and stack trace, are stored in a rbtree.
|
||||||
The corresponding freeing function calls are tracked and the pointers
|
The corresponding freeing function calls are tracked and the pointers
|
||||||
@ -113,13 +120,13 @@ when doing development. To work around these situations you can use the
|
|||||||
you can find new unreferenced objects; this should help with testing
|
you can find new unreferenced objects; this should help with testing
|
||||||
specific sections of code.
|
specific sections of code.
|
||||||
|
|
||||||
To test a critical section on demand with a clean kmemleak do:
|
To test a critical section on demand with a clean kmemleak do::
|
||||||
|
|
||||||
# echo clear > /sys/kernel/debug/kmemleak
|
# echo clear > /sys/kernel/debug/kmemleak
|
||||||
... test your kernel or modules ...
|
... test your kernel or modules ...
|
||||||
# echo scan > /sys/kernel/debug/kmemleak
|
# echo scan > /sys/kernel/debug/kmemleak
|
||||||
|
|
||||||
Then as usual to get your report with:
|
Then as usual to get your report with::
|
||||||
|
|
||||||
# cat /sys/kernel/debug/kmemleak
|
# cat /sys/kernel/debug/kmemleak
|
||||||
|
|
||||||
@ -131,7 +138,7 @@ disabled by the user or due to an fatal error, internal kmemleak objects
|
|||||||
won't be freed when kmemleak is disabled, and those objects may occupy
|
won't be freed when kmemleak is disabled, and those objects may occupy
|
||||||
a large part of physical memory.
|
a large part of physical memory.
|
||||||
|
|
||||||
In this situation, you may reclaim memory with:
|
In this situation, you may reclaim memory with::
|
||||||
|
|
||||||
# echo clear > /sys/kernel/debug/kmemleak
|
# echo clear > /sys/kernel/debug/kmemleak
|
||||||
|
|
||||||
@ -140,20 +147,20 @@ Kmemleak API
|
|||||||
|
|
||||||
See the include/linux/kmemleak.h header for the functions prototype.
|
See the include/linux/kmemleak.h header for the functions prototype.
|
||||||
|
|
||||||
kmemleak_init - initialize kmemleak
|
- ``kmemleak_init`` - initialize kmemleak
|
||||||
kmemleak_alloc - notify of a memory block allocation
|
- ``kmemleak_alloc`` - notify of a memory block allocation
|
||||||
kmemleak_alloc_percpu - notify of a percpu memory block allocation
|
- ``kmemleak_alloc_percpu`` - notify of a percpu memory block allocation
|
||||||
kmemleak_free - notify of a memory block freeing
|
- ``kmemleak_free`` - notify of a memory block freeing
|
||||||
kmemleak_free_part - notify of a partial memory block freeing
|
- ``kmemleak_free_part`` - notify of a partial memory block freeing
|
||||||
kmemleak_free_percpu - notify of a percpu memory block freeing
|
- ``kmemleak_free_percpu`` - notify of a percpu memory block freeing
|
||||||
kmemleak_update_trace - update object allocation stack trace
|
- ``kmemleak_update_trace`` - update object allocation stack trace
|
||||||
kmemleak_not_leak - mark an object as not a leak
|
- ``kmemleak_not_leak`` - mark an object as not a leak
|
||||||
kmemleak_ignore - do not scan or report an object as leak
|
- ``kmemleak_ignore`` - do not scan or report an object as leak
|
||||||
kmemleak_scan_area - add scan areas inside a memory block
|
- ``kmemleak_scan_area`` - add scan areas inside a memory block
|
||||||
kmemleak_no_scan - do not scan a memory block
|
- ``kmemleak_no_scan`` - do not scan a memory block
|
||||||
kmemleak_erase - erase an old value in a pointer variable
|
- ``kmemleak_erase`` - erase an old value in a pointer variable
|
||||||
kmemleak_alloc_recursive - as kmemleak_alloc but checks the recursiveness
|
- ``kmemleak_alloc_recursive`` - as kmemleak_alloc but checks the recursiveness
|
||||||
kmemleak_free_recursive - as kmemleak_free but checks the recursiveness
|
- ``kmemleak_free_recursive`` - as kmemleak_free but checks the recursiveness
|
||||||
|
|
||||||
Dealing with false positives/negatives
|
Dealing with false positives/negatives
|
||||||
--------------------------------------
|
--------------------------------------
|
@ -1,11 +1,20 @@
|
|||||||
Copyright 2004 Linus Torvalds
|
.. Copyright 2004 Linus Torvalds
|
||||||
Copyright 2004 Pavel Machek <pavel@ucw.cz>
|
.. Copyright 2004 Pavel Machek <pavel@ucw.cz>
|
||||||
Copyright 2006 Bob Copeland <me@bobcopeland.com>
|
.. Copyright 2006 Bob Copeland <me@bobcopeland.com>
|
||||||
|
|
||||||
|
Sparse
|
||||||
|
======
|
||||||
|
|
||||||
|
Sparse is a semantic checker for C programs; it can be used to find a
|
||||||
|
number of potential problems with kernel code. See
|
||||||
|
https://lwn.net/Articles/689907/ for an overview of sparse; this document
|
||||||
|
contains some kernel-specific sparse information.
|
||||||
|
|
||||||
|
|
||||||
Using sparse for typechecking
|
Using sparse for typechecking
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
-----------------------------
|
||||||
|
|
||||||
"__bitwise" is a type attribute, so you have to do something like this:
|
"__bitwise" is a type attribute, so you have to do something like this::
|
||||||
|
|
||||||
typedef int __bitwise pm_request_t;
|
typedef int __bitwise pm_request_t;
|
||||||
|
|
||||||
@ -20,13 +29,13 @@ but in this case we really _do_ want to force the conversion). And because
|
|||||||
the enum values are all the same type, now "enum pm_request" will be that
|
the enum values are all the same type, now "enum pm_request" will be that
|
||||||
type too.
|
type too.
|
||||||
|
|
||||||
And with gcc, all the __bitwise/__force stuff goes away, and it all ends
|
And with gcc, all the "__bitwise"/"__force" stuff goes away, and it all
|
||||||
up looking just like integers to gcc.
|
ends up looking just like integers to gcc.
|
||||||
|
|
||||||
Quite frankly, you don't need the enum there. The above all really just
|
Quite frankly, you don't need the enum there. The above all really just
|
||||||
boils down to one special "int __bitwise" type.
|
boils down to one special "int __bitwise" type.
|
||||||
|
|
||||||
So the simpler way is to just do
|
So the simpler way is to just do::
|
||||||
|
|
||||||
typedef int __bitwise pm_request_t;
|
typedef int __bitwise pm_request_t;
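
As an illustration of how such a type stays invisible to gcc, here is a
minimal, self-contained sketch. It assumes the usual arrangement in which the
attributes are only defined when sparse runs (sparse defines ``__CHECKER__``);
the helper functions ``pm_request_is_suspend()`` and ``pm_request_to_int()``
are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Only sparse sees the attributes; for gcc both macros expand to nothing. */
#ifdef __CHECKER__
# define __bitwise	__attribute__((bitwise))
# define __force	__attribute__((force))
#else
# define __bitwise
# define __force
#endif

typedef int __bitwise pm_request_t;

/* A plain integer is not a pm_request_t, so constants need __force. */
#define PM_SUSPEND	((__force pm_request_t) 1)
#define PM_RESUME	((__force pm_request_t) 2)

/* Hypothetical helper: comparing two values of the same bitwise type. */
static int pm_request_is_suspend(pm_request_t req)
{
	return req == PM_SUSPEND;
}

/* Hypothetical helper: leaving the bitwise type again also needs __force. */
static int pm_request_to_int(pm_request_t req)
{
	return (__force int) req;
}
```

Compiled with gcc, ``pm_request_t`` is an ordinary ``int``. Under sparse,
passing a plain integer where a ``pm_request_t`` is expected, or dropping one
of the ``__force`` casts above, would be expected to draw a warning.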
|
||||||
|
|
||||||
@ -50,7 +59,7 @@ __bitwise - noisy stuff; in particular, __le*/__be* are that. We really
|
|||||||
don't want to drown in noise unless we'd explicitly asked for it.
|
don't want to drown in noise unless we'd explicitly asked for it.
|
||||||
|
|
||||||
Using sparse for lock checking
|
Using sparse for lock checking
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
------------------------------
|
||||||
|
|
||||||
The following macros are undefined for gcc and defined during a sparse
|
The following macros are undefined for gcc and defined during a sparse
|
||||||
run to use the "context" tracking feature of sparse, applied to
|
run to use the "context" tracking feature of sparse, applied to
|
||||||
@ -69,22 +78,22 @@ annotation is needed. The tree annotations above are for cases where
|
|||||||
sparse would otherwise report a context imbalance.
|
sparse would otherwise report a context imbalance.
|
||||||
|
|
||||||
Getting sparse
|
Getting sparse
|
||||||
~~~~~~~~~~~~~~
|
--------------
|
||||||
|
|
||||||
You can get latest released versions from the Sparse homepage at
|
You can get latest released versions from the Sparse homepage at
|
||||||
https://sparse.wiki.kernel.org/index.php/Main_Page
|
https://sparse.wiki.kernel.org/index.php/Main_Page
|
||||||
|
|
||||||
Alternatively, you can get snapshots of the latest development version
|
Alternatively, you can get snapshots of the latest development version
|
||||||
of sparse using git to clone..
|
of sparse using git to clone::
|
||||||
|
|
||||||
git://git.kernel.org/pub/scm/devel/sparse/sparse.git
|
git://git.kernel.org/pub/scm/devel/sparse/sparse.git
|
||||||
|
|
||||||
DaveJ has hourly generated tarballs of the git tree available at..
|
DaveJ has hourly generated tarballs of the git tree available at::
|
||||||
|
|
||||||
http://www.codemonkey.org.uk/projects/git-snapshots/sparse/
|
http://www.codemonkey.org.uk/projects/git-snapshots/sparse/
|
||||||
|
|
||||||
|
|
||||||
Once you have it, just do
|
Once you have it, just do::
|
||||||
|
|
||||||
make
|
make
|
||||||
make install
|
make install
|
||||||
@ -92,7 +101,7 @@ Once you have it, just do
|
|||||||
as a regular user, and it will install sparse in your ~/bin directory.
|
as a regular user, and it will install sparse in your ~/bin directory.
|
||||||
|
|
||||||
Using sparse
|
Using sparse
|
||||||
~~~~~~~~~~~~
|
------------
|
||||||
|
|
||||||
Do a kernel make with "make C=1" to run sparse on all the C files that get
|
Do a kernel make with "make C=1" to run sparse on all the C files that get
|
||||||
recompiled, or use "make C=2" to run sparse on the files whether they need to
|
recompiled, or use "make C=2" to run sparse on the files whether they need to
|
||||||
@ -101,7 +110,7 @@ have already built it.
|
|||||||
|
|
||||||
The optional make variable CF can be used to pass arguments to sparse. The
|
The optional make variable CF can be used to pass arguments to sparse. The
|
||||||
build system passes -Wbitwise to sparse automatically. To perform endianness
|
build system passes -Wbitwise to sparse automatically. To perform endianness
|
||||||
checks, you may define __CHECK_ENDIAN__:
|
checks, you may define __CHECK_ENDIAN__::
|
||||||
|
|
||||||
make C=2 CF="-D__CHECK_ENDIAN__"
|
make C=2 CF="-D__CHECK_ENDIAN__"
|
||||||
|
|
25
Documentation/dev-tools/tools.rst
Normal file
@ -0,0 +1,25 @@
|
|||||||
|
================================
|
||||||
|
Development tools for the kernel
|
||||||
|
================================
|
||||||
|
|
||||||
|
This document is a collection of documents about development tools that can
|
||||||
|
be used to work on the kernel. For now, the documents have been pulled
|
||||||
|
together without any significant effort to integrate them into a coherent
|
||||||
|
whole; patches welcome!
|
||||||
|
|
||||||
|
.. class:: toc-title
|
||||||
|
|
||||||
|
Table of contents
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 2
|
||||||
|
|
||||||
|
coccinelle
|
||||||
|
sparse
|
||||||
|
kcov
|
||||||
|
gcov
|
||||||
|
kasan
|
||||||
|
ubsan
|
||||||
|
kmemleak
|
||||||
|
kmemcheck
|
||||||
|
gdb-kernel-debugging
|
@ -1,7 +1,5 @@
|
|||||||
Undefined Behavior Sanitizer - UBSAN
|
The Undefined Behavior Sanitizer - UBSAN
|
||||||
|
========================================
|
||||||
Overview
|
|
||||||
--------
|
|
||||||
|
|
||||||
UBSAN is a runtime undefined behaviour checker.
|
UBSAN is a runtime undefined behaviour checker.
|
||||||
|
|
||||||
@ -10,11 +8,13 @@ Compiler inserts code that perform certain kinds of checks before operations
|
|||||||
that may cause UB. If check fails (i.e. UB detected) __ubsan_handle_*
|
that may cause UB. If check fails (i.e. UB detected) __ubsan_handle_*
|
||||||
function called to print error message.
|
function called to print error message.
|
||||||
|
|
||||||
GCC has that feature since 4.9.x [1] (see -fsanitize=undefined option and
|
GCC has that feature since 4.9.x [1_] (see ``-fsanitize=undefined`` option and
|
||||||
its suboptions). GCC 5.x has more checkers implemented [2].
|
its suboptions). GCC 5.x has more checkers implemented [2_].
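
To make the idea of a compiler-inserted check more concrete, the following
hand-written sketch shows roughly what guarding a left shift looks like. It is
an illustration only: ``report_shift_out_of_bounds()`` is a hypothetical
stand-in for a ``__ubsan_handle_*`` function, ``checked_shl()`` is not
something the compiler or the kernel provides, and returning 0 after a failed
check is an arbitrary choice made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

static int ub_reports;	/* counts how often the stand-in handler ran */

/* Hypothetical stand-in for a __ubsan_handle_* function. */
static void report_shift_out_of_bounds(unsigned int shift)
{
	ub_reports++;
	fprintf(stderr, "shift exponent %u is too large for %zu-bit type\n",
		shift, sizeof(unsigned int) * CHAR_BIT);
}

/* The check runs before the operation that could be undefined. */
static unsigned int checked_shl(unsigned int value, unsigned int shift)
{
	if (shift >= sizeof(value) * CHAR_BIT) {
		report_shift_out_of_bounds(shift);
		return 0;	/* arbitrary result for this sketch */
	}
	return value << shift;
}
```

A call such as ``checked_shl(1u, 4)`` takes the fast path and performs the
shift, while a shift count equal to or larger than the width of the type
reaches the handler instead of executing the undefined operation.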
|
||||||
|
|
||||||
Report example
|
Report example
|
||||||
---------------
|
--------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
================================================================================
|
================================================================================
|
||||||
UBSAN: Undefined behaviour in ../include/linux/bitops.h:110:33
|
UBSAN: Undefined behaviour in ../include/linux/bitops.h:110:33
|
||||||
@ -47,29 +47,33 @@ Report example
|
|||||||
Usage
|
Usage
|
||||||
-----
|
-----
|
||||||
|
|
||||||
To enable UBSAN configure kernel with:
|
To enable UBSAN configure kernel with::
|
||||||
|
|
||||||
CONFIG_UBSAN=y
|
CONFIG_UBSAN=y
|
||||||
|
|
||||||
and to check the entire kernel:
|
and to check the entire kernel::
|
||||||
|
|
||||||
CONFIG_UBSAN_SANITIZE_ALL=y
|
CONFIG_UBSAN_SANITIZE_ALL=y
|
||||||
|
|
||||||
To enable instrumentation for specific files or directories, add a line
|
To enable instrumentation for specific files or directories, add a line
|
||||||
similar to the following to the respective kernel Makefile:
|
similar to the following to the respective kernel Makefile:
|
||||||
|
|
||||||
For a single file (e.g. main.o):
|
- For a single file (e.g. main.o)::
|
||||||
UBSAN_SANITIZE_main.o := y
|
|
||||||
|
|
||||||
For all files in one directory:
|
UBSAN_SANITIZE_main.o := y
|
||||||
UBSAN_SANITIZE := y
|
|
||||||
|
- For all files in one directory::
|
||||||
|
|
||||||
|
UBSAN_SANITIZE := y
|
||||||
|
|
||||||
To exclude files from being instrumented even if
|
To exclude files from being instrumented even if
|
||||||
CONFIG_UBSAN_SANITIZE_ALL=y, use:
|
``CONFIG_UBSAN_SANITIZE_ALL=y``, use::
|
||||||
|
|
||||||
UBSAN_SANITIZE_main.o := n
|
UBSAN_SANITIZE_main.o := n
|
||||||
and:
|
|
||||||
UBSAN_SANITIZE := n
|
and::
|
||||||
|
|
||||||
|
UBSAN_SANITIZE := n
|
||||||
|
|
||||||
Detection of unaligned accesses controlled through the separate option -
|
Detection of unaligned accesses controlled through the separate option -
|
||||||
CONFIG_UBSAN_ALIGNMENT. It's off by default on architectures that support
|
CONFIG_UBSAN_ALIGNMENT. It's off by default on architectures that support
|
||||||
@ -80,5 +84,5 @@ reports.
|
|||||||
References
|
References
|
||||||
----------
|
----------
|
||||||
|
|
||||||
[1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
|
.. _1: https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
|
||||||
[2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
|
.. _2: https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
|
@ -1,257 +0,0 @@
|
|||||||
Using gcov with the Linux kernel
|
|
||||||
================================
|
|
||||||
|
|
||||||
1. Introduction
|
|
||||||
2. Preparation
|
|
||||||
3. Customization
|
|
||||||
4. Files
|
|
||||||
5. Modules
|
|
||||||
6. Separated build and test machines
|
|
||||||
7. Troubleshooting
|
|
||||||
Appendix A: sample script: gather_on_build.sh
|
|
||||||
Appendix B: sample script: gather_on_test.sh
|
|
||||||
|
|
||||||
|
|
||||||
1. Introduction
|
|
||||||
===============
|
|
||||||
|
|
||||||
gcov profiling kernel support enables the use of GCC's coverage testing
|
|
||||||
tool gcov [1] with the Linux kernel. Coverage data of a running kernel
|
|
||||||
is exported in gcov-compatible format via the "gcov" debugfs directory.
|
|
||||||
To get coverage data for a specific file, change to the kernel build
|
|
||||||
directory and use gcov with the -o option as follows (requires root):
|
|
||||||
|
|
||||||
# cd /tmp/linux-out
|
|
||||||
# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c
|
|
||||||
|
|
||||||
This will create source code files annotated with execution counts
|
|
||||||
in the current directory. In addition, graphical gcov front-ends such
|
|
||||||
as lcov [2] can be used to automate the process of collecting data
|
|
||||||
for the entire kernel and provide coverage overviews in HTML format.
|
|
||||||
|
|
||||||
Possible uses:
|
|
||||||
|
|
||||||
* debugging (has this line been reached at all?)
|
|
||||||
* test improvement (how do I change my test to cover these lines?)
|
|
||||||
* minimizing kernel configurations (do I need this option if the
|
|
||||||
associated code is never run?)
|
|
||||||
|
|
||||||
--
|
|
||||||
|
|
||||||
[1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
|
|
||||||
[2] http://ltp.sourceforge.net/coverage/lcov.php
|
|
||||||
|
|
||||||
|
|
||||||
2. Preparation
|
|
||||||
==============
|
|
||||||
|
|
||||||
Configure the kernel with:
|
|
||||||
|
|
||||||
CONFIG_DEBUG_FS=y
|
|
||||||
CONFIG_GCOV_KERNEL=y
|
|
||||||
|
|
||||||
select the gcc's gcov format, default is autodetect based on gcc version:
|
|
||||||
|
|
||||||
CONFIG_GCOV_FORMAT_AUTODETECT=y
|
|
||||||
|
|
||||||
and to get coverage data for the entire kernel:
|
|
||||||
|
|
||||||
CONFIG_GCOV_PROFILE_ALL=y
|
|
||||||
|
|
||||||
Note that kernels compiled with profiling flags will be significantly
|
|
||||||
larger and run slower. Also CONFIG_GCOV_PROFILE_ALL may not be supported
|
|
||||||
on all architectures.
|
|
||||||
|
|
||||||
Profiling data will only become accessible once debugfs has been
|
|
||||||
mounted:
|
|
||||||
|
|
||||||
mount -t debugfs none /sys/kernel/debug
|
|
||||||
|
|
||||||
|
|
||||||
3. Customization
|
|
||||||
================
|
|
||||||
|
|
||||||
To enable profiling for specific files or directories, add a line
|
|
||||||
similar to the following to the respective kernel Makefile:
|
|
||||||
|
|
||||||
For a single file (e.g. main.o):
|
|
||||||
GCOV_PROFILE_main.o := y
|
|
||||||
|
|
||||||
For all files in one directory:
|
|
||||||
GCOV_PROFILE := y
|
|
||||||
|
|
||||||
To exclude files from being profiled even when CONFIG_GCOV_PROFILE_ALL
|
|
||||||
is specified, use:
|
|
||||||
|
|
||||||
GCOV_PROFILE_main.o := n
|
|
||||||
and:
|
|
||||||
GCOV_PROFILE := n
|
|
||||||
|
|
||||||
Only files which are linked to the main kernel image or are compiled as
|
|
||||||
kernel modules are supported by this mechanism.
|
|
||||||
|
|
||||||
|
|
||||||
4. Files
|
|
||||||
========
|
|
||||||
|
|
||||||
The gcov kernel support creates the following files in debugfs:
|
|
||||||
|
|
||||||
/sys/kernel/debug/gcov
|
|
||||||
Parent directory for all gcov-related files.
|
|
||||||
|
|
||||||
/sys/kernel/debug/gcov/reset
|
|
||||||
Global reset file: resets all coverage data to zero when
|
|
||||||
written to.
|
|
||||||
|
|
||||||
/sys/kernel/debug/gcov/path/to/compile/dir/file.gcda
|
|
||||||
The actual gcov data file as understood by the gcov
|
|
||||||
tool. Resets file coverage data to zero when written to.
|
|
||||||
|
|
||||||
/sys/kernel/debug/gcov/path/to/compile/dir/file.gcno
|
|
||||||
Symbolic link to a static data file required by the gcov
|
|
||||||
tool. This file is generated by gcc when compiling with
|
|
||||||
option -ftest-coverage.
|
|
||||||
|
|
||||||
|
|
||||||
5. Modules
|
|
||||||
==========
|
|
||||||
|
|
||||||
Kernel modules may contain cleanup code which is only run during
|
|
||||||
module unload time. The gcov mechanism provides a means to collect
|
|
||||||
coverage data for such code by keeping a copy of the data associated
|
|
||||||
with the unloaded module. This data remains available through debugfs.
|
|
||||||
Once the module is loaded again, the associated coverage counters are
|
|
||||||
initialized with the data from its previous instantiation.
|
|
||||||
|
|
||||||
This behavior can be deactivated by specifying the gcov_persist kernel
|
|
||||||
parameter:
|
|
||||||
|
|
||||||
gcov_persist=0
|
|
||||||
|
|
||||||
At run-time, a user can also choose to discard data for an unloaded
|
|
||||||
module by writing to its data file or the global reset file.
|
|
||||||
|
|
||||||
|
|
||||||
6. Separated build and test machines
|
|
||||||
====================================
|
|
||||||
|
|
||||||
The gcov kernel profiling infrastructure is designed to work out-of-the
|
|
||||||
box for setups where kernels are built and run on the same machine. In
|
|
||||||
cases where the kernel runs on a separate machine, special preparations
|
|
||||||
must be made, depending on where the gcov tool is used:
|
|
||||||
|
|
||||||
a) gcov is run on the TEST machine
|
|
||||||
|
|
||||||
The gcov tool version on the test machine must be compatible with the
|
|
||||||
gcc version used for kernel build. Also the following files need to be
|
|
||||||
copied from build to test machine:
|
|
||||||
|
|
||||||
from the source tree:
|
|
||||||
- all C source files + headers
|
|
||||||
|
|
||||||
from the build tree:
|
|
||||||
- all C source files + headers
|
|
||||||
- all .gcda and .gcno files
|
|
||||||
- all links to directories
|
|
||||||
|
|
||||||
It is important to note that these files need to be placed into the
|
|
||||||
exact same file system location on the test machine as on the build
|
|
||||||
machine. If any of the path components is symbolic link, the actual
|
|
||||||
directory needs to be used instead (due to make's CURDIR handling).
|
|
||||||
|
|
||||||
b) gcov is run on the BUILD machine
|
|
||||||
|
|
||||||
The following files need to be copied after each test case from test
|
|
||||||
to build machine:
|
|
||||||
|
|
||||||
from the gcov directory in sysfs:
|
|
||||||
- all .gcda files
|
|
||||||
- all links to .gcno files
|
|
||||||
|
|
||||||
These files can be copied to any location on the build machine. gcov
|
|
||||||
must then be called with the -o option pointing to that directory.
|
|
||||||
|
|
||||||
Example directory setup on the build machine:
|
|
||||||
|
|
||||||
/tmp/linux: kernel source tree
|
|
||||||
/tmp/out: kernel build directory as specified by make O=
|
|
||||||
/tmp/coverage: location of the files copied from the test machine
|
|
||||||
|
|
||||||
[user@build] cd /tmp/out
|
|
||||||
[user@build] gcov -o /tmp/coverage/tmp/out/init main.c
|
|
||||||
|
|
||||||
|
|
||||||
7. Troubleshooting
|
|
||||||
==================
|
|
||||||
|
|
||||||
Problem: Compilation aborts during linker step.
|
|
||||||
Cause: Profiling flags are specified for source files which are not
|
|
||||||
linked to the main kernel or which are linked by a custom
|
|
||||||
linker procedure.
|
|
||||||
Solution: Exclude affected source files from profiling by specifying
|
|
||||||
GCOV_PROFILE := n or GCOV_PROFILE_basename.o := n in the
|
|
||||||
corresponding Makefile.
|
|
||||||
|
|
||||||
Problem: Files copied from sysfs appear empty or incomplete.
|
|
||||||
Cause: Due to the way seq_file works, some tools such as cp or tar
|
|
||||||
may not correctly copy files from sysfs.
|
|
||||||
Solution: Use 'cat' to read .gcda files and 'cp -d' to copy links.
|
|
||||||
Alternatively use the mechanism shown in Appendix B.
|
|
||||||
|
|
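The workaround above (read the .gcda data as a stream, recreate the links
instead of following them) can also be sketched in Python. This is only an
illustration under stated assumptions: copy_gcov_tree is a hypothetical
helper, and the source directory is assumed to contain regular .gcda files
and symbolic links, as the gcov directory in sysfs does.

```python
import os


def copy_gcov_tree(src_root, dst_root):
    """Copy .gcda contents and recreate symbolic links below dst_root."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if os.path.islink(src):
                # Same idea as 'cp -d': keep the link, do not follow it.
                os.symlink(os.readlink(src), dst)
            elif name.endswith(".gcda"):
                # Same idea as 'cat': read the data until end of file
                # rather than trusting the reported file size.
                with open(src, "rb") as fin, open(dst, "wb") as fout:
                    fout.write(fin.read())
```

Other files are skipped on purpose, since only .gcda files and links are
needed on the build machine.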
||||||
|
|
||||||
Appendix A: gather_on_build.sh
|
|
||||||
==============================
|
|
||||||
|
|
||||||
Sample script to gather coverage meta files on the build machine
|
|
||||||
(see 6a):
|
|
||||||
#!/bin/bash
|
|
||||||
|
|
||||||
KSRC=$1
|
|
||||||
KOBJ=$2
|
|
||||||
DEST=$3
|
|
||||||
|
|
||||||
if [ -z "$KSRC" ] || [ -z "$KOBJ" ] || [ -z "$DEST" ]; then
|
|
||||||
echo "Usage: $0 <ksrc directory> <kobj directory> <output.tar.gz>" >&2
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
|
|
||||||
KSRC=$(cd $KSRC; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
|
|
||||||
KOBJ=$(cd $KOBJ; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
|
|
||||||
|
|
||||||
find $KSRC $KOBJ \( -name '*.gcno' -o -name '*.[ch]' -o -type l \) -a \
|
|
||||||
-perm /u+r,g+r | tar cfz $DEST -P -T -
|
|
||||||
|
|
||||||
if [ $? -eq 0 ] ; then
|
|
||||||
echo "$DEST successfully created, copy to test system and unpack with:"
|
|
||||||
echo " tar xfz $DEST -P"
|
|
||||||
else
|
|
||||||
echo "Could not create file $DEST"
|
|
||||||
fi
|
|
||||||
|
|
||||||
|
|
||||||
Appendix B: gather_on_test.sh
|
|
||||||
=============================
|
|
||||||
|
|
||||||
Sample script to gather coverage data files on the test machine
|
|
||||||
(see 6b):
|
|
||||||
|
|
||||||
#!/bin/bash -e
|
|
||||||
|
|
||||||
DEST=$1
|
|
||||||
GCDA=/sys/kernel/debug/gcov
|
|
||||||
|
|
||||||
if [ -z "$DEST" ] ; then
|
|
||||||
echo "Usage: $0 <output.tar.gz>" >&2
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
|
|
||||||
TEMPDIR=$(mktemp -d)
|
|
||||||
echo Collecting data..
|
|
||||||
find $GCDA -type d -exec mkdir -p $TEMPDIR/\{\} \;
|
|
||||||
find $GCDA -name '*.gcda' -exec sh -c 'cat < $0 > '$TEMPDIR'/$0' {} \;
|
|
||||||
find $GCDA -name '*.gcno' -exec sh -c 'cp -d $0 '$TEMPDIR'/$0' {} \;
|
|
||||||
tar czf $DEST -C $TEMPDIR sys
|
|
||||||
rm -rf $TEMPDIR
|
|
||||||
|
|
||||||
echo "$DEST successfully created, copy to build system and unpack with:"
|
|
||||||
echo " tar xfz $DEST"
|
|
@ -12,6 +12,7 @@ Contents:
|
|||||||
:maxdepth: 2
|
:maxdepth: 2
|
||||||
|
|
||||||
kernel-documentation
|
kernel-documentation
|
||||||
|
dev-tools/tools
|
||||||
media/index
|
media/index
|
||||||
gpu/index
|
gpu/index
|
||||||
|
|
||||||
|
@ -1,171 +0,0 @@
|
|||||||
KernelAddressSanitizer (KASAN)
|
|
||||||
==============================
|
|
||||||
|
|
||||||
0. Overview
|
|
||||||
===========
|
|
||||||
|
|
||||||
KernelAddressSANitizer (KASAN) is a dynamic memory error detector. It provides
|
|
||||||
a fast and comprehensive solution for finding use-after-free and out-of-bounds
|
|
||||||
bugs.
|
|
||||||
|
|
||||||
KASAN uses compile-time instrumentation for checking every memory access,
|
|
||||||
therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
|
|
||||||
required for detection of out-of-bounds accesses to stack or global variables.
|
|
||||||
|
|
||||||
Currently KASAN is supported only for the x86_64 and arm64 architectures.
|
|
||||||
|
|
||||||
1. Usage
|
|
||||||
========
|
|
||||||
|
|
||||||
To enable KASAN, configure the kernel with:
|
|
||||||
|
|
||||||
CONFIG_KASAN = y
|
|
||||||
|
|
||||||
and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline and
|
|
||||||
inline are compiler instrumentation types. The former produces a smaller binary,
|
|
||||||
while the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC
|
|
||||||
version 5.0 or later.
|
|
||||||
|
|
||||||
KASAN works with both SLUB and SLAB memory allocators.
|
|
||||||
For better bug detection and nicer reporting, enable CONFIG_STACKTRACE.
|
|
||||||
|
|
||||||
To disable instrumentation for specific files or directories, add a line
|
|
||||||
similar to the following to the respective kernel Makefile:
|
|
||||||
|
|
||||||
For a single file (e.g. main.o):
|
|
||||||
KASAN_SANITIZE_main.o := n
|
|
||||||
|
|
||||||
For all files in one directory:
|
|
||||||
KASAN_SANITIZE := n
|
|
||||||
|
|
||||||
1.1 Error reports
|
|
||||||
=================
|
|
||||||
|
|
||||||
A typical out-of-bounds access report looks like this:
|
|
||||||
|
|
||||||
==================================================================
|
|
||||||
BUG: AddressSanitizer: out of bounds access in kmalloc_oob_right+0x65/0x75 [test_kasan] at addr ffff8800693bc5d3
|
|
||||||
Write of size 1 by task modprobe/1689
|
|
||||||
=============================================================================
|
|
||||||
BUG kmalloc-128 (Not tainted): kasan error
|
|
||||||
-----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
Disabling lock debugging due to kernel taint
|
|
||||||
INFO: Allocated in kmalloc_oob_right+0x3d/0x75 [test_kasan] age=0 cpu=0 pid=1689
|
|
||||||
__slab_alloc+0x4b4/0x4f0
|
|
||||||
kmem_cache_alloc_trace+0x10b/0x190
|
|
||||||
kmalloc_oob_right+0x3d/0x75 [test_kasan]
|
|
||||||
init_module+0x9/0x47 [test_kasan]
|
|
||||||
do_one_initcall+0x99/0x200
|
|
||||||
load_module+0x2cb3/0x3b20
|
|
||||||
SyS_finit_module+0x76/0x80
|
|
||||||
system_call_fastpath+0x12/0x17
|
|
||||||
INFO: Slab 0xffffea0001a4ef00 objects=17 used=7 fp=0xffff8800693bd728 flags=0x100000000004080
|
|
||||||
INFO: Object 0xffff8800693bc558 @offset=1368 fp=0xffff8800693bc720
|
|
||||||
|
|
||||||
Bytes b4 ffff8800693bc548: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
|
|
||||||
Object ffff8800693bc558: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
|
||||||
Object ffff8800693bc568: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
|
||||||
Object ffff8800693bc578: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
|
||||||
Object ffff8800693bc588: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
|
||||||
Object ffff8800693bc598: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
|
||||||
Object ffff8800693bc5a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
|
||||||
Object ffff8800693bc5b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
|
||||||
Object ffff8800693bc5c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
|
|
||||||
Redzone ffff8800693bc5d8: cc cc cc cc cc cc cc cc ........
|
|
||||||
Padding ffff8800693bc718: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
|
|
||||||
CPU: 0 PID: 1689 Comm: modprobe Tainted: G B 3.18.0-rc1-mm1+ #98
|
|
||||||
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
|
|
||||||
ffff8800693bc000 0000000000000000 ffff8800693bc558 ffff88006923bb78
|
|
||||||
ffffffff81cc68ae 00000000000000f3 ffff88006d407600 ffff88006923bba8
|
|
||||||
ffffffff811fd848 ffff88006d407600 ffffea0001a4ef00 ffff8800693bc558
|
|
||||||
Call Trace:
|
|
||||||
[<ffffffff81cc68ae>] dump_stack+0x46/0x58
|
|
||||||
[<ffffffff811fd848>] print_trailer+0xf8/0x160
|
|
||||||
[<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan]
|
|
||||||
[<ffffffff811ff0f5>] object_err+0x35/0x40
|
|
||||||
[<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan]
|
|
||||||
[<ffffffff8120b9fa>] kasan_report_error+0x38a/0x3f0
|
|
||||||
[<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40
|
|
||||||
[<ffffffff8120b344>] ? kasan_unpoison_shadow+0x14/0x40
|
|
||||||
[<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40
|
|
||||||
[<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan]
|
|
||||||
[<ffffffff8120a995>] __asan_store1+0x75/0xb0
|
|
||||||
[<ffffffffa0002601>] ? kmem_cache_oob+0x1d/0xc3 [test_kasan]
|
|
||||||
[<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan]
|
|
||||||
[<ffffffffa0002065>] kmalloc_oob_right+0x65/0x75 [test_kasan]
|
|
||||||
[<ffffffffa00026b0>] init_module+0x9/0x47 [test_kasan]
|
|
||||||
[<ffffffff810002d9>] do_one_initcall+0x99/0x200
|
|
||||||
[<ffffffff811e4e5c>] ? __vunmap+0xec/0x160
|
|
||||||
[<ffffffff81114f63>] load_module+0x2cb3/0x3b20
|
|
||||||
[<ffffffff8110fd70>] ? m_show+0x240/0x240
|
|
||||||
[<ffffffff81115f06>] SyS_finit_module+0x76/0x80
|
|
||||||
[<ffffffff81cd3129>] system_call_fastpath+0x12/0x17
|
|
||||||
Memory state around the buggy address:
|
|
||||||
ffff8800693bc300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
|
||||||
ffff8800693bc380: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
|
|
||||||
ffff8800693bc400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
|
||||||
ffff8800693bc480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
|
||||||
ffff8800693bc500: fc fc fc fc fc fc fc fc fc fc fc 00 00 00 00 00
|
|
||||||
>ffff8800693bc580: 00 00 00 00 00 00 00 00 00 00 03 fc fc fc fc fc
|
|
||||||
^
|
|
||||||
ffff8800693bc600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
|
||||||
ffff8800693bc680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
|
||||||
ffff8800693bc700: fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb fb
|
|
||||||
ffff8800693bc780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|
|
||||||
ffff8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|
|
||||||
==================================================================
|
|
||||||
|
|
||||||
The header of the report describes what kind of bug happened and what kind of
|
|
||||||
access caused it. It's followed by the description of the accessed slub object
|
|
||||||
(see 'SLUB Debug output' section in Documentation/vm/slub.txt for details) and
|
|
||||||
the description of the accessed memory page.
|
|
||||||
|
|
||||||
In the last section the report shows memory state around the accessed address.
|
|
||||||
Reading this part requires some understanding of how KASAN works.
|
|
||||||
|
|
||||||
The state of each aligned 8-byte block of memory is encoded in one shadow byte.
|
|
||||||
Those 8 bytes can be accessible, partially accessible, freed or be a redzone.
|
|
||||||
We use the following encoding for each shadow byte: 0 means that all 8 bytes
|
|
||||||
of the corresponding memory region are accessible; number N (1 <= N <= 7) means
|
|
||||||
that the first N bytes are accessible, and other (8 - N) bytes are not;
|
|
||||||
any negative value indicates that the entire 8-byte word is inaccessible.
|
|
||||||
We use different negative values to distinguish between different kinds of
|
|
||||||
inaccessible memory like redzones or freed memory (see mm/kasan/kasan.h).
|
|
||||||
|
|
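The encoding described above can be written down as a small decoder. This
is a sketch of the rule as stated in the text, not the kernel's code; the
function names are hypothetical, and shadow values that the text does not
describe (8..127) are simply treated as inaccessible here.

```python
GRANULE = 8  # bytes of memory described by one shadow byte


def to_signed8(value):
    """Interpret a raw byte (0..255) as a signed 8-bit number."""
    return value - 256 if value >= 128 else value


def accessible_bytes(shadow_byte):
    """Number of leading bytes of the 8-byte block that may be accessed."""
    s = to_signed8(shadow_byte)
    if s == 0:
        return GRANULE          # all 8 bytes are accessible
    if 1 <= s <= 7:
        return s                # only the first N bytes are accessible
    return 0                    # negative (or undescribed) value: none


def access_ok(shadow_byte, offset, size):
    """Check an access of 'size' bytes at 'offset' inside one 8-byte block."""
    return offset + size <= accessible_bytes(shadow_byte)


print(accessible_bytes(0x00), accessible_bytes(0x03), accessible_bytes(0xFC))
```

For example, a shadow byte of 03 allows a one-byte access at offsets 0 to 2
of its block, but not at offset 3.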
||||||
In the report above the arrow points to the shadow byte 03, which means that
|
|
||||||
the accessed address is partially accessible.
|
|
||||||
|
|
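To see why the marked shadow byte belongs to the reported address, the
marked row of the dump can be indexed by hand. The sketch below assumes
that the label of each row is the memory address covered by its first
shadow byte and that each shadow byte covers 8 bytes of memory (the row
labels in the report advance by 0x80, which matches 16 shadow bytes per
row). The function name shadow_for is hypothetical.

```python
# Marked row and faulting address, copied from the report above.
ROW = ">ffff8800693bc580: 00 00 00 00 00 00 00 00 00 00 03 fc fc fc fc fc"
ADDR = 0xFFFF8800693BC5D3


def shadow_for(row, addr, granule=8):
    """Return (index in row, shadow value, offset inside the 8-byte block)."""
    head, _, tail = row.lstrip(">").partition(":")
    row_start = int(head, 16)
    shadow = [int(b, 16) for b in tail.split()]
    index = (addr - row_start) // granule
    if not 0 <= index < len(shadow):
        raise ValueError("address is not covered by this row")
    return index, shadow[index], addr % granule


index, value, offset = shadow_for(ROW, ADDR)
print(index, hex(value), offset)  # prints: 10 0x3 3
```

The address falls on the eleventh shadow byte of the row (index 10), whose
value is 03, at offset 3 inside its 8-byte block. Only offsets 0 to 2 are
accessible there, so the one-byte write lands just past the accessible part.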
||||||
|
|
||||||
2. Implementation details
|
|
||||||
=========================
|
|
||||||
|
|
||||||
From a high level, our approach to memory error detection is similar to that
|
|
||||||
of kmemcheck: use shadow memory to record whether each byte of memory is safe
|
|
||||||
to access, and use compile-time instrumentation to check shadow memory on each
|
|
||||||
memory access.
|
|
||||||
|
|
||||||
AddressSanitizer dedicates 1/8 of kernel memory to its shadow memory
|
|
||||||
(e.g. 16TB to cover 128TB on x86_64) and uses direct mapping with a scale and
|
|
||||||
offset to translate a memory address to its corresponding shadow address.
|
|
||||||
|
|
||||||
Here is the function which translates an address to its corresponding shadow
|
|
||||||
address:
|
|
||||||
|
|
||||||
static inline void *kasan_mem_to_shadow(const void *addr)
|
|
||||||
{
|
|
||||||
return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
|
|
||||||
+ KASAN_SHADOW_OFFSET;
|
|
||||||
}
|
|
||||||
|
|
||||||
where KASAN_SHADOW_SCALE_SHIFT = 3.
|
|
||||||
|
|
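The translation can be modelled outside the kernel to check the numbers
given above. This is a sketch only: the value used for KASAN_SHADOW_OFFSET
is an assumption chosen for illustration (the real value depends on the
architecture and configuration), and the 64-bit mask stands in for the
wrap-around of unsigned long arithmetic.

```python
KASAN_SHADOW_SCALE_SHIFT = 3
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000  # assumed value, for illustration
MASK64 = (1 << 64) - 1                    # emulate 64-bit unsigned long


def kasan_mem_to_shadow(addr):
    """Model of the address translation shown above."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def shadow_size(bytes_covered):
    """Shadow memory needed to cover a region: one byte per 8 bytes."""
    return bytes_covered >> KASAN_SHADOW_SCALE_SHIFT


TIB = 1 << 40
print(shadow_size(128 * TIB) // TIB, "TB of shadow for 128 TB")

base = 0xFFFF8800693BC5D0  # 8-byte aligned address from the report above
same = {kasan_mem_to_shadow(base + i) for i in range(8)}
print(len(same), "shadow byte for 8 consecutive addresses")
```

Eight consecutive, aligned addresses share one shadow byte, and the next
aligned block maps to the following shadow byte, which is where the 1/8
ratio comes from.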
||||||
Compile-time instrumentation is used for checking memory accesses. The compiler inserts
|
|
||||||
function calls (__asan_load*(addr), __asan_store*(addr)) before each memory
|
|
||||||
access of size 1, 2, 4, 8 or 16. These functions check whether a memory access is
|
|
||||||
valid by checking the corresponding shadow memory.
|
|
||||||
|
|
||||||
GCC 5.0 can perform inline instrumentation. Instead of making
|
|
||||||
function calls GCC directly inserts the code to check the shadow memory.
|
|
||||||
This option significantly enlarges the kernel, but it gives a x1.1-x2 performance
|
|
||||||
boost over an outline-instrumented kernel.
|
|
@ -1,754 +0,0 @@
|
|||||||
GETTING STARTED WITH KMEMCHECK
|
|
||||||
==============================
|
|
||||||
|
|
||||||
Vegard Nossum <vegardno@ifi.uio.no>
|
|
||||||
|
|
||||||
|
|
||||||
Contents
|
|
||||||
========
|
|
||||||
0. Introduction
|
|
||||||
1. Downloading
|
|
||||||
2. Configuring and compiling
|
|
||||||
3. How to use
|
|
||||||
3.1. Booting
|
|
||||||
3.2. Run-time enable/disable
|
|
||||||
3.3. Debugging
|
|
||||||
3.4. Annotating false positives
|
|
||||||
4. Reporting errors
|
|
||||||
5. Technical description
|
|
||||||
|
|
||||||
|
|
||||||
0. Introduction
|
|
||||||
===============
|
|
||||||
|
|
||||||
kmemcheck is a debugging feature for the Linux Kernel. More specifically, it
|
|
||||||
is a dynamic checker that detects and warns about some uses of uninitialized
|
|
||||||
memory.
|
|
||||||
|
|
||||||
Userspace programmers might be familiar with Valgrind's memcheck. The main
|
|
||||||
difference between memcheck and kmemcheck is that memcheck works for userspace
|
|
||||||
programs only, and kmemcheck works for the kernel only. The implementations
|
|
||||||
are of course vastly different. Because of this, kmemcheck is not as accurate
|
|
||||||
as memcheck, but it turns out to be good enough in practice to discover real
|
|
||||||
programmer errors that the compiler is not able to find through static
|
|
||||||
analysis.
|
|
||||||
|
|
||||||
Enabling kmemcheck on a kernel will probably slow it down to the extent that
|
|
||||||
the machine will not be usable for normal workloads such as an
|
|
||||||
interactive desktop. kmemcheck will also cause the kernel to use about twice
|
|
||||||
as much memory as normal. For this reason, kmemcheck is strictly a debugging
|
|
||||||
feature.
|
|
||||||
|
|
||||||
|
|
||||||
1. Downloading
|
|
||||||
==============
|
|
||||||
|
|
||||||
As of version 2.6.31-rc1, kmemcheck is included in the mainline kernel.
|
|
||||||
|
|
||||||
|
|
||||||
2. Configuring and compiling
|
|
||||||
============================
|
|
||||||
|
|
||||||
kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of
|
|
||||||
configuration variables must have specific settings in order for the kmemcheck
|
|
||||||
menu to even appear in "menuconfig". These are:
|
|
||||||
|
|
||||||
o CONFIG_CC_OPTIMIZE_FOR_SIZE=n
|
|
||||||
|
|
||||||
This option is located under "General setup" / "Optimize for size".
|
|
||||||
|
|
||||||
Without this, gcc will use certain optimizations that usually lead to
|
|
||||||
false positive warnings from kmemcheck. An example of this is a 16-bit
|
|
||||||
field in a struct, where gcc may load 32 bits, then discard the upper
|
|
||||||
16 bits. kmemcheck sees only the 32-bit load, and may trigger a
|
|
||||||
warning for the upper 16 bits (if they're uninitialized).
|
|
||||||
|
|
||||||
o CONFIG_SLAB=y or CONFIG_SLUB=y
|
|
||||||
|
|
||||||
This option is located under "General setup" / "Choose SLAB
|
|
||||||
allocator".
|
|
||||||
|
|
||||||
o CONFIG_FUNCTION_TRACER=n
|
|
||||||
|
|
||||||
This option is located under "Kernel hacking" / "Tracers" / "Kernel
|
|
||||||
Function Tracer"
|
|
||||||
|
|
||||||
When function tracing is compiled in, gcc emits a call to another
|
|
||||||
function at the beginning of every function. This means that when the
|
|
||||||
page fault handler is called, the ftrace framework will be called
|
|
||||||
before kmemcheck has had a chance to handle the fault. If ftrace then
|
|
||||||
modifies memory that was tracked by kmemcheck, the result is an
|
|
||||||
endless recursive page fault.
|
|
||||||
|
|
||||||
o CONFIG_DEBUG_PAGEALLOC=n
|
|
||||||
|
|
||||||
This option is located under "Kernel hacking" / "Memory Debugging"
|
|
||||||
/ "Debug page memory allocations".
|
|
||||||
|
|
||||||
In addition, I highly recommend turning on CONFIG_DEBUG_INFO=y. This is also
|
|
||||||
located under "Kernel hacking". With this, you will be able to get line number
|
|
||||||
information from the kmemcheck warnings, which is extremely valuable in
|
|
||||||
debugging a problem. This option is not mandatory, however, because it slows
|
|
||||||
down the compilation process and produces a much bigger kernel image.
|
|
||||||
|
|
||||||
Now the kmemcheck menu should be visible (under "Kernel hacking" / "Memory
|
|
||||||
Debugging" / "kmemcheck: trap use of uninitialized memory"). Here follows
|
|
||||||
a description of the kmemcheck configuration variables:
|
|
||||||
|
|
||||||
o CONFIG_KMEMCHECK
|
|
||||||
|
|
||||||
This must be enabled in order to use kmemcheck at all...
|
|
||||||
|
|
||||||
o CONFIG_KMEMCHECK_[DISABLED | ENABLED | ONESHOT]_BY_DEFAULT
|
|
||||||
|
|
||||||
This option controls the status of kmemcheck at boot-time. "Enabled"
|
|
||||||
will enable kmemcheck right from the start, "disabled" will boot the
|
|
||||||
kernel as normal (but with the kmemcheck code compiled in, so it can
|
|
||||||
be enabled at run-time after the kernel has booted), and "one-shot" is
|
|
||||||
a special mode which will turn kmemcheck off automatically after
|
|
||||||
detecting the first use of uninitialized memory.
|
|
||||||
|
|
||||||
If you are using kmemcheck to actively debug a problem, then you
|
|
||||||
probably want to choose "enabled" here.
|
|
||||||
|
|
||||||
The one-shot mode is mostly useful in automated test setups because it
|
|
||||||
can prevent floods of warnings and increase the chances of the machine
|
|
||||||
surviving in case something is really wrong. In other cases, the one-
|
|
||||||
shot mode could actually be counter-productive because it would turn
|
|
||||||
itself off at the very first error -- in the case of a false positive
|
|
||||||
too -- and this would get in the way of debugging the specific
|
|
||||||
problem you were interested in.
|
|
||||||
|
|
||||||
If you would like to use your kernel as normal, but with a chance to
|
|
||||||
enable kmemcheck in case of some problem, it might be a good idea to
|
|
||||||
choose "disabled" here. When kmemcheck is disabled, most of the run-
|
|
||||||
time overhead is not incurred, and the kernel will be almost as fast
|
|
||||||
as normal.
|
|
||||||
|
|
||||||
o CONFIG_KMEMCHECK_QUEUE_SIZE
|
|
||||||
|
|
||||||
Select the maximum number of error reports to store in an internal
|
|
||||||
(fixed-size) buffer. Since errors can occur virtually anywhere and in
|
|
||||||
any context, we need a temporary storage area which is guaranteed not
|
|
||||||
to generate any other page faults when accessed. The queue will be
|
|
||||||
emptied as soon as a tasklet may be scheduled. If the queue is full,
|
|
||||||
new error reports will be lost.
|
|
||||||
|
|
||||||
The default value of 64 is probably fine. If some code produces more
|
|
||||||
than 64 errors within an irqs-off section, then the code is likely to
|
|
||||||
produce many, many more, too, and these additional reports seldom give
|
|
||||||
any more information (the first report is usually the most valuable
|
|
||||||
anyway).
|
|
||||||
|
|
||||||
This number might have to be adjusted if you are not using serial
|
|
||||||
console or similar to capture the kernel log. If you are using the
|
|
||||||
"dmesg" command to save the log, then getting a lot of kmemcheck
|
|
||||||
warnings might overflow the kernel log itself, and the earlier reports
|
|
||||||
will get lost in that way instead. Try setting this to 10 or so on
|
|
||||||
such a setup.
|
|
||||||
|
|
||||||
o CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT
|
|
||||||
|
|
||||||
Select the number of shadow bytes to save along with each entry of the
|
|
||||||
error-report queue. These bytes indicate what parts of an allocation
|
|
||||||
are initialized, uninitialized, etc. and will be displayed when an
|
|
||||||
error is detected to help the debugging of a particular problem.
|
|
||||||
|
|
||||||
The number entered here is actually the logarithm of the number of
|
|
||||||
bytes that will be saved. So if you pick for example 5 here, kmemcheck
|
|
||||||
will save 2^5 = 32 bytes.
|
|
||||||
|
|
||||||
The default value should be fine for debugging most problems. It also
|
|
||||||
fits nicely within 80 columns.
|
|
||||||
|
|
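The relation between the shift value and the width of the saved dump can be
checked with a few lines. This is a minimal sketch; the helper names are
hypothetical, and two hexadecimal characters per byte is an assumption
about how the bytes are printed.

```python
def shadow_copy_bytes(shift):
    """Number of bytes saved per queue entry: 2 to the power of shift."""
    return 1 << shift


def dump_columns(shift, chars_per_byte=2):
    """Width of one dump line, assuming two hex characters per byte."""
    return shadow_copy_bytes(shift) * chars_per_byte


for shift in (4, 5, 6):
    print(shift, shadow_copy_bytes(shift), dump_columns(shift))
```

With the example value 5 this gives 32 bytes and a 64-character line, which
stays below 80 columns, while 6 would already need 128 characters.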
||||||
o CONFIG_KMEMCHECK_PARTIAL_OK
|
|
||||||
|
|
||||||
This option (when enabled) works around certain GCC optimizations that
|
|
||||||
produce 32-bit reads from 16-bit variables where the upper 16 bits are
|
|
||||||
thrown away afterwards.
|
|
||||||
|
|
||||||
The default value (enabled) is recommended. This may of course hide
|
|
||||||
some real errors, but disabling it would probably produce a lot of
|
|
||||||
false positives.
|
|
||||||
|
|
||||||
o CONFIG_KMEMCHECK_BITOPS_OK
|
|
||||||
|
|
||||||
This option silences warnings that would be generated for bit-field
|
|
||||||
accesses where not all the bits are initialized at the same time. This
|
|
||||||
may also hide some real bugs.
|
|
||||||
|
|
||||||
This option is probably obsolete, or it should be replaced with
|
|
||||||
the kmemcheck-/bitfield-annotations for the code in question. The
|
|
||||||
default value is therefore fine.
|
|
||||||
|
|
||||||
Now compile the kernel as usual.
|
|
||||||
|
|
||||||
|
|
||||||
3. How to use
|
|
||||||
=============
|
|
||||||
|
|
||||||
3.1. Booting
|
|
||||||
============
|
|
||||||
|
|
||||||
First some information about the command-line options. There is only one
|
|
||||||
option specific to kmemcheck, and this is called "kmemcheck". It can be used
|
|
||||||
to override the default mode as chosen by the CONFIG_KMEMCHECK_*_BY_DEFAULT
|
|
||||||
option. Its possible settings are:
|
|
||||||
|
|
||||||
o kmemcheck=0 (disabled)
|
|
||||||
o kmemcheck=1 (enabled)
|
|
||||||
o kmemcheck=2 (one-shot mode)
|
|
||||||
|
|
||||||
If SLUB debugging has been enabled in the kernel, it may take precedence over
|
|
||||||
kmemcheck in such a way that the slab caches which are under SLUB debugging
|
|
||||||
will not be tracked by kmemcheck. In order to ensure that this doesn't happen
|
|
||||||
(even though it shouldn't by default), use SLUB's boot option "slub_debug",
|
|
||||||
like this: slub_debug=-
|
|
||||||
|
|
||||||
In fact, this option may also be used for fine-grained control over SLUB vs.
|
|
||||||
kmemcheck. For example, if the command line includes "kmemcheck=1
|
|
||||||
slub_debug=,dentry", then SLUB debugging will be used only for the "dentry"
|
|
||||||
slab cache, with kmemcheck tracking all the other caches. This is advanced
|
|
||||||
usage, however, and is not generally recommended.
|
|
||||||
|
|
||||||
|
|
||||||
3.2. Run-time enable/disable
|
|
||||||
============================
|
|
||||||
|
|
||||||
When the kernel has booted, it is possible to enable or disable kmemcheck at
|
|
||||||
run-time. WARNING: This feature is still experimental and may cause false
|
|
||||||
positive warnings to appear. Therefore, try not to use this. If you find that
|
|
||||||
it doesn't work properly (e.g. you see an unreasonable number of warnings), I
|
|
||||||
will be happy to take bug reports.
|
|
||||||
|
|
||||||
Use the file /proc/sys/kernel/kmemcheck for this purpose, e.g.:
|
|
||||||
|
|
||||||
$ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck
|
|
||||||
|
|
||||||
The numbers are the same as for the kmemcheck= command-line option.
|
|
||||||
|
|
||||||
|
|
||||||
3.3. Debugging
|
|
||||||
==============
|
|
||||||
|
|
||||||
A typical report will look something like this:
|
|
||||||
|
|
||||||
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
|
|
||||||
80000000000000000000000000000000000000000088ffff0000000000000000
|
|
||||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
|
||||||
^
|
|
||||||
|
|
||||||
Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A
|
|
||||||
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
|
|
||||||
RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002
|
|
||||||
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
|
|
||||||
RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84
|
|
||||||
RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000
|
|
||||||
R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e
|
|
||||||
R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8
|
|
||||||
FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000
|
|
||||||
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
|
|
||||||
CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0
|
|
||||||
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
|
|
||||||
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
|
|
||||||
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
|
|
||||||
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
|
|
||||||
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
|
|
||||||
[<ffffffff8100c7b5>] int_signal+0x12/0x17
|
|
||||||
[<ffffffffffffffff>] 0xffffffffffffffff
|
|
||||||
|
|
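The line of "i" and "u" characters in the report marks, byte by byte, which
of the dumped bytes were initialized and which were not. A small sketch
for reading that line follows. The function names are hypothetical, and
the offset used in the last call is an assumption for illustration, since
the position of the "^" marker is not preserved in the report as shown
here.

```python
# Status line copied from the report above: i = initialized, u = uninitialized.
FLAGS = "i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u"


def uninitialized_runs(flag_line):
    """Return (start, end) byte ranges, end exclusive, that are marked 'u'."""
    flags = flag_line.split()
    runs, start = [], None
    for pos, flag in enumerate(flags + ["i"]):  # sentinel closes a final run
        if flag == "u" and start is None:
            start = pos
        elif flag != "u" and start is not None:
            runs.append((start, pos))
            start = None
    return runs


def read_is_suspect(flag_line, offset, size):
    """True if a read of 'size' bytes at 'offset' touches a 'u' byte."""
    flags = flag_line.split()
    return any(flag == "u" for flag in flags[offset:offset + size])


print(uninitialized_runs(FLAGS))
print(read_is_suspect(FLAGS, 4, 4))  # offset 4 is an assumed position
```

A 32-bit read that overlaps any of the listed ranges is the kind of access
that the warning at the top of the report describes.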
||||||
The single most valuable piece of information in this report is the RIP (or EIP on
|
|
||||||
32-bit) value. This will help us pinpoint exactly which instruction caused
|
|
||||||
the warning.
|
|
||||||
|
|
||||||
If your kernel was compiled with CONFIG_DEBUG_INFO=y, then all we have to do
|
|
||||||
is give this address to the addr2line program, like this:
|
|
||||||
|
|
||||||
$ addr2line -e vmlinux -i ffffffff8104ede8
|
|
||||||
arch/x86/include/asm/string_64.h:12
|
|
||||||
include/asm-generic/siginfo.h:287
|
|
||||||
kernel/signal.c:380
|
|
||||||
kernel/signal.c:410
|
|
||||||
|
|
||||||
The "-e vmlinux" tells addr2line which file to look in. IMPORTANT: This must
|
|
||||||
be the vmlinux of the kernel that produced the warning in the first place! If
|
|
||||||
not, the line number information will almost certainly be wrong.
|
|
||||||
|
|
||||||
The "-i" tells addr2line to also print the line numbers of inlined functions.
|
|
||||||
In this case, the flag was very important, because otherwise, it would only
|
|
||||||
have printed the first line, which is just a call to memcpy(), which could be
|
|
||||||
called from a thousand places in the kernel, and is therefore not very useful.
|
|
||||||
These inlined functions would not show up in the stack trace above, simply
|
|
||||||
because the kernel doesn't load the extra debugging information. This
|
|
||||||
technique can of course be used with ordinary kernel oopses as well.
|
|
||||||
|
|
||||||
In this case, it's the caller of memcpy() that is interesting, and it can be
|
|
||||||
found in include/asm-generic/siginfo.h, line 287:
|
|
||||||
|
|
||||||
281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from)
|
|
||||||
282 {
|
|
||||||
283 if (from->si_code < 0)
|
|
||||||
284 memcpy(to, from, sizeof(*to));
|
|
||||||
285 else
|
|
||||||
286 /* _sigchld is currently the largest know union member */
|
|
||||||
287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld));
|
|
||||||
288 }
|
|
||||||
|
|
||||||
Since this was a read (kmemcheck usually warns about reads only, though it can
|
|
||||||
warn about writes to unallocated or freed memory as well), it was probably the
|
|
||||||
"from" argument which contained some uninitialized bytes. Following the chain
|
|
||||||
of calls, we move upwards to see where "from" was allocated or initialized,
|
|
||||||
kernel/signal.c, line 380:
|
|
||||||
|
|
||||||
359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info)
|
|
||||||
360 {
|
|
||||||
...
|
|
||||||
367 list_for_each_entry(q, &list->list, list) {
|
|
||||||
368 if (q->info.si_signo == sig) {
|
|
||||||
369 if (first)
|
|
||||||
370 goto still_pending;
|
|
||||||
371 first = q;
|
|
||||||
...
|
|
||||||
377 if (first) {
|
|
||||||
378 still_pending:
|
|
||||||
379 list_del_init(&first->list);
|
|
||||||
380 copy_siginfo(info, &first->info);
|
|
||||||
381 __sigqueue_free(first);
|
|
||||||
...
|
|
||||||
392 }
|
|
||||||
393 }
|
|
||||||
|
|
||||||
Here, it is &first->info that is being passed on to copy_siginfo(). The
|
|
||||||
variable "first" was found on a list -- passed in as the second argument to
|
|
||||||
collect_signal(). We continue our journey through the stack, to figure out
|
|
||||||
where the item on "list" was allocated or initialized. We move to line 410:
|
|
||||||
|
|
||||||
395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
|
|
||||||
396 siginfo_t *info)
|
|
||||||
397 {
|
|
||||||
...
|
|
||||||
410 collect_signal(sig, pending, info);
|
|
||||||
...
|
|
||||||
414 }
|
|
||||||
|
|
||||||
Now we need to follow the "pending" pointer, since that is being passed on to
|
|
||||||
collect_signal() as "list". At this point, we've run out of lines from the
|
|
||||||
"addr2line" output. Not to worry, we just paste the next addresses from the
|
|
||||||
kmemcheck stack dump, i.e.:
|
|
||||||
|
|
||||||
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
|
|
||||||
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
|
|
||||||
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
|
|
||||||
[<ffffffff8100c7b5>] int_signal+0x12/0x17
|
|
||||||
|
|
||||||
$ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \
|
|
||||||
ffffffff8100b87d ffffffff8100c7b5
|
|
||||||
kernel/signal.c:446
|
|
||||||
kernel/signal.c:1806
|
|
||||||
arch/x86/kernel/signal.c:805
|
|
||||||
arch/x86/kernel/signal.c:871
|
|
||||||
arch/x86/kernel/entry_64.S:694
|
|
||||||
|
|
||||||
Remember that since these addresses were found on the stack and not as the
|
|
||||||
RIP value, they actually point to the _next_ instruction (they are return
|
|
||||||
addresses). This becomes obvious when we look at the code for line 446:
|
|
||||||
|
|
||||||
422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
|
|
||||||
423 {
|
|
||||||
...
|
|
||||||
431 signr = __dequeue_signal(&tsk->signal->shared_pending,
|
|
||||||
432 mask, info);
|
|
||||||
433 /*
|
|
||||||
434 * itimer signal ?
|
|
||||||
435 *
|
|
||||||
436 * itimers are process shared and we restart periodic
|
|
||||||
437 * itimers in the signal delivery path to prevent DoS
|
|
||||||
438 * attacks in the high resolution timer case. This is
|
|
||||||
439 * compliant with the old way of self restarting
|
|
||||||
440 * itimers, as the SIGALRM is a legacy signal and only
|
|
||||||
441 * queued once. Changing the restart behaviour to
|
|
||||||
442 * restart the timer in the signal dequeue path is
|
|
||||||
443 * reducing the timer noise on heavy loaded !highres
|
|
||||||
444 * systems too.
|
|
||||||
445 */
|
|
||||||
446 if (unlikely(signr == SIGALRM)) {
|
|
||||||
...
|
|
||||||
489 }
|
|
||||||
|
|
||||||
So instead of looking at 446, we should be looking at 431, which is the line
|
|
||||||
that executes just before 446. Here we see that what we are looking for is
|
|
||||||
&tsk->signal->shared_pending.
|
|
||||||
|
|
||||||
Our next task is now to figure out which function puts items on this
|
|
||||||
"shared_pending" list. A crude, but efficient tool, is git grep:
|
|
||||||
|
|
||||||
$ git grep -n 'shared_pending' kernel/
|
|
||||||
...
|
|
||||||
kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending;
|
|
||||||
kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending;
|
|
||||||
...
|
|
||||||
|
|
||||||
There were more results, but none of them were related to list operations,
|
|
||||||
and these were the only assignments. We inspect the line numbers more closely
|
|
||||||
and find that this is indeed where items are being added to the list:
|
|
||||||
|
|
||||||
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
|
|
||||||
817 int group)
|
|
||||||
818 {
|
|
||||||
...
|
|
||||||
828 pending = group ? &t->signal->shared_pending : &t->pending;
|
|
||||||
...
|
|
||||||
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
|
|
||||||
852 (is_si_special(info) ||
|
|
||||||
853 info->si_code >= 0)));
|
|
||||||
854 if (q) {
|
|
||||||
855 list_add_tail(&q->list, &pending->list);
|
|
||||||
...
|
|
||||||
890 }
|
|
||||||
|
|
||||||
and:
|
|
||||||
|
|
||||||
1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
|
|
||||||
1310 {
|
|
||||||
....
|
|
||||||
1339 pending = group ? &t->signal->shared_pending : &t->pending;
|
|
||||||
1340 list_add_tail(&q->list, &pending->list);
|
|
||||||
....
|
|
||||||
1347 }
|
|
||||||
|
|
||||||
In the first case, the list element we are looking for, "q", is being returned
|
|
||||||
from the function __sigqueue_alloc(), which looks like an allocation function.
|
|
||||||
Let's take a look at it:
|
|
||||||
|
|
||||||
187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
|
|
||||||
188 int override_rlimit)
|
|
||||||
189 {
|
|
||||||
190 struct sigqueue *q = NULL;
|
|
||||||
191 struct user_struct *user;
|
|
||||||
192
|
|
||||||
193 /*
|
|
||||||
194 * We won't get problems with the target's UID changing under us
|
|
||||||
195 * because changing it requires RCU be used, and if t != current, the
|
|
||||||
196 * caller must be holding the RCU readlock (by way of a spinlock) and
|
|
||||||
197 * we use RCU protection here
|
|
||||||
198 */
|
|
||||||
199 user = get_uid(__task_cred(t)->user);
|
|
||||||
200 atomic_inc(&user->sigpending);
|
|
||||||
201 if (override_rlimit ||
|
|
||||||
202 atomic_read(&user->sigpending) <=
|
|
||||||
203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
|
|
||||||
204 q = kmem_cache_alloc(sigqueue_cachep, flags);
|
|
||||||
205 if (unlikely(q == NULL)) {
|
|
||||||
206 atomic_dec(&user->sigpending);
|
|
||||||
207 free_uid(user);
|
|
||||||
208 } else {
|
|
||||||
209 INIT_LIST_HEAD(&q->list);
|
|
||||||
210 q->flags = 0;
|
|
||||||
211 q->user = user;
|
|
||||||
212 }
|
|
||||||
213
|
|
||||||
214 return q;
|
|
||||||
215 }
|
|
||||||
|
|
||||||
We see that this function initializes q->list, q->flags, and q->user. It seems
|
|
||||||
that now is the time to look at the definition of "struct sigqueue", e.g.:
|
|
||||||
|
|
||||||
14 struct sigqueue {
|
|
||||||
15 struct list_head list;
|
|
||||||
16 int flags;
|
|
||||||
17 siginfo_t info;
|
|
||||||
18 struct user_struct *user;
|
|
||||||
19 };
|
|
||||||
|
|
||||||
And, you might remember, it was a memcpy() on &first->info that caused the
|
|
||||||
warning, so this makes perfect sense. It also seems reasonable to assume that
|
|
||||||
it is the caller of __sigqueue_alloc() that has the responsibility of filling
|
|
||||||
out (initializing) this member.
|
|
||||||
|
|
||||||
But just which fields of the struct were uninitialized? Let's look at
|
|
||||||
kmemcheck's report again:
|
|
||||||
|
|
||||||
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
|
|
||||||
80000000000000000000000000000000000000000088ffff0000000000000000
|
|
||||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
|
||||||
^
|
|
||||||
|
|
||||||
These first two lines are the memory dump of the memory object itself, and the
|
|
||||||
shadow bytemap, respectively. The memory object itself is in this case
|
|
||||||
&first->info. Just beware that the start of this dump is NOT the start of the
|
|
||||||
object itself! The position of the caret (^) corresponds with the address of
|
|
||||||
the read (ffff88003e4a2024).
|
|
||||||
|
|
||||||
The shadow bytemap dump legend is as follows:
|
|
||||||
|
|
||||||
i - initialized
|
|
||||||
u - uninitialized
|
|
||||||
a - unallocated (memory has been allocated by the slab layer, but has not
|
|
||||||
yet been handed off to anybody)
|
|
||||||
f - freed (memory has been allocated by the slab layer, but has been freed
|
|
||||||
by the previous owner)
|
|
||||||
|
|
||||||
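The legend above can be applied mechanically to a shadow line copied from a report. The following is a minimal userspace sketch, not kmemcheck code; the helper names first_shadow_state() and count_shadow_state() are made up for illustration, and the input is assumed to be in the report's format (one letter per byte, separated by single spaces).

```c
#include <stddef.h>

/*
 * Hypothetical helpers, not part of kmemcheck. Both take a shadow bytemap
 * line as printed in a report, e.g. "i i i i u u u u ...".
 */

/* Return the byte index of the first entry equal to state, or -1 if none. */
static int first_shadow_state(const char *shadow, char state)
{
	int index = 0;

	for (const char *p = shadow; *p != '\0'; p++) {
		if (*p == ' ')
			continue;	/* separators do not count as bytes */
		if (*p == state)
			return index;
		index++;
	}
	return -1;
}

/* Return how many bytes in the line are in the given state. */
static int count_shadow_state(const char *shadow, char state)
{
	int count = 0;

	for (const char *p = shadow; *p != '\0'; p++) {
		if (*p == state)
			count++;
	}
	return count;
}
```

For the shadow line in the report above, first_shadow_state(line, 'u') gives 4: the first uninitialized byte sits four bytes into the dump. Keep in mind that this index is relative to the start of the dump, not to the start of the object.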
In order to figure out where (relative to the start of the object) the
|
|
||||||
uninitialized memory was located, we have to look at the disassembly. For
|
|
||||||
that, we'll need the RIP address again:
|
|
||||||
|
|
||||||
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
|
|
||||||
|
|
||||||
$ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8:
|
|
||||||
ffffffff8104edc8: mov %r8,0x8(%r8)
|
|
||||||
ffffffff8104edcc: test %r10d,%r10d
|
|
||||||
ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168>
|
|
||||||
ffffffff8104edd5: mov %rax,%rdx
|
|
||||||
ffffffff8104edd8: mov $0xc,%ecx
|
|
||||||
ffffffff8104eddd: mov %r13,%rdi
|
|
||||||
ffffffff8104ede0: mov $0x30,%eax
|
|
||||||
ffffffff8104ede5: mov %rdx,%rsi
|
|
||||||
ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi)
|
|
||||||
ffffffff8104edea: test $0x2,%al
|
|
||||||
ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0>
|
|
||||||
ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi)
|
|
||||||
ffffffff8104edf0: test $0x1,%al
|
|
||||||
ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5>
|
|
||||||
ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi)
|
|
||||||
ffffffff8104edf5: mov %r8,%rdi
|
|
||||||
ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free>
|
|
||||||
|
|
||||||
As expected, it's the "rep movsl" instruction from the memcpy() that causes
|
|
||||||
the warning. We know that REP MOVSL uses the register RCX to count
|
|
||||||
the number of remaining iterations. By taking a look at the register dump
|
|
||||||
again (from the kmemcheck report), we can figure out how many bytes were left
|
|
||||||
to copy:
|
|
||||||
|
|
||||||
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
|
|
||||||
|
|
||||||
By looking at the disassembly, we also see that %ecx is being loaded with the
|
|
||||||
value $0xc just before (ffffffff8104edd8), so we are very lucky. Keep in mind
|
|
||||||
that this is the number of iterations, not bytes. And since this is a "long"
|
|
||||||
operation, we need to multiply by 4 to get the number of bytes. So this means
|
|
||||||
that the uninitialized value was encountered at 4 * (0xc - 0x9) = 12 bytes
|
|
||||||
from the start of the object.
|
|
||||||
|
|
||||||
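The arithmetic above can be written down as a small helper. This is only a sketch for checking the calculation by hand; rep_movs_offset() is a made-up name, and it assumes the initial count is known from the disassembly and the remaining count from the register dump.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte offset, relative to the start of the copy, at
 * which a REP MOVS instruction was interrupted.
 *
 *   initial_count   - value loaded into the count register before the copy
 *   remaining_count - value of the count register in the report
 *   width           - bytes moved per iteration (4 for MOVSL)
 */
static size_t rep_movs_offset(size_t initial_count, size_t remaining_count,
			      size_t width)
{
	return (initial_count - remaining_count) * width;
}
```

With the values from this report, rep_movs_offset(0xc, 0x9, 4) evaluates to 12, matching the calculation above.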
We can now try to figure out which field of "struct siginfo" was not
|
|
||||||
initialized. This is the beginning of the struct:
|
|
||||||
|
|
||||||
40 typedef struct siginfo {
|
|
||||||
41 int si_signo;
|
|
||||||
42 int si_errno;
|
|
||||||
43 int si_code;
|
|
||||||
44
|
|
||||||
45 union {
|
|
||||||
..
|
|
||||||
92 } _sifields;
|
|
||||||
93 } siginfo_t;
|
|
||||||
|
|
||||||
On 64-bit, the int is 4 bytes long, so it must be the union member that has
|
|
||||||
not been initialized. We can verify this using gdb:
|
|
||||||
|
|
||||||
$ gdb vmlinux
|
|
||||||
...
|
|
||||||
(gdb) p &((struct siginfo *) 0)->_sifields
|
|
||||||
$1 = (union {...} *) 0x10
|
|
||||||
|
|
||||||
Actually, it seems that the union member is located at offset 0x10 -- which
|
|
||||||
means that gcc has inserted 4 bytes of padding between the members si_code
|
|
||||||
and _sifields. We can now get a fuller picture of the memory dump:
|
|
||||||
|
|
||||||
_----------------------------=> si_code
|
|
||||||
/ _--------------------=> (padding)
|
|
||||||
| / _------------=> _sifields(._kill._pid)
|
|
||||||
| | / _----=> _sifields(._kill._uid)
|
|
||||||
| | | /
|
|
||||||
-------|-------|-------|-------|
|
|
||||||
80000000000000000000000000000000000000000088ffff0000000000000000
|
|
||||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
|
||||||
|
|
||||||
This allows us to realize another important fact: si_code contains the value
|
|
||||||
0x80. Remember that x86 is little endian, so the first 4 bytes "80000000" are
|
|
||||||
really the number 0x00000080. With a bit of research, we find that this is
|
|
||||||
actually the constant SI_KERNEL defined in include/asm-generic/siginfo.h:
|
|
||||||
|
|
||||||
144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */
|
|
||||||
|
|
||||||
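The little-endian reading of the dump that leads to SI_KERNEL can be double-checked with a few lines of C. This is a userspace sketch; le32_from_bytes() is a made-up helper name, and the bytes are assumed to be given in the order in which they appear in the memory dump.

```c
#include <stdint.h>

/*
 * Hypothetical helper: assemble a 32-bit value from four bytes stored in
 * little-endian order, i.e. least significant byte first, as on x86.
 */
static uint32_t le32_from_bytes(const unsigned char bytes[4])
{
	return (uint32_t)bytes[0] |
	       ((uint32_t)bytes[1] << 8) |
	       ((uint32_t)bytes[2] << 16) |
	       ((uint32_t)bytes[3] << 24);
}
```

Feeding it the first four bytes of the dump (0x80, 0x00, 0x00, 0x00) yields 0x00000080, the value of SI_KERNEL.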
This macro is used in exactly one place in the x86 kernel: In send_signal()
|
|
||||||
in kernel/signal.c:
|
|
||||||
|
|
||||||
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
|
|
||||||
817 int group)
|
|
||||||
818 {
|
|
||||||
...
|
|
||||||
828 pending = group ? &t->signal->shared_pending : &t->pending;
|
|
||||||
...
|
|
||||||
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
|
|
||||||
852 (is_si_special(info) ||
|
|
||||||
853 info->si_code >= 0)));
|
|
||||||
854 if (q) {
|
|
||||||
855 list_add_tail(&q->list, &pending->list);
|
|
||||||
856 switch ((unsigned long) info) {
|
|
||||||
...
|
|
||||||
865 case (unsigned long) SEND_SIG_PRIV:
|
|
||||||
866 q->info.si_signo = sig;
|
|
||||||
867 q->info.si_errno = 0;
|
|
||||||
868 q->info.si_code = SI_KERNEL;
|
|
||||||
869 q->info.si_pid = 0;
|
|
||||||
870 q->info.si_uid = 0;
|
|
||||||
871 break;
|
|
||||||
...
|
|
||||||
890 }
|
|
||||||
|
|
||||||
Not only does this match with the .si_code member, it also matches the place
|
|
||||||
we found earlier when looking for where siginfo_t objects are enqueued on the
|
|
||||||
"shared_pending" list.
|
|
||||||
|
|
||||||
So to sum up: It seems that it is the padding introduced by the compiler
|
|
||||||
between two struct fields that is uninitialized, and this gets reported when
|
|
||||||
we do a memcpy() on the struct. This means that we have identified a false
|
|
||||||
positive warning.
|
|
||||||
|
|
||||||
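The padding itself is easy to reproduce outside the kernel. The sketch below uses a simplified stand-in for the beginning of struct siginfo; the struct name and the union members are illustrative assumptions, not the kernel's definition. The only property that matters is that the union contains a pointer, which forces pointer alignment.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for the start of struct siginfo. */
struct siginfo_like {
	int si_signo;
	int si_errno;
	int si_code;
	union {
		struct {
			int pid;
			unsigned int uid;
		} kill;
		void *ptr;	/* pointer member forces pointer alignment */
	} sifields;
};

/* Number of padding bytes the compiler inserted after si_code. */
static size_t padding_before_union(void)
{
	size_t end_of_si_code =
		offsetof(struct siginfo_like, si_code) + sizeof(int);

	return offsetof(struct siginfo_like, sifields) - end_of_si_code;
}
```

On a typical x86-64 build, where int is 4 bytes and pointers are 8-byte aligned, this returns 4. Nothing ever writes to those bytes, which is why a memcpy() of the whole struct reads uninitialized memory.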
Normally, kmemcheck will not report uninitialized accesses in memcpy() calls
|
|
||||||
when both the source and destination addresses are tracked. (Instead, we copy
|
|
||||||
the shadow bytemap as well). In this case, the destination address clearly
|
|
||||||
was not tracked. We can dig a little deeper into the stack trace from above:
|
|
||||||
|
|
||||||
arch/x86/kernel/signal.c:805
|
|
||||||
arch/x86/kernel/signal.c:871
|
|
||||||
arch/x86/kernel/entry_64.S:694
|
|
||||||
|
|
||||||
And we clearly see that the destination siginfo object is located on the
|
|
||||||
stack:
|
|
||||||
|
|
||||||
782 static void do_signal(struct pt_regs *regs)
|
|
||||||
783 {
|
|
||||||
784 struct k_sigaction ka;
|
|
||||||
785 siginfo_t info;
|
|
||||||
...
|
|
||||||
804 signr = get_signal_to_deliver(&info, &ka, regs, NULL);
|
|
||||||
...
|
|
||||||
854 }
|
|
||||||
|
|
||||||
And this &info is what eventually gets passed to copy_siginfo() as the
|
|
||||||
destination argument.
|
|
||||||
|
|
||||||
Now, even though we didn't find an actual error here, the example is still a
|
|
||||||
good one, because it shows how one would go about finding out what the report
|
|
||||||
was all about.
|
|
||||||
|
|
||||||
|
|
||||||
3.4. Annotating false positives
|
|
||||||
===============================
|
|
||||||
|
|
||||||
There are a few different ways to make annotations in the source code that
|
|
||||||
will keep kmemcheck from checking and reporting certain allocations. Here
|
|
||||||
they are:
|
|
||||||
|
|
||||||
o __GFP_NOTRACK_FALSE_POSITIVE
|
|
||||||
|
|
||||||
This flag can be passed to kmalloc() or kmem_cache_alloc() (therefore
|
|
||||||
also to other functions that end up calling one of these) to indicate
|
|
||||||
that the allocation should not be tracked because it would lead to
|
|
||||||
a false positive report. This is a "big hammer" way of silencing
|
|
||||||
kmemcheck; after all, even if the false positive pertains to
|
|
||||||
a particular field in a struct, for example, we will now lose the
|
|
||||||
ability to find (real) errors in other parts of the same struct.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
|
|
||||||
/* No warnings will ever trigger on accessing any part of x */
|
|
||||||
x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE);
|
|
||||||
|
|
||||||
o kmemcheck_bitfield_begin(name)/kmemcheck_bitfield_end(name) and
|
|
||||||
kmemcheck_annotate_bitfield(ptr, name)
|
|
||||||
|
|
||||||
The first two of these three macros can be used inside struct
|
|
||||||
definitions to signal, respectively, the beginning and end of a
|
|
||||||
bitfield. Additionally, this will assign the bitfield a name, which
|
|
||||||
is given as an argument to the macros.
|
|
||||||
|
|
||||||
Having used these markers, one can later use
|
|
||||||
kmemcheck_annotate_bitfield() at the point of allocation, to indicate
|
|
||||||
which parts of the allocation are part of a bitfield.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
|
|
||||||
struct foo {
|
|
||||||
int x;
|
|
||||||
|
|
||||||
kmemcheck_bitfield_begin(flags);
|
|
||||||
int flag_a:1;
|
|
||||||
int flag_b:1;
|
|
||||||
kmemcheck_bitfield_end(flags);
|
|
||||||
|
|
||||||
int y;
|
|
||||||
};
|
|
||||||
|
|
||||||
struct foo *x = kmalloc(sizeof *x, GFP_KERNEL);
|
|
||||||
|
|
||||||
/* No warnings will trigger on accessing the bitfield of x */
|
|
||||||
kmemcheck_annotate_bitfield(x, flags);
|
|
||||||
|
|
||||||
Note that kmemcheck_annotate_bitfield() can be used even before the
|
|
||||||
return value of kmalloc() is checked -- in other words, passing NULL
|
|
||||||
as the first argument is legal (and will do nothing).
|
|
||||||
|
|
||||||
|
|
||||||
4. Reporting errors
|
|
||||||
===================
|
|
||||||
|
|
||||||
As we have seen, kmemcheck will produce false positive reports. Therefore, it
|
|
||||||
is not very wise to blindly post kmemcheck warnings to mailing lists and
|
|
||||||
maintainers. Instead, I encourage maintainers and developers to find errors
|
|
||||||
in their own code. If you get a warning, you can try to work around it, try
|
|
||||||
to figure out if it's a real error or not, or simply ignore it. Most
|
|
||||||
developers know their own code and will quickly and efficiently determine the
|
|
||||||
root cause of a kmemcheck report. This is therefore also the most efficient
|
|
||||||
way to work with kmemcheck.
|
|
||||||
|
|
||||||
That said, we (the kmemcheck maintainers) will always be on the lookout for
|
|
||||||
false positives that we can annotate and silence. So whatever you find,
|
|
||||||
please drop us a note privately! Kernel configs and steps to reproduce (if
|
|
||||||
available) are of course a great help too.
|
|
||||||
|
|
||||||
Happy hacking!
|
|
||||||
|
|
||||||
|
|
||||||
5. Technical description
|
|
||||||
========================
|
|
||||||
|
|
||||||
kmemcheck works by marking memory pages non-present. This means that whenever
|
|
||||||
somebody attempts to access the page, a page fault is generated. The page
|
|
||||||
fault handler notices that the page was in fact only hidden, and so it calls
|
|
||||||
on the kmemcheck code to make further investigations.
|
|
||||||
|
|
||||||
When the investigations are completed, kmemcheck "shows" the page by marking
|
|
||||||
it present (as it would be under normal circumstances). This way, the
|
|
||||||
interrupted code can continue as usual.
|
|
||||||
|
|
||||||
But after the instruction has been executed, we should hide the page again, so
|
|
||||||
that we can catch the next access too! Now kmemcheck makes use of a debugging
|
|
||||||
feature of the processor, namely single-stepping. When the processor has
|
|
||||||
finished the one instruction that generated the memory access, a debug
|
|
||||||
exception is raised. From here, we simply hide the page again and continue
|
|
||||||
execution, this time with the single-stepping feature turned off.
|
|
||||||
|
|
||||||
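The hide/show cycle described above can be modelled as a tiny state machine. This is a userspace toy, not the kernel implementation: none of the names below exist in the kernel, and the page table bit and the CPU single-step flag are reduced to plain booleans.

```c
#include <stdbool.h>

/* Toy model of one tracked page; all names are hypothetical. */
struct toy_page {
	bool present;		/* stands in for the page table "present" bit */
	bool hidden;		/* the page was hidden on purpose */
	bool single_step;	/* stands in for the CPU single-step flag */
	unsigned int checked;	/* number of accesses that were inspected */
};

/* Page fault handler: returns true if the fault came from a hidden page. */
static bool toy_page_fault(struct toy_page *page)
{
	if (page->present || !page->hidden)
		return false;		/* a genuine fault, not ours */

	page->checked++;		/* the shadow memory would be consulted here */
	page->present = true;		/* "show" the page */
	page->single_step = true;	/* trap again after one instruction */
	return true;
}

/* Debug exception handler: hide the page again and stop single-stepping. */
static void toy_debug_trap(struct toy_page *page)
{
	if (!page->single_step)
		return;
	page->present = false;
	page->single_step = false;
}

/* One memory access by the interrupted code. */
static void toy_access(struct toy_page *page)
{
	if (!page->present)
		toy_page_fault(page);
	/* the faulting instruction is re-executed at this point */
	if (page->single_step)
		toy_debug_trap(page);
}
```

After any number of toy_access() calls on a hidden page, the page is non-present again and the counter equals the number of accesses, which is the property the real mechanism relies on to catch every access.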
kmemcheck requires some assistance from the memory allocator in order to work.
|
|
||||||
The memory allocator needs to
|
|
||||||
|
|
||||||
1. Tell kmemcheck about newly allocated pages and pages that are about to
|
|
||||||
be freed. This allows kmemcheck to set up and tear down the shadow memory
|
|
||||||
for the pages in question. The shadow memory stores the status of each
|
|
||||||
byte in the allocation proper, e.g. whether it is initialized or
|
|
||||||
uninitialized.
|
|
||||||
|
|
||||||
2. Tell kmemcheck which parts of memory should be marked uninitialized.
|
|
||||||
There are actually a few more states, such as "not yet allocated" and
|
|
||||||
"recently freed".
|
|
||||||
|
|
||||||
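The role of the shadow memory in the two points above can be illustrated with a toy model that keeps one state byte per data byte. It is a sketch under simplifying assumptions (a single fixed-size object, userspace, hypothetical names) and reuses the state letters from the report legend.

```c
#include <stddef.h>
#include <string.h>

#define TOY_SIZE 32

/* Per-byte states, using the letters from the report legend. */
enum toy_state {
	TOY_UNALLOCATED = 'a',
	TOY_UNINITIALIZED = 'u',
	TOY_INITIALIZED = 'i',
	TOY_FREED = 'f',
};

/* One tracked object with its shadow; all names are hypothetical. */
struct toy_object {
	unsigned char data[TOY_SIZE];
	unsigned char shadow[TOY_SIZE];	/* one state per data byte */
};

/* What the allocator would call on allocation and free.
 * The caller must keep off + len within TOY_SIZE. */
static void toy_mark(struct toy_object *obj, size_t off, size_t len,
		     enum toy_state state)
{
	memset(obj->shadow + off, state, len);
}

/* A write stores the data and marks the written bytes initialized. */
static void toy_write(struct toy_object *obj, size_t off,
		      const void *src, size_t len)
{
	memcpy(obj->data + off, src, len);
	toy_mark(obj, off, len, TOY_INITIALIZED);
}

/* A read is clean only if every byte it touches is initialized. */
static int toy_read_ok(const struct toy_object *obj, size_t off, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (obj->shadow[off + i] != TOY_INITIALIZED)
			return 0;
	}
	return 1;
}
```

A freshly allocated object is marked TOY_UNINITIALIZED in full; after a 4-byte write at offset 0, a 4-byte read at offset 0 is clean, while an 8-byte read still touches uninitialized bytes and would be reported.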
If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
|
|
||||||
memory that can take page faults because of kmemcheck.
|
|
||||||
|
|
||||||
If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
|
|
||||||
request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags.
|
|
||||||
This does not prevent the page faults from occurring, but it marks the
|
|
||||||
object in question as being initialized so that no warnings will ever be
|
|
||||||
produced for this object.
|
|
||||||
|
|
||||||
Currently, the SLAB and SLUB allocators are supported by kmemcheck.
|
|
10
MAINTAINERS
@ -3124,7 +3124,7 @@ L: cocci@systeme.lip6.fr (moderated for non-subscribers)
|
|||||||
T: git git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild.git misc
|
T: git git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild.git misc
|
||||||
W: http://coccinelle.lip6.fr/
|
W: http://coccinelle.lip6.fr/
|
||||||
S: Supported
|
S: Supported
|
||||||
F: Documentation/coccinelle.txt
|
F: Documentation/dev-tools/coccinelle.rst
|
||||||
F: scripts/coccinelle/
|
F: scripts/coccinelle/
|
||||||
F: scripts/coccicheck
|
F: scripts/coccicheck
|
||||||
|
|
||||||
@ -5118,7 +5118,7 @@ GCOV BASED KERNEL PROFILING
|
|||||||
M: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
|
M: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
|
||||||
S: Maintained
|
S: Maintained
|
||||||
F: kernel/gcov/
|
F: kernel/gcov/
|
||||||
F: Documentation/gcov.txt
|
F: Documentation/dev-tools/gcov.rst
|
||||||
|
|
||||||
GDT SCSI DISK ARRAY CONTROLLER DRIVER
|
GDT SCSI DISK ARRAY CONTROLLER DRIVER
|
||||||
M: Achim Leubner <achim_leubner@adaptec.com>
|
M: Achim Leubner <achim_leubner@adaptec.com>
|
||||||
@ -6587,7 +6587,7 @@ L: kasan-dev@googlegroups.com
|
|||||||
S: Maintained
|
S: Maintained
|
||||||
F: arch/*/include/asm/kasan.h
|
F: arch/*/include/asm/kasan.h
|
||||||
F: arch/*/mm/kasan_init*
|
F: arch/*/mm/kasan_init*
|
||||||
F: Documentation/kasan.txt
|
F: Documentation/dev-tools/kasan.rst
|
||||||
F: include/linux/kasan*.h
|
F: include/linux/kasan*.h
|
||||||
F: lib/test_kasan.c
|
F: lib/test_kasan.c
|
||||||
F: mm/kasan/
|
F: mm/kasan/
|
||||||
@ -6803,7 +6803,7 @@ KMEMCHECK
|
|||||||
M: Vegard Nossum <vegardno@ifi.uio.no>
|
M: Vegard Nossum <vegardno@ifi.uio.no>
|
||||||
M: Pekka Enberg <penberg@kernel.org>
|
M: Pekka Enberg <penberg@kernel.org>
|
||||||
S: Maintained
|
S: Maintained
|
||||||
F: Documentation/kmemcheck.txt
|
F: Documentation/dev-tools/kmemcheck.rst
|
||||||
F: arch/x86/include/asm/kmemcheck.h
|
F: arch/x86/include/asm/kmemcheck.h
|
||||||
F: arch/x86/mm/kmemcheck/
|
F: arch/x86/mm/kmemcheck/
|
||||||
F: include/linux/kmemcheck.h
|
F: include/linux/kmemcheck.h
|
||||||
@ -6812,7 +6812,7 @@ F: mm/kmemcheck.c
|
|||||||
KMEMLEAK
|
KMEMLEAK
|
||||||
M: Catalin Marinas <catalin.marinas@arm.com>
|
M: Catalin Marinas <catalin.marinas@arm.com>
|
||||||
S: Maintained
|
S: Maintained
|
||||||
F: Documentation/kmemleak.txt
|
F: Documentation/dev-tools/kmemleak.rst
|
||||||
F: include/linux/kmemleak.h
|
F: include/linux/kmemleak.h
|
||||||
F: mm/kmemleak.c
|
F: mm/kmemleak.c
|
||||||
F: mm/kmemleak-test.c
|
F: mm/kmemleak-test.c
|
||||||
|