For the final round, avoid the expanded and padded lookup tables
exported by the generic AES driver. Instead, for encryption, we can
perform byte loads from the same table we used for the inner rounds,
which will still be hot in the caches. For decryption, use the inverse
AES Sbox directly, which is 4x smaller than the inverse lookup table
exported by the generic driver.
This should significantly reduce the Dcache footprint of our code,
which makes the code more robust against timing attacks. It does not
introduce any additional module dependencies, given that we already
rely on the core AES module for the shared key expansion routines.
It also frees up register x18, which is not available as a scratch
register on all platforms, which and so avoiding it improves
shareability of this code.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For the final round, avoid the expanded and padded lookup tables
exported by the generic AES driver. Instead, for encryption, we can
perform byte loads from the same table we used for the inner rounds,
which will still be hot in the caches. For decryption, use the inverse
AES Sbox directly, which is 4x smaller than the inverse lookup table
exported by the generic driver.
This should significantly reduce the Dcache footprint of our code,
which makes the code more robust against timing attacks. It does not
introduce any additional module dependencies, given that we already
rely on the core AES module for the shared key expansion routines.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572), but has been reworked
extensively for the AArch64 ISA.
On a low-end core such as the Cortex-A53 found in the Raspberry Pi3, the
NEON based implementation is 4x faster than the table based one, and
is time invariant as well, making it less vulnerable to timing attacks.
When combined with the bit-sliced NEON implementation of AES-CTR, the
AES-GCM performance increases by 2x (from 58 to 29 cycles per byte).
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572)
On a 32-bit guest executing under KVM on a Cortex-A57, the new code is
not only 4x faster than the generic table based GHASH driver, it is also
time invariant. (Note that the existing vmull.p64 code is 16x faster on
this core).
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the AES-GCM implementation for arm64 systems that support the
ARMv8 Crypto Extensions is based on the generic GCM module, which combines
the AES-CTR implementation using AES instructions with the PMULL based
GHASH driver. This is suboptimal, given the fact that the input data needs
to be loaded twice, once for the encryption and again for the MAC
calculation.
On Cortex-A57 (r1p2) and other recent cores that implement micro-op fusing
for the AES instructions, AES executes at less than 1 cycle per byte, which
means that any cycles wasted on loading the data twice hurt even more.
So implement a new GCM driver that combines the AES and PMULL instructions
at the block level. This improves performance on Cortex-A57 by ~37% (from
3.5 cpb to 2.6 cpb)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Of the various chaining modes implemented by the bit sliced AES driver,
only CTR is exposed as a synchronous cipher, and requires a fallback in
order to remain usable once we update the kernel mode NEON handling logic
to disallow nested use. So wire up the existing CTR fallback C code.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
To accommodate systems that disallow the use of kernel mode NEON in
some circumstances, take the return value of may_use_simd into
account when deciding whether to invoke the C fallback routine.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
To accommodate systems that may disallow use of the NEON in kernel mode
in some circumstances, introduce a C fallback for synchronous AES in CTR
mode, and use it if may_use_simd() returns false.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON.
So honour this in the ARMv8 Crypto Extensions implementation of
CCM-AES, and fall back to a scalar implementation using the generic
crypto helpers for AES, XOR and incrementing the CTR counter.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In order to be able to reuse the generic AES code as a fallback for
situations where the NEON may not be used, update the key handling
to match the byte order of the generic code: it stores round keys
as sequences of 32-bit quantities rather than streams of bytes, and
so our code needs to be updated to reflect that.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are quite a number of occurrences in the kernel of the pattern
if (dst != src)
memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);
or
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
where crypto_xor() is preceded or followed by a memcpy() invocation
that is only there because crypto_xor() uses its output parameter as
one of the inputs. To avoid having to add new instances of this pattern
in the arm64 code, which will be refactored to implement non-SIMD
fallbacks, add an alternative implementation called crypto_xor_cpy(),
taking separate input and output arguments. This removes the need for
the separate memcpy().
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In preparation of introducing crypto_xor_cpy(), which will use separate
operands for input and output, modify the __crypto_xor() implementation,
which it will share with the existing crypto_xor(), which provides the
actual functionality when not using the inline version.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Sometime we would unable to dequeue the crypto request, in this case,
we should finish crypto and return the err code.
Signed-off-by: zain wang <wzz@rock-chips.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
It's illegal to call the completion function from hardirq context,
it will cause runtime tests to fail. Let's build a new task (done_task)
for moving update operation from hardirq context.
Signed-off-by: zain wang <wzz@rock-chips.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The driver is ported from Freescale's Linux git and can be
found in the
vendor/freescale/imx_2.6.35_maintain
branch.
The driver supports both RNG version C that's part of some Freescale
i.MX3 SoCs and version B that is available on i.MX2x chipsets.
Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Reviewed-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add binding documentation for the Freescale RNGC found on
some i.MX2/3 SoCs.
Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Modify Kconfig help text to reflect the fact that random data from hwrng
is fed into kernel random number generator's entropy pool.
Signed-off-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The scompress code allocates 2 x 128 KB of scratch buffers for each CPU,
so that clients of the async API can use synchronous implementations
even from atomic context. However, on systems such as Cavium Thunderx
(which has 96 cores), this adds up to a non-negligible 24 MB. Also,
32-bit systems may prefer to use their precious vmalloc space for other
things,especially since there don't appear to be any clients for the
async compression API yet.
So let's defer allocation of the scratch buffers until the first time
we allocate an acompress cipher based on an scompress implementation.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When allocating the per-CPU scratch buffers, we allocate the source
and destination buffers separately, but bail immediately if the second
allocation fails, without freeing the first one. Fix that.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Due to the use of per-CPU buffers, scomp_acomp_comp_decomp() executes
with preemption disabled, and so whether the CRYPTO_TFM_REQ_MAY_SLEEP
flag is set is irrelevant, since we cannot sleep anyway. So disregard
the flag, and use GFP_ATOMIC unconditionally.
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Enhance code to generically support cases where DMA rings
are greater than or equal to number of SPU engines.
New hardware has underlying DMA engine-FlexRM with 32 rings
which can be used to communicate to any of the available
10 SPU engines.
Signed-off-by: Raveendra Padasalagi <raveendra.padasalagi@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
static checker warning:
drivers/crypto/atmel-ecc.c:281 atmel_ecdh_done()
warn: assigning (-22) to unsigned variable 'status'
Similar warning can be raised in atmel_ecc_work_handler()
when atmel_ecc_send_receive() returns an error. Fix this too.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ecdh_ctx contained static allocated data for the shared secret
and public key.
The shared secret and the public key were doomed to concurrency
issues because they could be shared by multiple crypto requests.
The concurrency is fixed by replacing per-tfm shared secret and
public key with per-request dynamically allocated shared secret
and public key.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
commit 720419f018 ("crypto: ccp - Introduce the AMD Secure Processor device")
moved the module registeration from ccp-dev.c to sp-dev.c but patch missed
removing the module version and author entry from ccp-dev.c.
It causes the below warning during boot when CONFIG_CRYPTO_DEV_SP_CCP=y
and CONFIG_CRYPTO_DEV_CCP_CRYPTO=y is set.
[ 0.187825] sysfs: cannot create duplicate filename '/module/ccp/version'
[ 0.187825] sysfs: cannot create duplicate filename '/module/ccp/version'
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Gary R Hook <gary.hook@amd.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove xts(aes) speed tests with 2 x 192-bit keys, since implementations
adhering strictly to IEEE 1619-2007 standard cannot cope with key sizes
other than 2 x 128, 2 x 256 bits - i.e. AES-XTS-{128,256}:
[...]
tcrypt: test 5 (384 bit key, 16 byte blocks):
caam_jr 8020000.jr: key size mismatch
tcrypt: setkey() failed flags=200000
[...]
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Functions atmel_ecc_i2c_client_alloc and atmel_ecc_i2c_client_free are
local to the source and no not need to be in the global scope. Make
them static.
Cleans up sparse warnings:
symbol 'atmel_ecc_i2c_client_alloc' was not declared. Should it be static?
symbol 'atmel_ecc_i2c_client_free' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove unnecessary static on local variable hdev. Such variable
is initialized before being used, on every execution path throughout
the function. The static has no benefit and, removing it reduces the
object file size.
This issue was detected using Coccinelle and the following semantic patch:
https://github.com/GustavoARSilva/coccinelle/blob/master/static/static_unused.cocci
In the following log you can see a significant difference in the object
file size. This log is the output of the size command, before and after
the code change:
before:
text data bss dec hex filename
14842 6464 128 21434 53ba drivers/crypto/img-hash.o
after:
text data bss dec hex filename
14789 6376 64 21229 52ed drivers/crypto/img-hash.o
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove unnecessary static on local variable tdes_dd. Such variable
is initialized before being used, on every execution path throughout
the function. The static has no benefit and, removing it reduces the
object file size.
This issue was detected using Coccinelle and the following semantic patch:
https://github.com/GustavoARSilva/coccinelle/blob/master/static/static_unused.cocci
In the following log you can see a significant difference in the object
file size. This log is the output of the size command, before and after
the code change:
before:
text data bss dec hex filename
17079 8704 128 25911 6537 drivers/crypto/atmel-tdes.o
after:
text data bss dec hex filename
17039 8616 64 25719 6477 drivers/crypto/atmel-tdes.o
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove unnecessary static on local variable sha_dd. Such variable
is initialized before being used, on every execution path throughout
the function. The static has no benefit and, removing it reduces the
object file size.
This issue was detected using Coccinelle and the following semantic patch:
https://github.com/GustavoARSilva/coccinelle/blob/master/static/static_unused.cocci
In the following log you can see a significant difference in the object
file size. This log is the output of the size command, before and after
the code change:
before:
text data bss dec hex filename
30005 10264 128 40397 9dcd drivers/crypto/atmel-sha.o
after:
text data bss dec hex filename
29934 10208 64 40206 9d0e drivers/crypto/atmel-sha.o
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove unnecessary static on local variable dd. Such variable
is initialized before being used, on every execution path throughout
the function. The static has no benefit and, removing it reduces the
object file size.
This issue was detected using Coccinelle and the following semantic patch:
https://github.com/GustavoARSilva/coccinelle/blob/master/static/static_unused.cocci
In the following log you can see a difference in the object file size.
This log is the output of the size command, before and after the code
change:
before:
text data bss dec hex filename
26135 11944 128 38207 953f drivers/crypto/omap-sham.o
after:
text data bss dec hex filename
26084 11856 64 38004 9474 drivers/crypto/omap-sham.o
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for using the caam/jr backend on DPAA2-based SoCs.
These have some particularities we have to account for:
-HW S/G format is different
-Management Complex (MC) firmware initializes / manages (partially)
the CAAM block: MCFGR, QI enablement in QICTL, RNG
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A version 5 CCP can handle an RSA modulus up to 16k bits.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Version 5 devices have requirements for buffer lengths, as well as
parameter format (e.g. bits vs. bytes). Fix the base CCP driver
code to meet requirements all supported versions.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some updates this year have not had copyright dates changed in modified
files. Correct this for 2017.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Otherwise, we might be seeding the RNG using bad randomness, which is
dangerous. The one use of this function from within the kernel -- not
from userspace -- is being removed (keys/big_key), so that call site
isn't relevant in assessing this.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This module register a HASH module that support multiples
algorithms: MD5, SHA1, SHA224, SHA256.
It includes the support of HMAC hardware processing corresponding
to the supported algorithms. DMA or IRQ mode are used depending
on data length.
Signed-off-by: Lionel Debieve <lionel.debieve@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This adds documentation of device tree bindings for the STM32
HASH controller.
Signed-off-by: Lionel Debieve <lionel.debieve@st.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The complete stm32 module is rename as crypto
in order to use generic naming
Signed-off-by: Lionel Debieve <lionel.debieve@st.com>
Reviewed-by: Fabien Dessenne <fabien.dessenne@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the correct unregister_shashes function to
to remove the registered algo
Signed-off-by: Lionel Debieve <lionel.debieve@st.com>
Reviewed-by: Fabien Dessenne <fabien.dessenne@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In case of arm soc support, readl and writel will
be optimized using relaxed functions
Signed-off-by: Lionel Debieve <lionel.debieve@st.com>
Reviewed-by: Fabien Dessenne <fabien.dessenne@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>