Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on
128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
Speck64. Each 128-byte chunk goes through XTS preprocessing, then is
encrypted/decrypted (doing one cipher round for all the blocks, then the
next round, etc.), then goes through XTS postprocessing.
The performance depends on the processor but can be about 3 times faster
than the generic code. For example, on an ARMv7 processor we observe
the following performance with Speck128/256-XTS:
xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
xts(speck128-generic): Encryption 32.1 MB/s, Decryption 36.6 MB/s
In comparison to AES-256-XTS without the Cryptography Extensions:
xts-aes-neonbs: Encryption 41.2 MB/s, Decryption 36.7 MB/s
xts(aes-asm): Encryption 31.7 MB/s, Decryption 30.8 MB/s
xts(aes-generic): Encryption 21.2 MB/s, Decryption 20.9 MB/s
Speck64/128-XTS is even faster:
xts-speck64-neon: Encryption 138.6 MB/s, Decryption 139.1 MB/s
Note that as with the generic code, only the Speck128 and Speck64
variants are supported. Also, for now only the XTS mode of operation is
supported, to target the disk and file encryption use cases. The NEON
code also only handles the portion of the data that is evenly divisible
into 128-byte chunks, with any remainder handled by a C fallback. Of
course, other modes of operation could be added later if needed, and/or
the NEON code could be updated to handle other buffer sizes.
The XTS specification is only defined for AES which has a 128-bit block
size, so for the GF(2^64) math needed for Speck64-XTS we use the
reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
paper. Of course, when possible users should use Speck128-XTS, but even
that may be too slow on some processors; Speck64-XTS can be faster.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Export the Speck constants and transform context and the ->setkey(),
->encrypt(), and ->decrypt() functions so that they can be reused by the
ARM NEON implementation of Speck-XTS. The generic key expansion code
will be reused because it is not performance-critical and is not
vectorizable, while the generic encryption and decryption functions are
needed as fallbacks and for the XTS tweak encryption.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a generic implementation of Speck, including the Speck128 and
Speck64 variants. Speck is a lightweight block cipher that can be much
faster than AES on processors that don't have AES instructions.
We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
option for dm-crypt and fscrypt on Android, for low-end mobile devices
with older CPUs such as ARMv7 which don't have the Cryptography
Extensions. Currently, such devices are unencrypted because AES is not
fast enough, even when the NEON bit-sliced implementation of AES is
used. Other AES alternatives such as Twofish, Threefish, Camellia,
CAST6, and Serpent aren't fast enough either; it seems that only a
modern ARX cipher can provide sufficient performance on these devices.
This is a replacement for our original proposal
(https://patchwork.kernel.org/patch/10101451/) which was to offer
ChaCha20 for these devices. However, the use of a stream cipher for
disk/file encryption with no space to store nonces would have been much
more insecure than we thought initially, given that it would be used on
top of flash storage as well as potentially on top of F2FS, neither of
which is guaranteed to overwrite data in-place.
Speck has been somewhat controversial due to its origin. Nevertheless,
it has a straightforward design (it's an ARX cipher), and it appears to
be the leading software-optimized lightweight block cipher currently,
with the most cryptanalysis. It's also easy to implement without side
channels, unlike AES. Moreover, we only intend Speck to be used when
the status quo is no encryption, due to AES not being fast enough.
We've also considered a novel length-preserving encryption mode based on
ChaCha20 and Poly1305. While theoretically attractive, such a mode
would be a brand new crypto construction and would be more complicated
and difficult to implement efficiently in comparison to Speck-XTS.
There is confusion about the byte and word orders of Speck, since the
original paper doesn't specify them. But we have implemented it using
the orders the authors recommended in a correspondence with them. The
test vectors are taken from the original paper but were mapped to byte
arrays using the recommended byte and word orders.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add gcmaes_crypt_by_sg routine, that will do scatter/gather
by sg. Either src or dst may contain multiple buffers, so
iterate over both at the same time if they are different.
If the input is the same as the output, iterate only over one.
Currently both the AAD and TAG must be linear, so copy them out
with scatterlist_map_and_copy. If first buffer contains the
entire AAD, we can optimize and not copy. Since the AAD
can be any size, if copied it must be on the heap. TAG can
be on the stack since it is always < 16 bytes.
Only the SSE routines are updated so far, so leave the previous
gcmaes_en/decrypt routines, and branch to the sg ones if the
keysize is inappropriate for avx, or we are SSE only.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The asm macros are all set up now, introduce entry points.
GCM_INIT and GCM_COMPLETE have arguments supplied, so that
the new scatter/gather entry points don't have to take all the
arguments, and only the ones they need.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We can fast-path any < 16 byte read if the full message is > 16 bytes,
and shift over by the appropriate amount. Usually we are
reading > 16 bytes, so this should be faster than the READ_PARTIAL
macro introduced in b20209c91e for the average case.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Before this diff, multiple calls to GCM_ENC_DEC will
succeed, but only if all calls are a multiple of 16 bytes.
Handle partial blocks at the start of GCM_ENC_DEC, and update
aadhash as appropriate.
The data offset %r11 is also updated after the partial block.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
HashKey computation only needs to happen once per scatter/gather operation,
save it between calls in gcm_context struct instead of on the stack.
Since the asm no longer stores anything on the stack, we can use
%rsp directly, and clean up the frame save/restore macros a bit.
Hashkeys actually only need to be calculated once per key and could
be moved to when set_key is called, however, the current glue code
falls back to generic aes code if fpu is disabled.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, to be carried over to the next
operation.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce a gcm_context_data struct that will be used to pass
context data between scatter/gather update calls. It is passed
as the second argument (after crypto keys), other args are
renumbered.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Make a macro for the main encode/decode routine. Only a small handful
of lines differ for enc and dec. This will also become the main
scatter/gather update routine.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reduce code duplication by introducting GCM_INIT macro. This macro
will also be exposed as a function for implementing scatter/gather
support, since INIT only needs to be called once for the full
operation.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Macro-ify function save and restore. These will be used in new functions
added for scatter/gather update operations.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use macro operations to merge implemetations of INITIAL_BLOCKS,
since they differ by only a small handful of lines.
Use macro counter \@ to simplify implementation.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Two local variables will eventually be set to appropriate pointers
a bit later. Thus omit their explicit initialisation at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace the function name in this error message so that the same name
is mentioned according to what was called before.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The local variable "cryp_error" was used only for two condition checks.
* Check the return values from these function calls directly instead.
* Delete this variable which became unnecessary with this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The RSA private key for the first form should have
version, prime1, prime2, exponent1, exponent2, coefficient
values 0.
With non-zero values for prime1,2, exponent 1,2 and coefficient
the Intel QAT driver will assume that values are provided for the
private key second form. This will result in signature verification
failures for modules where QAT device is present and the modules
are signed with rsa,sha256.
Cc: <stable@vger.kernel.org>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Conor McLoughlin <conor.mcloughlin@intel.com>
Reviewed-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a label to unmap the result buffer in the hash send
function error path.
Fixes: 1b44c5a60c ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver")
Suggested-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch updates the Inside Secure SafeXcel driver to avoid being
out-of-sync between the number of requests sent and the one being
completed.
The number of requests acknowledged by the driver can be different than
the threshold that was configured if new requests were being pushed to
the h/w in the meantime. The driver wasn't taking those into account,
and the number of remaining requests to handled (to reconfigure the
interrupt threshold) could be out-of sync.
This patch fixes it by not taking in account the number of requests
left, but by taking in account the total number of requests being sent
to the hardware, so that new requests are being taken into account.
Fixes: dc7e28a328 ("crypto: inside-secure - dequeue all requests at once")
Suggested-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When exiting a transformation, the cra_exit() helper is called in each
driver providing one. The Inside Secure SafeXcel driver has one, which
is responsible of freeing some areas and of sending one invalidation
request to the crypto engine, to invalidate the context that was used
during the transformation.
We could see in some setups (when lots of transformations were being
used with a short lifetime, and hence lots of cra_exit() calls) NULL
pointer dereferences and other weird issues. All these issues were
coming from accessing the tfm context.
The issue is the invalidation request completion is checked using a
wait_for_completion_interruptible() call in both the cipher and hash
cra_exit() helpers. In some cases this was interrupted while the
invalidation request wasn't processed yet. And then cra_exit() returned,
and its caller was freeing the tfm instance. Only then the request was
being handled by the SafeXcel driver, which lead to the said issues.
This patch fixes this by using wait_for_completion() calls in these
specific cases.
Fixes: 1b44c5a60c ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a check in the SafeXcel dequeue function, to avoid
processing request further if no hardware command was issued. This can
happen in certain cases where the ->send() function caches all the data
that would have been send.
Fixes: 809778e02c ("crypto: inside-secure - fix hash when length is a multiple of a block")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes the cache length computation as cache_len could end up
being a negative value. The check between the queued size and the
block size is updated to reflect the caching mechanism which can cache
up to a full block size (included!).
Fixes: 809778e02c ("crypto: inside-secure - fix hash when length is a multiple of a block")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes the extra cache computation when the queued data is a
multiple of a block size. This fixes the hash support in some cases.
Fixes: 809778e02c ("crypto: inside-secure - fix hash when length is a multiple of a block")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes the Inside Secure SafeXcel driver not to overwrite the
interrupt threshold value. In certain cases the value of this register,
which controls when to fire an interrupt, was overwritten. This lead to
packet not being processed or acked as the driver never was aware of
their completion.
This patch fixes this behaviour by not setting the threshold when
requests are being processed by the engine.
Fixes: dc7e28a328 ("crypto: inside-secure - dequeue all requests at once")
Suggested-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Free Electrons became Bootlin. Update my email accordingly.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In case the probe of the clock is deferred, we would assume it is
optional. This is wrong, so defer the probe of this driver until
the clock is available.
Fixes: 791af4f490 ("hwrng: bcm2835 - Manage an optional clock")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move the AES inverse S-box to the .rodata section
where it is safe from abuse by speculation.
Signed-off-by: Jinbum Park <jinb.park7@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The MODULE_ALIAS is required to enable the sun4i-ss driver to load
automatically when built at a module. Tested on a Cubietruck.
Fixes: 6298e94821 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
stm32mp1 differs from stm32f7 in the way it handles byte ordering and
padding for aes gcm & ccm algo.
Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Functions qat_rsa_set_n, qat_rsa_set_e and qat_rsa_set_n are local to
the source and do not need to be in global scope, so make them static.
Cleans up sparse warnings:
drivers/crypto/qat/qat_common/qat_asym_algs.c:972:5: warning: symbol
'qat_rsa_set_n' was not declared. Should it be static?
drivers/crypto/qat/qat_common/qat_asym_algs.c:1003:5: warning: symbol
'qat_rsa_set_e' was not declared. Should it be static?
drivers/crypto/qat/qat_common/qat_asym_algs.c:1027:5: warning: symbol
'qat_rsa_set_d' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Function ccp_get_dma_chan_attr is local to the source and does not
need to be in global scope, so make it static.
Cleans up sparse warning:
drivers/crypto/ccp/ccp-dmaengine.c:41:14: warning: symbol
'ccp_get_dma_chan_attr' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Function aead_ccm_validate_input is local to the source and does not
need to be in global scope, so make it static.
Cleans up sparse warning:
drivers/crypto/chelsio/chcr_algo.c:2627:5: warning: symbol
'aead_ccm_validate_input' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
multibuffer driver
Even though I created the original implementation of SHA1 multibuffer
driver, Megha extended it to SHA256 and SHA512 and she is now maintaining
the code for SHA1/SHA256/SHA512 multi-buffer driver.
Add the entry in the MAINTAINERS file so any update patch can find
its way properly to Megha.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Due to another patch, the dma fails when padding is
needed as the given length is not correct.
Signed-off-by: Lionel Debieve <lionel.debieve@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fixing bugs link to stress tests. Bad results are
detected during testmgr selftests executing in a
faster environment. bufcnt value may be resetted and
false IT are sometimes detected.
Signed-off-by: Lionel Debieve <lionel.debieve@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
dma-maxburst is an optional value and must not return
error in case of dma not used (or max-burst not defined).
Signed-off-by: Lionel Debieve <lionel.debieve@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for AES working in Galois Counter Mode.
The following algorithms are added:
gcm(aes)
rfc4106(gcm(aes))
rfc4543(gcm(aes))
There is a limitation related to IV size, similar to the one present in
SW implementation (crypto/gcm.c):
The only IV size allowed is 12 bytes. It will be padded by HW to the right
with 0x0000_0001 (up to 16 bytes - AES block size), according to the
GCM specification.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Update gcm(aes) descriptors (generic, rfc4106 and rfc4543) such that
they would also work when submitted via the QI interface.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto drivers are expected to return -EBADMSG in case of
ICV check (authentication) failure.
In this case it also makes sense to suppress the error message
in the QI dequeue callback.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch convert the stm32-cryp driver to the new crypto engine API.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Fabien Dessenne <fabien.dessenne@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch convert the stm32-hash driver to the new crypto engine API.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Fabien Dessenne <fabien.dessenne@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>