Add more optimized XTS code for camellia-aesni-avx, for smaller stack usage
and small boost for speed.
tcrypt results, with Intel i5-2450M:
enc dec
16B 1.10x 1.01x
64B 0.82x 0.77x
256B 1.14x 1.10x
1024B 1.17x 1.16x
8192B 1.10x 1.11x
Since XTS is practically always used with data blocks of size 512 bytes or
more, I chose to not make use of camellia-2way for block sized smaller than
256 bytes. This causes slower result in tcrypt for 64 bytes.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>