linux/arch/loongarch/vdso
Xi Ruoyao 9805f39d42 LoongArch: vDSO: Tune chacha implementation
As Christophe pointed out, tuning the chacha implementation by
scheduling the instructions like what GCC does can improve the
performance.

The tuning does not introduce too much complexity (basically it's just
reordering some instructions). And the tuning does not hurt readability
too much: actually the tuned code looks even more similar to a
textbook-style implementation based on 128-bit vectors.  So overall it's
a good deal to me.

Tested with vdso_test_getchacha and benched with vdso_test_getrandom.
On a LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64
with a lower issue rate.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/all/77655d9e-fc05-4300-8f0d-7b2ad840d091@csgroup.eu/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-24 14:21:05 +02:00
.gitignore
elf.S
gen_vdso_offsets.sh
Makefile LoongArch: vDSO: Wire up getrandom() vDSO implementation 2024-09-13 17:28:35 +02:00
sigreturn.S
vdso.lds.S LoongArch: vDSO: Wire up getrandom() vDSO implementation 2024-09-13 17:28:35 +02:00
vdso.S
vgetcpu.c LoongArch: Add support to clone a time namespace 2023-06-29 20:58:43 +08:00
vgetrandom-chacha.S LoongArch: vDSO: Tune chacha implementation 2024-09-24 14:21:05 +02:00
vgetrandom.c LoongArch: vDSO: Wire up getrandom() vDSO implementation 2024-09-13 17:28:35 +02:00
vgettimeofday.c arch: vdso: consolidate gettime prototypes 2023-11-23 11:32:32 +01:00
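
The commit message for vgetrandom-chacha.S describes the tuning as reordering instructions so that the code resembles a textbook implementation based on 128-bit vectors. The following is a minimal C sketch of that idea, not the LoongArch assembly from the commit: it assumes the standard ChaCha quarter round and contrasts a column-at-a-time ordering with an ordering that applies each step to all four columns before moving on. All function names here (rotl32, quarter_round, column_round_sequential, column_round_interleaved, chacha_sketch_selfcheck) are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One standard ChaCha quarter round on four state words. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d ^= *a; *d = rotl32(*d, 16);
	*c += *d; *b ^= *c; *b = rotl32(*b, 12);
	*a += *b; *d ^= *a; *d = rotl32(*d, 8);
	*c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/*
 * Column round, one column at a time. Inside a quarter round every
 * step depends on the result of the previous step.
 */
static void column_round_sequential(uint32_t x[16])
{
	for (int i = 0; i < 4; i++)
		quarter_round(&x[i], &x[4 + i], &x[8 + i], &x[12 + i]);
}

/*
 * The same column round with each step applied to all four columns
 * before the next step starts. The four columns are independent, so
 * neighbouring operations no longer depend on each other. This is
 * the ordering a 4 x 32-bit vector implementation has naturally.
 */
static void column_round_interleaved(uint32_t x[16])
{
	uint32_t *a = &x[0], *b = &x[4], *c = &x[8], *d = &x[12];
	int i;

	for (i = 0; i < 4; i++) a[i] += b[i];
	for (i = 0; i < 4; i++) d[i] ^= a[i];
	for (i = 0; i < 4; i++) d[i] = rotl32(d[i], 16);

	for (i = 0; i < 4; i++) c[i] += d[i];
	for (i = 0; i < 4; i++) b[i] ^= c[i];
	for (i = 0; i < 4; i++) b[i] = rotl32(b[i], 12);

	for (i = 0; i < 4; i++) a[i] += b[i];
	for (i = 0; i < 4; i++) d[i] ^= a[i];
	for (i = 0; i < 4; i++) d[i] = rotl32(d[i], 8);

	for (i = 0; i < 4; i++) c[i] += d[i];
	for (i = 0; i < 4; i++) b[i] ^= c[i];
	for (i = 0; i < 4; i++) b[i] = rotl32(b[i], 7);
}

/* Check that both orderings produce the same state (arbitrary input). */
static void chacha_sketch_selfcheck(void)
{
	uint32_t s1[16], s2[16];

	for (uint32_t i = 0; i < 16; i++)
		s1[i] = s2[i] = 0x9e3779b9u * (i + 1);

	column_round_sequential(s1);
	column_round_interleaved(s2);

	for (int i = 0; i < 16; i++)
		assert(s1[i] == s2[i]);
}
```

Both functions compute the same result; only the order of independent operations differs. Whether such a reordering helps on a given core depends on its pipeline, which is why the commit reports a measured figure for LA664 rather than a general claim.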