Versions of gcc prior to gcc 5 emitted a __multi3 function call when
dealing with TI types, resulting in failures when trying to link to
libgcc and, more generally, bad performance. Since gcc 5, however, the
compiler emits fast inline instructions instead, which means we can at
long last enable this option and receive the speedups.

The gcc commit that added proper AArch64 support is:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d
This commit appears to be part of the gcc 5 release.

There are still a few helper functions, __ashlti3 and __ashrti3, which
would normally come from libgcc. That is fine: rather than linking to
libgcc, we simply provide them ourselves, since they're not that
complicated.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
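
To illustrate why helpers such as __ashlti3 and __ashrti3 are "not that complicated", here is a minimal C sketch of 128-bit shifts built from two 64-bit halves. It is not the code added by this commit. The struct ti_words type and the sketch_ashlti3/sketch_ashrti3 names are hypothetical and exist only for this example, and the shift count is assumed to be in the range 0..127.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word view of a 128-bit (TI mode) value. */
struct ti_words {
	uint64_t lo;	/* bits 0..63 */
	uint64_t hi;	/* bits 64..127 */
};

/* Sketch of a 128-bit left shift; assumes shift < 128. */
static struct ti_words sketch_ashlti3(struct ti_words v, unsigned int shift)
{
	struct ti_words r;

	assert(shift < 128);
	if (shift == 0) {
		r = v;
	} else if (shift < 64) {
		/* Bits leaving the low word carry into the high word. */
		r.hi = (v.hi << shift) | (v.lo >> (64 - shift));
		r.lo = v.lo << shift;
	} else {
		/* The whole low word moves into the high word. */
		r.hi = v.lo << (shift - 64);
		r.lo = 0;
	}
	return r;
}

/* Sketch of a 128-bit arithmetic right shift; assumes shift < 128. */
static struct ti_words sketch_ashrti3(struct ti_words v, unsigned int shift)
{
	struct ti_words r;
	/* All ones when the 128-bit value is negative, zero otherwise. */
	uint64_t sign = (v.hi >> 63) ? ~(uint64_t)0 : 0;

	assert(shift < 128);
	if (shift == 0) {
		r = v;
	} else if (shift < 64) {
		r.lo = (v.lo >> shift) | (v.hi << (64 - shift));
		r.hi = (v.hi >> shift) | (sign << (64 - shift));
	} else {
		unsigned int s = shift - 64;

		/* The high word moves into the low word, sign-extended. */
		r.lo = s ? ((v.hi >> s) | (sign << (64 - s))) : v.hi;
		r.hi = sign;
	}
	return r;
}
```

The sign fill is computed on unsigned words so the sketch does not rely on the implementation-defined behaviour of right-shifting negative signed integers in C.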
22 lines
1.1 KiB
Makefile
lib-y := bitops.o clear_user.o delay.o copy_from_user.o \
	copy_to_user.o copy_in_user.o copy_page.o \
	clear_page.o memchr.o memcpy.o memmove.o memset.o \
	memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \
	strchr.o strrchr.o tishift.o

# Tell the compiler to treat all general purpose registers (with the
# exception of the IP registers, which are already handled by the caller
# in case of a PLT) as callee-saved, which allows for efficient runtime
# patching of the bl instruction in the caller with an atomic instruction
# when supported by the CPU. Result and argument registers are handled
# correctly, based on the function prototype.
lib-$(CONFIG_ARM64_LSE_ATOMICS) += atomic_ll_sc.o
CFLAGS_atomic_ll_sc.o := -fcall-used-x0 -ffixed-x1 -ffixed-x2 \
	-ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6 \
	-ffixed-x7 -fcall-saved-x8 -fcall-saved-x9 \
	-fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12 \
	-fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15 \
	-fcall-saved-x18

lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o