linux/arch/powerpc
Christophe Leroy f83647d642 powerpc: Discard ffs()/__ffs() function and use builtin functions instead
With the ffs() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter, as shown
by the small exemple below.

int ffs_test(void)
{
	return 4 << ffs(31);
}

c0012334 <ffs_test>:
c0012334:       39 20 00 01     li      r9,1
c0012338:       38 60 00 04     li      r3,4
c001233c:       7d 29 00 34     cntlzw  r9,r9
c0012340:       21 29 00 20     subfic  r9,r9,32
c0012344:       7c 63 48 30     slw     r3,r3,r9
c0012348:       4e 80 00 20     blr

With this patch, the same function will compile as follows:

c0012334 <ffs_test>:
c0012334:       38 60 00 08     li      r3,8
c0012338:       4e 80 00 20     blr

The same happens with __ffs()

For non constant calls, the generated code is doing the same,
allthought it is slightly different on 64 bits for ffs():

unsigned long test__ffs(unsigned long x)
{
	return __ffs(x);
}

int testffs(int x)
{
	return ffs(x);
}

On PPC32, before the patch:
0000003c <test__ffs>:
  3c:	7d 23 00 d0 	neg     r9,r3
  40:	7d 23 18 38 	and     r3,r9,r3
  44:	7c 63 00 34 	cntlzw  r3,r3
  48:	20 63 00 1f 	subfic  r3,r3,31
  4c:	4e 80 00 20 	blr

00000050 <testffs>:
  50:	7d 23 00 d0 	neg     r9,r3
  54:	7d 23 18 38 	and     r3,r9,r3
  58:	7c 63 00 34 	cntlzw  r3,r3
  5c:	20 63 00 20 	subfic  r3,r3,32
  60:	4e 80 00 20 	blr

On PPC32, after the patch:
0000002c <test__ffs>:
  2c:	7d 23 00 d0 	neg     r9,r3
  30:	7d 23 18 38 	and     r3,r9,r3
  34:	7c 63 00 34 	cntlzw  r3,r3
  38:	20 63 00 1f 	subfic  r3,r3,31
  3c:	4e 80 00 20 	blr

00000040 <testffs>:
  40:	7d 23 00 d0 	neg     r9,r3
  44:	7d 23 18 38 	and     r3,r9,r3
  48:	7c 63 00 34 	cntlzw  r3,r3
  4c:	20 63 00 20 	subfic  r3,r3,32
  50:	4e 80 00 20 	blr

On PPC64, before the patch:
0000000000000060 <.test__ffs>:
  60:	7c 03 00 d0 	neg     r0,r3
  64:	7c 03 18 38 	and     r3,r0,r3
  68:	7c 63 00 74 	cntlzd  r3,r3
  6c:	20 63 00 3f 	subfic  r3,r3,63
  70:	7c 63 07 b4 	extsw   r3,r3
  74:	4e 80 00 20 	blr

0000000000000080 <.testffs>:
  80:	7c 03 00 d0 	neg     r0,r3
  84:	7c 03 18 38 	and     r3,r0,r3
  88:	7c 63 00 74 	cntlzd  r3,r3
  8c:	20 63 00 40 	subfic  r3,r3,64
  90:	7c 63 07 b4 	extsw   r3,r3
  94:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000050 <.test__ffs>:
  50:	7c 03 00 d0 	neg     r0,r3
  54:	7c 03 18 38 	and     r3,r0,r3
  58:	7c 63 00 74 	cntlzd  r3,r3
  5c:	20 63 00 3f 	subfic  r3,r3,63
  60:	4e 80 00 20 	blr

0000000000000070 <.testffs>:
  70:	7c 03 00 d0 	neg     r0,r3
  74:	7c 03 18 38 	and     r3,r0,r3
  78:	7c 63 00 34 	cntlzw  r3,r3
  7c:	20 63 00 20 	subfic  r3,r3,32
  80:	7c 63 07 b4 	extsw   r3,r3
  84:	4e 80 00 20 	blr
(ffs() operates on an int so cntlzw is equivalent to cntlzd)

In addition, when reading the generated vmlinux, we can observe
that with the builtin functions, GCC sometimes efficiently spreads
the instructions within the generated functions while the inline
assembly force them to remain grouped together.

__builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h

Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:54 +10:00
..
boot powerpc/64: Do not link crtsaveres.o in boot 2017-05-30 14:59:51 +10:00
configs powerpc/44x/fsp2: Add defconfig for FSP2 board 2017-05-30 14:59:51 +10:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-05-02 15:53:46 -07:00
include powerpc: Discard ffs()/__ffs() function and use builtin functions instead 2017-06-02 19:23:54 +10:00
kernel powerpc: Handle simultaneous interrupts at once 2017-06-02 19:20:44 +10:00
kvm KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms 2017-05-12 15:32:30 +10:00
lib powerpc/64: Linker on-demand sfpr functions for modules 2017-05-30 14:59:51 +10:00
math-emu Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mm powerpc/mm: The 8xx doesn't call do_page_fault() for breakpoints 2017-06-02 19:20:12 +10:00
net powerpc updates for 4.11 part 1. 2017-02-22 10:30:38 -08:00
oprofile ktime: Cleanup ktime_set() usage 2016-12-25 17:21:22 +01:00
perf powerpc/perf: Add Power8 mem_access event to sysfs 2017-04-19 20:00:23 +10:00
platforms powerpc/44x/fsp2: Platform support for FSP2 (476fpe) board 2017-05-30 14:59:51 +10:00
purgatory kexec, x86/purgatory: Unbreak it and clean it up 2017-03-10 20:55:09 +01:00
sysdev powerpc/8xx: fix mpc8xx_get_irq() return on no irq 2017-06-02 19:20:44 +10:00
tools powerpc/64: Tool to check head sections location sanity 2017-05-30 14:59:51 +10:00
xmon powerpc/xmon: Fix compile error with PPC_8xx=y 2017-05-30 14:59:51 +10:00
Kconfig powerpc/64: Handle linker stubs in low .text code 2017-05-30 14:59:51 +10:00
Kconfig.debug powerpc/xmon: Enable disassembly files (compilation changes) 2017-02-15 20:02:42 +11:00
Makefile powerpc: Link warning for orphan sections 2017-05-30 14:59:51 +10:00
Makefile.postlink powerpc/64: Tool to check head sections location sanity 2017-05-30 14:59:51 +10:00