linux

Author	SHA1	Message	Date
Anton Blanchard	87a156fb18	powerpc: Align hot loops of some string functions Align the hot loops in our assembly implementation of strncpy(), strncmp() and memchr(). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-14 13:58:25 +10:00
Anton Blanchard	3ece16632b	powerpc: Remove assembly versions of strcpy, strcat, strlen and strcmp A number of our assembly implementations of string functions do not align their hot loops. I was going to align them manually, but I realised that they are are almost instruction for instruction identical to what gcc produces, with the advantage that gcc does align them. In light of that, let's just remove the assembly versions. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-14 13:58:25 +10:00
Anton Blanchard	15c2d45d17	powerpc: Add 64bit optimised memcmp I noticed ksm spending quite a lot of time in memcmp on a large KVM box. The current memcmp loop is very unoptimised - byte at a time compares with no loop unrolling. We can do much much better. Optimise the loop in a few ways: - Unroll the byte at a time loop - For large (at least 32 byte) comparisons that are also 8 byte aligned, use an unrolled modulo scheduled loop using 8 byte loads. This is similar to our glibc memcmp. A simple microbenchmark testing 10000000 iterations of an 8192 byte memcmp was used to measure the performance: baseline: 29.93 s modified: 1.70 s Just over 17x faster. v2: Incorporated some suggestions from Segher: - Use andi. instead of rdlicl. - Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare and was a relic from a previous version. - Don't use cr5, we have plans to use that CR field for fast local atomics. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-01-23 14:02:55 +11:00
Anton Blanchard	17968fbbd1	powerpc: 64bit optimised __clear_user I noticed __clear_user high up in a profile of one of my RAID stress tests. The testcase was doing a dd from /dev/zero which ends up calling __clear_user. __clear_user is basically a loop with a single 4 byte store which is horribly slow. We can do much better by aligning the desination and doing 32 bytes of 8 byte stores in a loop. The following testcase was used to verify the patch: http://ozlabs.org/~anton/junkcode/stress_clear_user.c To show the improvement in performance I ran a dd from /dev/zero to /dev/null on a POWER7 box: Before: # dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s After: # time dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s Over 5x faster. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:41 +10:00
Paul Mackerras	1629372caa	powerpc: Use the new generic strncpy_from_user() and strnlen_user() This is much the same as for SPARC except that we can do the find_zero() function more efficiently using the count-leading-zeroes instructions. Tested on 32-bit and 64-bit PowerPC. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-27 21:00:07 -07:00
Andreas Schwab	ca5d0674c3	powerpc: Fix string library functions The powerpc strncmp implementation does not correctly handle a zero length, despite the claim in `0119536cd3` (Add hand-coded assembly strcmp). Additionally, all the length arguments are size_t, not int, so use PPC_LCMPI and eq instead of cmpwi and le throughout. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-21 17:31:08 +10:00
Jeff Mahoney	637a99022f	powerpc: Fix handling of strncmp with zero len Commit `0119536c`, which added the assembly version of strncmp to powerpc, mentions that it adds two instructions to the version from boot/string.S to allow it to handle len=0. Unfortunately, it doesn't always return 0 when that is the case. The length is passed in r5, but the return value is passed back in r3. In certain cases, this will happen to work. Otherwise it will pass back the address of the first string as the return value. This patch lifts the len <= 0 handling code from memcpy to handle that case. Reported by: Christian_Sellars@symantec.com Signed-off-by: Jeff Mahoney <jeffm@suse.com> CC: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-04-07 18:00:39 +10:00
Michael Ellerman	76bfdcf71c	powerpc: Use PPC_LONG and PPC_LONG_ALIGN in lib/string.S Replace ifdef clutter with the PPC_LONG and PPC_LONG_ALIGN macros for readability. No change to the generated code. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-22 10:39:35 +10:00
Steven Rostedt	0119536cd3	[POWERPC] Add hand-coded assembly strcmp We have an assembly version of strncmp for the bootwrapper, but not for the kernel, so we end up using the C version in the kernel. This takes the strncmp code from the bootup and copies it to the kernel proper, adding two instructions so it copes correctly with len==0. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-07 10:03:03 +10:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Paul Mackerras	b3b8dc6c07	powerpc: Use reg.h instead of processor.h when we just want reg names Now that the register names and bit definitions are all in reg.h, use that instead of processor.h in assembly code in a few places. Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-10-10 22:20:10 +10:00
Paul Mackerras	14cf11af6c	powerpc: Merge enough to start building in arch/powerpc. This creates the directory structure under arch/powerpc and a bunch of Kconfig files. It does a first-cut merge of arch/powerpc/mm, arch/powerpc/lib and arch/powerpc/platforms/powermac. This is enough to build a 32-bit powermac kernel with ARCH=powerpc. For now we are getting some unmerged files from arch/ppc/kernel and arch/ppc/syslib, or arch/ppc64/kernel. This makes some minor changes to files in those directories and files outside arch/powerpc. The boot directory is still not merged. That's going to be interesting. Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-09-26 16:04:21 +10:00

12 Commits