linux/arch
Rik van Riel 8f898fbbe5 sched/x86: Optimize switch_mm() for multi-threaded workloads
Dick Fowles, Don Zickus and Joe Mario have been working on
improvements to perf, and noticed heavy cache line contention
on the mm_cpumask while running linpack on a 60 core / 120 thread
system.

The cause turned out to be unnecessary atomic accesses to the
mm_cpumask. When in lazy TLB mode, the CPU is only removed from
the mm_cpumask if there is a TLB flush event.

Most of the time, no such TLB flush happens, and the kernel
skips the TLB reload. In that case it can also skip the atomic
test-and-set on the mm_cpumask.
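
The idea of testing the bit with a plain read before falling back to an
atomic update can be sketched in user-space C. This is a minimal
illustration, not the kernel's switch_mm() code: struct mm_sketch, NR_CPUS,
the atomic_writes counter and the helper names are all hypothetical, and
C11 atomics stand in for the kernel's cpumask operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 64 /* hypothetical: one 64-bit word is enough for the sketch */

/* Stand-in for an mm and its mm_cpumask: one bit per CPU. */
struct mm_sketch {
    _Atomic unsigned long long cpumask;
    /* Counts atomic read-modify-write operations. Illustration only;
     * this counter itself is not thread-safe. */
    unsigned long atomic_writes;
};

/* Read-only test of one CPU's bit. */
static bool cpumask_test(struct mm_sketch *mm, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    unsigned long long mask =
        atomic_load_explicit(&mm->cpumask, memory_order_relaxed);
    return ((mask >> cpu) & 1ULL) != 0;
}

/* Unconditional variant: always performs an atomic test-and-set.
 * Returns true if the bit was already set. */
static bool mark_cpu_always(struct mm_sketch *mm, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    unsigned long long bit = 1ULL << cpu;
    unsigned long long old = atomic_fetch_or(&mm->cpumask, bit);
    mm->atomic_writes++;
    return (old & bit) != 0;
}

/* Test-first variant: only falls back to the atomic operation when the
 * bit is observed clear. Returns true if the bit was already set. */
static bool mark_cpu_if_clear(struct mm_sketch *mm, unsigned int cpu)
{
    if (cpumask_test(mm, cpu))
        return true; /* common case: no write to the shared word */
    return mark_cpu_always(mm, cpu);
}
```

When the bit is already set, mark_cpu_if_clear() only reads the shared
word, so the cache line can stay shared between CPUs instead of being
pulled into exclusive state by every caller.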

Here is a summary of Joe's test results:

 * The __schedule function dropped from 24% of all program cycles down
   to 5.5%.

 * The cacheline contention/hotness for accesses to that bitmask went
   from being the 1st/2nd hottest down to the 84th hottest (0.3% of
   all shared misses, which is now quite cold).

 * The average load latency for the bit-test-n-set instruction in
   __schedule dropped from 10k-15k cycles down to an average of 600 cycles.

 * The linpack program results improved from 133 GFlops to 144 GFlops.
   Peak GFlops rose from 133 to 153.

Reported-by: Don Zickus <dzickus@redhat.com>
Reported-by: Joe Mario <jmario@redhat.com>
Tested-by: Joe Mario <jmario@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Paul Turner <pjt@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130731221421.616d3d20@annuminas.surriel.com
[ Made the comments consistent around the modified code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-01 09:10:26 +02:00
alpha alpha: delete __cpuinit usage from all users 2013-07-14 19:36:51 -04:00
arc Couple of Platform updates (Device Tree files primarily) given that the 2013-07-10 10:11:26 -07:00
arm Power management and ACPI fixes for 3.11-rc2 2013-07-19 09:59:06 -07:00
arm64 - Post -rc1 update to the common reboot infrastructure. 2013-07-19 15:08:53 -07:00
avr32 net: rename busy poll socket op and globals 2013-07-10 17:08:27 -07:00
blackfin Merge branch 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2013-07-18 10:50:26 -07:00
c6x Merge branch 'akpm' (updates from Andrew Morton) 2013-07-03 17:12:13 -07:00
cris cris: delete __cpuinit usage from all cris files 2013-07-14 19:36:54 -04:00
frv frv: delete __cpuinit usage from all frv files 2013-07-14 19:36:55 -04:00
h8300 net: rename busy poll socket op and globals 2013-07-10 17:08:27 -07:00
hexagon hexagon: delete __cpuinit usage from all hexagon files 2013-07-14 19:36:55 -04:00
ia64 net: rename busy poll socket op and globals 2013-07-10 17:08:27 -07:00
m32r m32r: delete __cpuinit usage from all m32r files 2013-07-14 19:36:55 -04:00
m68k Merge branch 'akpm' (updates from Andrew Morton) 2013-07-03 17:12:13 -07:00
metag metag: delete __cpuinit usage from all metag files 2013-07-14 19:36:54 -04:00
microblaze Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze 2013-07-10 10:16:07 -07:00
mips Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2013-07-19 15:10:01 -07:00
mn10300 net: rename busy poll socket op and globals 2013-07-10 17:08:27 -07:00
openrisc openrisc: delete __cpuinit usage from all openrisc files 2013-07-14 19:36:55 -04:00
parisc parisc: delete __cpuinit usage from all users 2013-07-14 19:36:51 -04:00
powerpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-07-13 17:42:22 -07:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2013-07-19 15:08:12 -07:00
score score: delete __cpuinit usage from all score files 2013-07-14 19:36:56 -04:00
sh sh: delete __cpuinit usage from all sh files 2013-07-14 19:36:53 -04:00
sparc sparc: delete __cpuinit/__CPUINIT usage from all users 2013-07-14 19:36:52 -04:00
tile tile: delete __cpuinit usage from all tile files 2013-07-14 19:36:54 -04:00
um um: siginfo cleanup 2013-07-19 11:31:36 +02:00
unicore32 reboot: move arch/x86 reboot= handling to generic kernel 2013-07-09 10:33:29 -07:00
x86 sched/x86: Optimize switch_mm() for multi-threaded workloads 2013-08-01 09:10:26 +02:00
xtensa xtensa: delete __cpuinit usage from all xtensa files 2013-07-14 19:36:56 -04:00
.gitignore
Kconfig mm: soft-dirty bits for user memory changes tracking 2013-07-03 16:07:26 -07:00