This patch introduces the SMP support for the OpenRISC architecture.
The SMP architecture requires cores which have multi-core features which
have been introduced a few years back including:
- New SPRS SPR_COREID SPR_NUMCORES
- Shadow SPRs
- Atomic Instructions
- Cache Coherency
- A wired in IPI controller
This patch adds all of the SMP specific changes to core infrastructure,
it looks big but it needs to go all together as its hard to split this
one up.
Boot loader spinning of second cpu is not supported yet, it's assumed
that Linux is booted straight after cpu reset.
The bulk of these changes are trivial changes to refactor to use per cpu
data structures throughout. The addition of the smp.c and changes in
time.c are the changes. Some specific notes:
MM changes
----------
The reason why this is created as an array, and not with DEFINE_PER_CPU
is that doing it this way, we'll save a load in the tlb-miss handler
(the load from __per_cpu_offset).
TLB Flush
---------
The SMP implementation of flush_tlb_* works by sending out a
function-call IPI to all the non-local cpus by using the generic
on_each_cpu() function.
Currently, all flush_tlb_* functions will result in a flush_tlb_all(),
which has always been the behaviour in the UP case.
CPU INFO
--------
This creates a per cpu cpuinfo struct and fills it out accordingly for
each activated cpu. show_cpuinfo is also updated to reflect new version
information in later versions of the spec.
SMP API
-------
This imitates the arm64 implementation by having a smp_cross_call
callback that can be set by set_smp_cross_call to initiate an IPI and a
handle_IPI function that is expected to be called from an IPI irqchip
driver.
Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
[shorne@gmail.com: added cpu stop, checkpatch fixes, wrote commit message]
Signed-off-by: Stafford Horne <shorne@gmail.com>
Reduce dependencies on module.h since all we need here is export.h
for EXPORT_SYMBOL.
Fixes: f50169324d ("module.h: split out the EXPORT_SYMBOL into export.h")
Signed-off-by: Stafford Horne <shorne@gmail.com>
The Kconfig option for OR12000 is OR1K_1200.
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
The generic memcpy routine provided in kernel does only byte copies.
Using word copies we can lower boot time and cycles spend in memcpy
quite significantly.
Booting on my de0 nano I see boot times go from 7.2 to 5.6 seconds.
The avg cycles in memcpy during boot go from 6467 to 1887.
I tested several algorithms (see code in previous patch mails)
The implementations I tested and avg cycles:
- Word Copies + Loop Unrolls + Non Aligned 1882
- Word Copies + Loop Unrolls 1887
- Word Copies 2441
- Byte Copies + Loop Unrolls 6467
- Byte Copies 7600
In the end I ended up going with Word Copies + Loop Unrolls as it
provides best tradeoff between simplicity and boot speedups.
Signed-off-by: Stafford Horne <shorne@gmail.com>
This adds a hand-optimized assembler version of memset and sets
__HAVE_ARCH_MEMSET to use this version instead of the generic C
routine
Signed-off-by: Olof Kindgren <olof.kindgren@gmail.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
This fixes up all of the smaller arches that had __dev* markings for
their platform-specific drivers.
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, __devinitdata,
__devinitconst, and __devexit from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Myron Stowe <myron.stowe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Thierry Reding <thierry.reding@avionic-design.de>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Yong Zhang <yong.zhang0@gmail.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Jan Glauber <jang@linux.vnet.ibm.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the counter overflows during a __delay operation, we will exit the
loop prematurely. For example, calling __delay(0x100) with the counter
at 0xffffff00 gives us a target of 0x0. The unsigned comparison in the
while loop will likely be false on the first iteration (if the counter
is now anything other than 0) and we will return early.
This patch fixes the problem by comparing deltas in the loop rather than
absolute values.
Cc: Jon Masters <jcm@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
The openrisc implementation of __const_udelay casts the result of a
32-bit multiplication to 64 bits and passes the top 32 bits to __delay.
Since there are no casts on the arguments, this results in a __delay of
zero, regardless of the xloops parameter.
This patch fixes the problem by casting xloops to (unsigned long long),
ensuring that the multiplication is not truncated.
Cc: Jon Masters <jcm@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
The generic version is both easier to support and more correct.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As per commits 2922585b93 ("lib: Sparc's strncpy_from_user is generic
enough, move under lib/") and 92ae03f2ef ("x86: merge 32/64-bit
versions of 'strncpy_from_user()' and speed it up"), and corresponding
discussion on linux-arch.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>