Merge third batch of patches from Andrew Morton:
- Some MM stragglers
- core SMP library cleanups (on_each_cpu_mask)
- Some IPI optimisations
- kexec
- kdump
- IPMI
- the radix-tree iterator work
- various other misc bits.
"That'll do for -rc1. I still have ~10 patches for 3.4, will send
those along when they've baked a little more."
* emailed from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
backlight: fix typo in tosa_lcd.c
crc32: add help text for the algorithm select option
mm: move hugepage test examples to tools/testing/selftests/vm
mm: move slabinfo.c to tools/vm
mm: move page-types.c from Documentation to tools/vm
selftests/Makefile: make `run_tests' depend on `all'
selftests: launch individual selftests from the main Makefile
radix-tree: use iterators in find_get_pages* functions
radix-tree: rewrite gang lookup using iterator
radix-tree: introduce bit-optimized iterator
fs/proc/namespaces.c: prevent crash when ns_entries[] is empty
nbd: rename the nbd_device variable from lo to nbd
pidns: add reboot_pid_ns() to handle the reboot syscall
sysctl: use bitmap library functions
ipmi: use locks on watchdog timeout set on reboot
ipmi: simplify locking
ipmi: fix message handling during panics
ipmi: use a tasklet for handling received messages
ipmi: increase KCS timeouts
ipmi: decrease the IPMI message transaction time in interrupt mode
...
This was marked as obsolete for quite a while now.. Now it is time to
remove it altogether. And while doing this, get rid of first_cpu() as
well. Also, remove the redundant setting of cpu_online_mask in
smp_prepare_cpus() because the generic code would have already set cpu 0
in cpu_online_mask.
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk
8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m
u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB
ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m
rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl
eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45
HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+
/5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9
Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK
4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR
FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD
NypQthI85pc=
=G9mT
-----END PGP SIGNATURE-----
Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system
Pull "Disintegrate and delete asm/system.h" from David Howells:
"Here are a bunch of patches to disintegrate asm/system.h into a set of
separate bits to relieve the problem of circular inclusion
dependencies.
I've built all the working defconfigs from all the arches that I can
and made sure that they don't break.
The reason for these patches is that I recently encountered a circular
dependency problem that came about when I produced some patches to
optimise get_order() by rewriting it to use ilog2().
This uses bitops - and on the SH arch asm/bitops.h drags in
asm-generic/get_order.h by a circuituous route involving asm/system.h.
The main difficulty seems to be asm/system.h. It holds a number of
low level bits with no/few dependencies that are commonly used (eg.
memory barriers) and a number of bits with more dependencies that
aren't used in many places (eg. switch_to()).
These patches break asm/system.h up into the following core pieces:
(1) asm/barrier.h
Move memory barriers here. This already done for MIPS and Alpha.
(2) asm/switch_to.h
Move switch_to() and related stuff here.
(3) asm/exec.h
Move arch_align_stack() here. Other process execution related bits
could perhaps go here from asm/processor.h.
(4) asm/cmpxchg.h
Move xchg() and cmpxchg() here as they're full word atomic ops and
frequently used by atomic_xchg() and atomic_cmpxchg().
(5) asm/bug.h
Move die() and related bits.
(6) asm/auxvec.h
Move AT_VECTOR_SIZE_ARCH here.
Other arch headers are created as needed on a per-arch basis."
Fixed up some conflicts from other header file cleanups and moving code
around that has happened in the meantime, so David's testing is somewhat
weakened by that. We'll find out anything that got broken and fix it..
* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
Delete all instances of asm/system.h
Remove all #inclusions of asm/system.h
Add #includes needed to permit the removal of asm/system.h
Move all declarations of free_initmem() to linux/mm.h
Disintegrate asm/system.h for OpenRISC
Split arch_align_stack() out from asm-generic/system.h
Split the switch_to() wrapper out of asm-generic/system.h
Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
Create asm-generic/barrier.h
Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
Disintegrate asm/system.h for Xtensa
Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
Disintegrate asm/system.h for Tile
Disintegrate asm/system.h for Sparc
Disintegrate asm/system.h for SH
Disintegrate asm/system.h for Score
Disintegrate asm/system.h for S390
Disintegrate asm/system.h for PowerPC
Disintegrate asm/system.h for PA-RISC
Disintegrate asm/system.h for MN10300
...
Disintegrate asm/system.h for IA64.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
cc: linux-ia64@vger.kernel.org
SAL specification mandates that ia64_mca_log_sal_error_record() gets
called with interrupts enabled, and that's why ia64_mca_cmc_int_handler()
is enabling them. It however forgets to re-disable them when exiting,
which triggers WARN_ON() in generic IRQ handler.
Disable the interrupts again before exiting. This is analogous to
a396768574 ("[IA64] disable interrupts at end of ia64_mca_cpe_int_handler()").
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64_mca_cpu_init has a void *data local variable that is assigned
the value from either __get_free_pages() or mca_bootmem(). The problem
is that __get_free_pages returns an unsigned long and mca_bootmem, via
alloc_bootmem(), returns a void *. format_mca_init_stack takes the void *,
and it's also used with __pa(), but that casts it to long anyway.
This results in the following build warning:
arch/ia64/kernel/mca.c:1898: warning: assignment makes pointer from
integer without a cast
Cast the return of __get_free_pages to a void * to avoid
the warning.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
SAL requires that interrupts be enabled when making some calls
to it to pick up error records, so we enable interrupts inside
this handler. We should disable them again at the end.
Found by a new WARN_ONCE that tglx added to handle_irq_event_percpu()
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is called before early_irq_init() which will clobber any
registrations made too early. Move the calls to ia64_mca_late_init().
Signed-off-by: Tony Luck <tomy.luck@intel.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
__per_cpu_idtrs is statically allocated ... on CONFIG_NR_CPUS=4096
systems it hogs 16MB of memory. This is way too much for a quite
probably unused facility (only KVM uses dynamic TR registers).
Change to an array of pointers, and allocate entries as needed on
a per cpu basis. Change the name too as the __per_cpu_ prefix is
confusing (this isn't a classic <linux/percpu.h> type object).
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is a patch related to this discussion.
http://www.spinics.net/lists/linux-ia64/msg07605.html
When INIT is sent, ip/psr/pfs register is stored to the I-resources
(iip/ipsr/ifs registers), and they are copied in the min-state save
area(pmsa_{iip,ipsr,ifs}).
Therefore, in creating pt_regs at ia64_mca_modify_original_stack(),
cr_{iip,ipsr,ifs} should be derived from pmsa_{iip,ipsr,ifs}. But
current code copies pmsa_{xip,xpsr,xfs} to cr_{iip,ipsr,ifs}
when PSR.ic is 0.
finish_pt_regs(struct pt_regs *regs, const pal_min_state_area_t *ms,
unsigned long *nat)
{
(snip)
if (ia64_psr(regs)->ic) {
regs->cr_iip = ms->pmsa_iip;
regs->cr_ipsr = ms->pmsa_ipsr;
regs->cr_ifs = ms->pmsa_ifs;
} else {
regs->cr_iip = ms->pmsa_xip;
regs->cr_ipsr = ms->pmsa_xpsr;
regs->cr_ifs = ms->pmsa_xfs;
}
It's ok when PSR.ic is not 0. But when PSR.ic is 0, this could be
a problem when we investigate kernel as the value of regs->cr_iip does
not point to where INIT really interrupted.
At first I tried to change finish_pt_regs() so that it uses always
pmsa_{iip,ipsr,ifs} for cr_{iip,ipsr,ifs}, but Keith Owens pointed out
it could cause another problem if I change it.
>The only problem I can think of is an MCA/INIT
>arriving while code like SAVE_MIN or SAVE_REST is executing. Back
>tracing at that point using pmsa_iip is going to be a problem, you have
>no idea what state the registers or stack are in.
I confirmed he was right, so I decided to keep it as-is and to
save pmsa_{iip,ipsr,ifs} to ia64_sal_os_state for debugging.
An attached patch is just adding new members into ia64_sal_os_state to
save pmsa_{iip,ipsr,ifs}.
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Registers are not saved anywhere when INIT comes during fsys mode and
we cannot know what happened when we investigate vmcore captured by
kdump. This patch adds new function finish_pt_regs() so registers can
be saved in such a case.
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Setting monarch_cpu = -1 to let slaves frozen might not work, because
there might be slaves being late, not entered the rendezvous yet.
Such slaves might be caught in while (monarch_cpu == -1) loop.
Use kdump_in_progress instead of monarch_cpus to break INIT rendezvous
and let all slaves enter DIE_INIT_SLAVE_LEAVE smoothly.
And monarch no longer need to manage rendezvous if once kdump_in_progress
is set, catch the monarch in DIE_INIT_MONARCH_ENTER then.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: kexec@lists.infradead.org
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It is generally agreed that it would be beneficial for u64 to be an
unsigned long long on all architectures. ia64 (in common with several
other 64-bit architectures) currently uses unsigned long. Migrating
piecemeal is too painful; this giant patch fixes all compilation warnings
and errors that come as a result of switching to use int-ll64.h.
Note that userspace will still see __u64 defined as unsigned long. This
is important as it affects C++ name mangling.
[Updated by Tony Luck to change efi.h:efi_freemem_callback_t to use
u64 for start/end rather than unsigned long]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Callers of alloc_pages_node() can optionally specify -1 as a node to mean
"allocate from the current node". However, a number of the callers in
fast paths know for a fact their node is valid. To avoid a comparison and
branch, this patch adds alloc_pages_exact_node() that only checks the nid
with VM_BUG_ON(). Callers that know their node is valid are then
converted.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Paul Mundt <lethal@linux-sh.org> [for the SLOB NUMA bits]
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The defines and typedefs (hw_interrupt_type, no_irq_type, irq_desc_t) have
been kept around for migration reasons. After more than two years it's
time to remove them finally.
This patch cleans up one of the remaining users. When all such patches
hit mainline we can remove the defines and typedefs finally.
Impact: cleanup
Convert the last remaining users and remove the typedef.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Impact: cleanup, futureproof
In fact, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids. So use that instead of NR_CPUS in various
places.
This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Using printk from MCA/INIT context is unsafe since it can cause deadlock.
The ia64_mca_modify_original_stack is called from both of mca handler and
init handler, so it should use mprintk instead of printk.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There are many notify_die() and almost all take same style with
ia64_mca_spin(). This patch defines macros and replace them all,
to reduce lines and to improve readability.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are 3 hooks in MCA handler, but this DIE_MCA_MONARCH_PROCESS
event does not notified other than for the first monarch.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The functions time_before, time_before_eq, time_after, and time_after_eq are
more robust for comparing jiffies against other values.
So use the time_after() & time_before() macros, defined at linux/jiffies.h,
which deal with wrapping correctly
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While it is convenient that we can invoke kdump by asserting INIT
via button on chassis etc., there are some situations that invoking
kdump on fatal MCA is not welcomed rather than rebooting fast without
dump.
This patch adds a new flag 'kdump_on_fatal_mca' that is independent
from 'kdump_on_init' currently available. Adding this flag enable
us to turning on/off of kdump depend on the event, INIT and/or fatal
MCA. Default for this flag is to take the dump.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Dynamic TR resource should be managed in the uniform way.
Add two interfaces for kernel:
ia64_itr_entry: Allocate a (pair of) TR for caller.
ia64_ptr_entry: Purge a (pair of ) TR by caller.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
__FUNCTION__ is gcc-specific, use __func__
Long lines have been kept where they exist, some small spacing changes
have been done.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The MCA code allocates bootmem memory for NR_CPUS, regardless
of how many cpus the system actually has. This change allocates
memory only for cpus that actually exist.
On my test system with NR_CPUS = 1024, reserved memory was reduced by 130944k.
Before: Memory: 27886976k/28111168k available (8282k code, 242304k reserved, 5928k data, 1792k init)
After: Memory: 28017920k/28111168k available (8282k code, 111360k reserved, 5928k data, 1792k init)
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently CMCI mask of hot-added CPU is always disabled after CPU hotplug.
We should adjust this mask depending on CMC polling state.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When the CPE handler encounters too many CPEs (such as a solid single
bit memory error), it sets up a polling timer and disables the CPE
interrupt (to avoid excessive overhead logging the stream of single
bit errors). disable_irq_nosync() calls chip->disable() to provide
a chipset specifiec interface for disabling the interrupt. This patch
adds the Altix specific support to disable and re-enable the CPE interrupt.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Additional testing uncovered a situation where the MCA recovery code could
hang due to a race condition.
According to the SAL spec, SAL sends a rendezvous interrupt to all but the first
CPU that goes into MCA. This includes other CPUs that go into MCA at the same
time. Those other CPUs will go into the linux MCA handler (rather than the
slave loop) with the rendezvous interrupt pending. When all the CPUs have
completed MCA processing and the last monarch completes, freeing all the CPUs,
the CPUs with the pended rendezvous interrupt then go into the
ia64_mca_rendez_int_handler(). In ia64_mca_rendez_int_handler() the CPUs
get marked as rendezvoused, but then leave the handler (due to no MCA).
That leaves the CPUs marked as rendezvoused _before_ the next MCA event.
When the next MCA hits, the monarch will mistakenly believe that all the CPUs
are rendezvoused when they are not, opening up a window where a CPU can get
stuck in the slave loop.
This patch avoids leaving CPUs marked as rendezvoused when they are not.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While testing the MCA recovery code, noticed that some machines would have a
five second delay rendezvousing cpus. What was happening is that
ia64_wait_for_slaves() would check to see if all the slave CPUs had
rendezvoused. If any had not, it would wait 1 millisecond then check again.
If any CPUs had still not rendezvoused, it would wait 5 seconds before
checking again.
On some configs the rendezvous takes more than 1 millisecond, causing the code
to wait the full 5 seconds, even though the last CPU rendezvoused after only
a few milliseconds.
The fix is to check every 1 millisecond to see if all the cpus have
rendezvoused. After 5 seconds the code concludes the CPUs will never
rendezvous (same as before).
The MCA code is, by definition, not performance critical, but a needless
delay of 5 seconds is senseless. The 5 seconds also adds up quickly
when running the error injection code in a loop.
This patch both simplifies the code and removes the needless delay.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use local_vector_to_irq() instead of looping through all NR_IRQS.
This avoids registering the CPE handler on multiple irqs. Only
register if the irq is valid. If no valid irq is found, print an
error message and set up polling.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the following section mismatch warnings:
WARNING: vmlinux.o(.text+0x41902): Section mismatch: reference to .init.text:__alloc_bootmem (between 'ia64_mca_cpu_init' and 'ia64_do_tlb_purge')
WARNING: vmlinux.o(.text+0x49222): Section mismatch: reference to .init.text:__alloc_bootmem (between 'register_intr' and 'iosapic_register_intr')
WARNING: vmlinux.o(.text+0x62beb2): Section mismatch: reference to .init.text:__alloc_bootmem_node (between 'hubdev_init_node' and 'cnodeid_get_geoid')
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linux does not gracefully deal with multiple processors going
through OS_MCA aa part of the same MCA event. The first cpu
into OS_MCA grabs the ia64_mca_serialize lock. Subsequent
cpus wait for that lock, preventing them from reporting in as
rendezvoused. The first cpu waits 5 seconds then complains
that all the cpus have not rendezvoused. The first cpu then
handles its MCA and frees up all the rendezvoused cpus and
releases the ia64_mca_serialize lock. One of the subsequent
cpus going thought OS_MCA then gets the ia64_mca_serialize
lock, waits another 5 seconds and then complains that none of
the other cpus have rendezvoused.
This patch allows multiple CPUs to gracefully go through OS_MCA.
The first CPU into ia64_mca_handler() grabs a mca_count lock.
Subsequent CPUs into ia64_mca_handler() are added to a list of cpus
that need to go through OS_MCA (a bit set in mca_cpu), and report
in as rendezvoused, and but spin waiting their turn.
The first CPU sees everyone rendezvous, handles his MCA, wakes up
one of the other CPUs waiting to process their MCA (by clearing
one mca_cpu bit), and then waits for the other cpus to complete
their MCA handling. The next CPU handles his MCA and the process
repeats until all the CPUs have handled their MCA. When the last
CPU has handled it's MCA, it sets monarch_cpu to -1, releasing all
the CPUs.
In testing this works more reliably and faster.
Thanks to Keith Owens for suggesting numerous improvements
to this code.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Building with GCC 4.2, I get the following error:
CC arch/ia64/kernel/mca.o
arch/ia64/kernel/mca.c:275: error: __ksymtab_ia64_mlogbuf_finish causes a
section type conflict
This is because ia64_mlogbuf_finish is both declared static and exported.
Fix by removing the export (which is unneeded now).
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The current implementation of kdump on INIT events would enter
kdump processing on DIE_INIT_MONARCH_ENTER and DIE_INIT_SLAVE_ENTER
events. Thus, the monarch cpu would go ahead and boot up the kdump
On SN shub2 systems, this out-of-sync situation causes some slave
cpus on different nodes to enter POD.
This patch moves kdump entry points to DIE_INIT_MONARCH_LEAVE and
DIE_INIT_SLAVE_LEAVE. It also sets kdump_in_progress variable in
the DIE_INIT_MONARCH_PROCESS event to not dump all active stack
traces to the console in the case of kdump.
I have tested this patch on an SN machine and a HP RX2600.
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Only shows up while building sim_defconfig because CONFIG_ACPI=n
there, and all of the uses of cpe_poll_timer are inside #ifdef CONFIG_ACPI.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This finally renames the thread_info field in task structure to stack, so that
the assumptions about this field are gone and archs have more freedom about
placing the thread_info structure.
Nonbroken archs which have a proper thread pointer can do the access to both
current thread and task structure via a single pointer.
It'll allow for a few more cleanups of the fork code, from which e.g. ia64
could benefit.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
[akpm@linux-foundation.org: build fix]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the die notifier handling to common code. Previous
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)
arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner noticed that duplicate TLB DTC entries do not cause a
linux panic. See discussion:
http://www.gelato.unsw.edu.au/archives/linux-ia64/0307/6108.html
The current TLB recovery code is recovering from the duplicate itr.d
dropins, masking the underlying problem. This change modifies
the MCA recovery code to look for the TLB check signature of the
duplicate TLB entry and panic in that case.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Actually, on reflection I think that there is a good case for
keeping the options separate. I am thinking particularly of people
who want a very small crashdump kernel and thus don't want to compile
in kexec.
The patch below should fix things up so that all valid combinations of
KEXEC, CRASH_DUMP and VMCORE compile cleanly - VMCORE depends on
CRASH_DUMP which is why I said valid combinations. In a nutshell
it just untangles unrelated code and switches around a few defines.
Please note that it creats a new file, arch/ia64/kernel/crash_dump.c
This is in keeping with the i386 implementation.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Changes and updates.
1. Remove fake rendz path and related code according to discuss with Khalid Aziz.
2. fc.i offset fix in relocate_kernel.S.
3. iospic shutdown code eoi and mask race fix from Fujitsu.
4. Warm boot hook in machine_kexec to SN SAL code from Jack Steiner.
5. Send slave to SAL slave loop patch from Jay Lan.
6. Kdump on non-recoverable MCA event patch from Jay Lan
7. Use CTL_UNNUMBERED in kdump_on_init sysctl.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix up arch-specific work items where possible to use the new work_struct and
delayed_work structs.
Three places that enqueue bits of their stack and then return have been marked
with #error as this is not permitted.
Signed-Off-By: David Howells <dhowells@redhat.com>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)