We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86. Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To allow flexible configuration of IDE introduce HAVE_IDE.
All archs except arm, um and s390 unconditionally select it.
For arm the actual configuration determine if IDE is supported.
This is a step towards introducing drivers/Kconfig for arm.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Russell King - ARM Linux <linux@arm.linux.org.uk>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Fix large MCA bootmem allocation
[IA64] Simplify cpu_idle_wait
[IA64] Synchronize RBS on PTRACE_ATTACH
[IA64] Synchronize kernel RSE to user-space and back
[IA64] Rename TIF_PERFMON_WORK back to TIF_NOTIFY_RESUME
[IA64] Wire up timerfd_{create,settime,gettime} syscalls
The MCA code allocates bootmem memory for NR_CPUS, regardless
of how many cpus the system actually has. This change allocates
memory only for cpus that actually exist.
On my test system with NR_CPUS = 1024, reserved memory was reduced by 130944k.
Before: Memory: 27886976k/28111168k available (8282k code, 242304k reserved, 5928k data, 1792k init)
After: Memory: 28017920k/28111168k available (8282k code, 111360k reserved, 5928k data, 1792k init)
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is just Venki's patch[*] for x86 ported to ia64.
* http://marc.info/?l=linux-kernel&m=120249201318159&w=2
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When attaching to a stopped process, the RSE must be explicitly
synced to user-space, so the debugger can read the correct values.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
CC: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is base kernel patch for ptrace RSE bug. It's basically a backport
from the utrace RSE patch I sent out several weeks ago. please review.
when a thread is stopped (ptraced), debugger might change thread's user
stack (change memory directly), and we must avoid the RSE stored in
kernel to override user stack (user space's RSE is newer than kernel's
in the case). To workaround the issue, we copy kernel RSE to user RSE
before the task is stopped, so user RSE has updated data. we then copy
user RSE to kernel after the task is resummed from traced stop and
kernel will use the newer RSE to return to user.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
CC: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since the RSE synchronization will need a TIF_ flag, but all
work-to-be-done bits are already used, so we have to multiplex
TIF_NOTIFY_RESUME again.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix typo in comments.
BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise
checkpatch.pl will be complaining.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the configuration dependencies in the vmcoreinfo data.
i386's "node_data" is defined in arch/x86/mm/discontig_32.c,
and x86_64's one is defined in arch/x86/mm/numa_64.c.
They depend on CONFIG_NUMA:
arch/x86/mm/Makefile_32:7
obj-$(CONFIG_NUMA) += discontig_32.o
arch/x86/mm/Makefile_64:7
obj-$(CONFIG_NUMA) += numa_64.o
ia64's "pgdat_list" is defined in arch/ia64/mm/discontig.c,
and it depends on CONFIG_DISCONTIGMEM and CONFIG_SPARSEMEM:
arch/ia64/mm/Makefile:9-10
obj-$(CONFIG_DISCONTIGMEM) += discontig.o
obj-$(CONFIG_SPARSEMEM) += discontig.o
ia64's "node_memblk" is defined in arch/ia64/mm/numa.c,
and it depends on CONFIG_NUMA:
arch/ia64/mm/Makefile:8
obj-$(CONFIG_NUMA) += numa.o
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset is for the vmcoreinfo data.
The vmcoreinfo data has the minimum debugging information only for dump
filtering. makedumpfile (dump filtering command) gets it to distinguish
unnecessary pages, and makedumpfile creates a small dumpfile.
This patch:
VMCOREINFO_SIZE() should be renamed VMCOREINFO_STRUCT_SIZE() since it's always
returning the size of the struct with a given name. This change would allow
VMCOREINFO_TYPEDEF_SIZE() to simply become VMCOREINFO_SIZE() since it need not
be used exclusively for typedefs.
This discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0582.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
calibrate_delay() must be __cpuinit, not __{dev,}init.
I've verified that this is correct for all users.
While doing the latter, I also did the following cleanups:
- remove pointless additional prototypes in C files
- ensure all users #include <linux/delay.h>
This fixes the following section mismatches with CONFIG_HOTPLUG=n,
CONFIG_HOTPLUG_CPU=y:
WARNING: vmlinux.o(.text+0x1128d): Section mismatch: reference to .init.text.1:calibrate_delay (between 'check_cx686_slop' and 'set_cx86_reorder')
WARNING: vmlinux.o(.text+0x25102): Section mismatch: reference to .init.text.1:calibrate_delay (between 'smp_callin' and 'cpu_coregroup_map')
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christian Zankel <chris@zankel.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] make pfm_get_task work with virtual pids
[IA64] honor notify_die() returning NOTIFY_STOP
[IA64] remove dead code: __cpu_{down,die} from !HOTPLUG_CPU
[IA64] Appoint kvm/ia64 Maintainers
[IA64] ia64_set_psr should use srlz.i
[IA64] Export three symbols for module use
[IA64] mca style cleanup
[IA64] sn_hwperf semaphore to mutex
[IA64] generalize attribute of fsyscall_gtod_data
[IA64] efi.c Add /* never reached */ annotation
[IA64] efi.c Spelling/punctuation fixes
[IA64] Make efi.c mostly fit in 80 columns
[IA64] aliasing-test: fix gcc warnings on non-ia64
[IA64] Slim-down __clear_bit_unlock
[IA64] Fix the order of atomic operations in restore_previous_kprobes on ia64
[IA64] constify function pointer tables
[IA64] fix userspace compile error in gcc_intrin.h
This is the new timerfd API as it is implemented by the following patch:
int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);
The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).
The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:
http://www.xmailserver.org/timerfd-test2.c
[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This pid comes from user space, so treat it accordingly.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This requires making die() and die_if_kernel() return a value, and their
callers to honor this (and be prepared that it returns).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Neither __cpu_down() nor __cpu_die() are being referenced without
CONFIG_HOTPLUG_CPU.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The only in kernel use of ia64_set_psr() needs to follow
it with a srlz.i (since it is changing state for PSR.ic).
So it is pointless to issue srlz.d inside this function.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since kvm/module needs to use some unexported functions in kernel,
so export them with this patch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Really simple mutex style semaphore user. The new API is struct mutex which is
what I've converted it to with this change.
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In an ordinary way,
> } __attribute__ ((aligned (L1_CACHE_BYTES)));
should be
> } ____cacheline_aligned;
to save some bytes on an uni-processor.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As written, this loop could be for (;;) instead of do while (md). The tests
inside the loop always result in a return so the loop never terminates normally.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Incorporates the suggestions from Peter Chubb the last time I submitted
this. This called for using the same verb tense in the couple of preceding
comments as well.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is purely whitespace changes to make the code fit in 80
columns, plus fix some inconsistent indentation. The efi_guidcmp()
tests remain wider than 80-columns since that seems to be the most
clear.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the order of atomic operations to prevent overwriting prev_kprobe[0].
To pop values from stack, we must decrement stack index right AFTER
reading values.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
After seeing the filename I'd have expected something about the
implementation of SMP in the Linux kernel - not some notes on kernel
configuration and building trivialities noone would search at this
place.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Linus:
On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like
depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32
really shouldn't exist in a file like kernel/Kconfig.instrumentation.
It would be much better to do
depends on ARCH_SUPPORTS_KPROBES
in that generic file, and then architectures that do support it would just
have a
bool ARCH_SUPPORTS_KPROBES
default y
in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interface...
Changelog:
Actually, I know I gave this as the magic incantation, but now that I see
it, I realize that I should have told you to just use
config KPROBES_SUPPORT
def_bool y
instead, which is a bit denser.
We seem to use both kinds of syntax for these things, but this is really
what "def_bool" is there for...
- Use HAVE_KPROBES
- Use a select
- Yet another update :
Moving to HAVE_* now.
- Update ARM for kprobes support.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Linus:
On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like
depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32
really shouldn't exist in a file like kernel/Kconfig.instrumentation.
It would be much better to do
depends on ARCH_SUPPORTS_KPROBES
in that generic file, and then architectures that do support it would just
have a
bool ARCH_SUPPORTS_KPROBES
default y
in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interface...
Changelog:
Actually, I know I gave this as the magic incantation, but now that I see
it, I realize that I should have told you to just use
config ARCH_SUPPORTS_KPROBES
def_bool y
instead, which is a bit denser.
We seem to use both kinds of syntax for these things, but this is really
what "def_bool" is there for...
Changelog :
- Moving to HAVE_*.
- Add AVR32 oprofile.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
The ACPI_PDC_SMP_T_SWCOORD bit is set by and OS that is capable of
native ACPI throttling software coordination for mutli-processors
using the _TSD information.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
Remove commented-out code copied from NFS
NFS: Switch from intr mount option to TASK_KILLABLE
Add wait_for_completion_killable
Add wait_event_killable
Add schedule_timeout_killable
Use mutex_lock_killable in vfs_readdir
Add mutex_lock_killable
Use lock_page_killable
Add lock_page_killable
Add fatal_signal_pending
Add TASK_WAKEKILL
exit: Use task_is_*
signal: Use task_is_*
sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
ptrace: Use task_is_*
power: Use task_is_*
wait: Use TASK_NORMAL
proc/base.c: Use task_is_*
proc/array.c: Use TASK_REPORT
perfmon: Use task_is_*
...
Fixed up conflicts in NFS/sunrpc manually..
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
alpha: fix x86.git merge build error
ia64: on UP percpu variables are not small memory model
x86: fix arch/x86/kernel/test_nx.c modular build bug
s390: use generic percpu linux-2.6.git
POWERPC: use generic per cpu
ia64: use generic percpu
SPARC64: use generic percpu
percpu: change Kconfig to HAVE_SETUP_PER_CPU_AREA
modules: fold percpu_modcopy into module.c
x86: export copy_from_user_ll_nocache[_nozero]
x86: fix duplicated TIF on 64-bit
percpu_modcopy() is defined multiple times in arch files. However, the only
user is module.c. Put a static definition into module.c and remove
the definitions from the arch files.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.
Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
- add support for PER_CPU_ATTRIBUTES
- fix generic smp percpu_modcopy to use per_cpu_offset() macro.
Add the ability to use generic/percpu even if the arch needs to override
several aspects of its operations. This will enable the use of generic
percpu.h for all arches.
An arch may define:
__per_cpu_offset Do not use the generic pointer array. Arch must
define per_cpu_offset(cpu) (used by x86_64, s390).
__my_cpu_offset Can be defined to provide an optimized way to determine
the offset for variables of the currently executing
processor. Used by ia64, x86_64, x86_32, sparc64, s/390.
SHIFT_PTR(ptr, offset) If an arch defines it then special handling
of pointer arithmentic may be implemented. Used
by s/390.
(Some of these special percpu arch implementations may be later consolidated
so that there are less cases to deal with.)
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The use of the __GENERIC_PERCPU is a bit problematic since arches
may want to run their own percpu setup while using the generic
percpu definitions. Replace it through a kconfig variable.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The break_lock data structure and code for spinlocks is quite nasty.
Not only does it double the size of a spinlock but it changes locking to
a potentially less optimal trylock.
Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a
__raw_spin_is_contended that uses the lock data itself to determine whether
there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is
not set.
Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to
decouple it from the spinlock implementation, and make it typesafe (rwlocks
do not have any need_lockbreak sites -- why do they even get bloated up
with that break_lock then?).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
#39: FILE: arch/ia64/ia32/binfmt_elf32.c:229:
+elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused)
WARNING: no space between function name and open parenthesis '('
#39: FILE: arch/ia64/ia32/binfmt_elf32.c:229:
+elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused)
WARNING: line over 80 characters
#67: FILE: arch/x86/kernel/sys_x86_64.c:80:
+ new_begin = randomize_range(*begin, *begin + 0x02000000, 0);
ERROR: use tabs not spaces
#110: FILE: arch/x86/kernel/sys_x86_64.c:185:
+ ^I mm->cached_hole_size = 0;$
ERROR: use tabs not spaces
#111: FILE: arch/x86/kernel/sys_x86_64.c:186:
+ ^I^Imm->free_area_cache = mm->mmap_base;$
ERROR: use tabs not spaces
#112: FILE: arch/x86/kernel/sys_x86_64.c:187:
+ ^I}$
ERROR: use tabs not spaces
#141: FILE: arch/x86/kernel/sys_x86_64.c:216:
+ ^I^I/* remember the largest hole we saw so far */$
ERROR: use tabs not spaces
#142: FILE: arch/x86/kernel/sys_x86_64.c:217:
+ ^I^Iif (addr + mm->cached_hole_size < vma->vm_start)$
ERROR: use tabs not spaces
#143: FILE: arch/x86/kernel/sys_x86_64.c:218:
+ ^I^I mm->cached_hole_size = vma->vm_start - addr;$
ERROR: use tabs not spaces
#157: FILE: arch/x86/kernel/sys_x86_64.c:232:
+ ^Imm->free_area_cache = TASK_UNMAPPED_BASE;$
ERROR: need a space before the open parenthesis '('
#291: FILE: arch/x86/mm/mmap_64.c:101:
+ } else if(mmap_is_legacy()) {
WARNING: braces {} are not necessary for single statement blocks
#302: FILE: arch/x86/mm/mmap_64.c:112:
+ if (current->flags & PF_RANDOMIZE) {
+ mm->mmap_base += ((long)rnd) << PAGE_SHIFT;
+ }
WARNING: line over 80 characters
#314: FILE: fs/binfmt_elf.c:48:
+static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long);
WARNING: no space between function name and open parenthesis '('
#314: FILE: fs/binfmt_elf.c:48:
+static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long);
WARNING: line over 80 characters
#429: FILE: fs/binfmt_elf.c:438:
+ eppnt, elf_prot, elf_type, total_size);
ERROR: need space after that ',' (ctx:VxV)
#480: FILE: fs/binfmt_elf.c:939:
+ elf_prot, elf_flags,0);
^
total: 9 errors, 7 warnings, 461 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries
onto a random address (in cases in which mmap() is allowed to perform a
randomization).
The code has been extraced from Ingo's exec-shield patch
http://people.redhat.com/mingo/exec-shield/
[akpm@linux-foundation.org: fix used-uninitialsied warning]
[kamezawa.hiroyu@jp.fujitsu.com: fixed ia32 ELF on x86_64 handling]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
[IPV6] ADDRLABEL: Fix double free on label deletion.
[PPP]: Sparse warning fixes.
[IPV4] fib_trie: remove unneeded NULL check
[IPV4] fib_trie: More whitespace cleanup.
[NET_SCHED]: Use nla_policy for attribute validation in ematches
[NET_SCHED]: Use nla_policy for attribute validation in actions
[NET_SCHED]: Use nla_policy for attribute validation in classifiers
[NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
[NET_SCHED]: sch_api: introduce constant for rate table size
[NET_SCHED]: Use typeful attribute parsing helpers
[NET_SCHED]: Use typeful attribute construction helpers
[NET_SCHED]: Use NLA_PUT_STRING for string dumping
[NET_SCHED]: Use nla_nest_start/nla_nest_end
[NET_SCHED]: Propagate nla_parse return value
[NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
[NET_SCHED]: act_api: use nlmsg_parse
[NET_SCHED]: act_api: fix netlink API conversion bug
[NET_SCHED]: sch_netem: use nla_parse_nested_compat
[NET_SCHED]: sch_atm: fix format string warning
[NETNS]: Add namespace for ICMP replying code.
...
* use irq_handler_t where appropriate
* no need to use 'irq' function arg, its already stored in a data struct
* rename irq handler 'irq' argument to 'dummy', where the function
has been analyzed and proven not to use its first argument.
* remove always-false "dev_id == NULL" test from irq handlers
* remove pointless casts from void*
* declance: irq argument is not const
* add KERN_xxx printk prefix
* fix minor whitespace weirdness
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch consolidate all definitions of .init.text, .init.data
and .exit.text, .exit.data section definitions in
the generic vmlinux.lds.h.
This is a preparational patch - alone it does not buy
us much good.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The compiler team did the hard work for this distilling a problem in
large fortran application which showed up when applied to a 290MB input
data set down to this instruction:
ldfd f34=[r17],-8
Which they noticed incremented r17 by 0x10 rather than decrementing it
by 8 when the value in r17 caused an unaligned data fault. I tracked
it down to some bad instruction decoding in unaligned.c. The code
assumes that the 'x' bit can determine whether the instruction is
an "ldf" or "ldfp" ... which it is for opcode=6 (see table 4-29 on
page 3:302 of the SDM). But for opcode=7 the 'x' bit is irrelevent,
all variants are "ldf" instructions (see table 4-36 on page 3:306).
Note also that interpreting the instruction as "ldfp" means that the
"paired" floating point register (f35 in the example here) will also
be corrupted.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Montecito and Montvale behaves slightly differently than previous
Itanium processors, resulting in the MCA due to a failed PIO read
to sometimes surfacing outside the nofault code. This code is
based on discussions with Intel CPU architects and verified at
customer sites.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently CMCI mask of hot-added CPU is always disabled after CPU hotplug.
We should adjust this mask depending on CMC polling state.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes an unused variable warning in mm/vmalloc.c.
Tony: also fix resulting fallout in uncached.c with a
typo in args to flush_tlb_kernel_range().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Access to elfcorehdr_addr needs to be guarded by #if CONFIG_PROC_FS
as well as the existing #if guards.
Fixes the following build problem:
arch/ia64/hp/common/built-in.o: In function
`sba_init':arch/ia64/hp/common/sba_iommu.c:2043: undefined reference to `elfcorehdr_addr'
:arch/ia64/hp/common/sba_iommu.c:2043: undefined reference to `elfcorehdr_addr'
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The Altix shub2 BTE error detail bits are in a different location
than on shub1. The current code does not take this into account
resulting in all shub2 BTE failures mapping to "unknown".
This patch reads the error detail bits from the proper location,
so the correct BTE failure reason is returned for both shub1
and shub2.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes the following assembler warning messages.
AS arch/ia64/kernel/head.o
arch/ia64/kernel/head.S: Assembler messages:
arch/ia64/kernel/head.S:1179: Warning: Use of 'ld8' violates RAW dependency 'CR[PTA]' (data)
arch/ia64/kernel/head.S:1179: Warning: Only the first path encountering the conflict is reported
arch/ia64/kernel/head.S:1178: Warning: This is the location of the conflicting usage
arch/ia64/kernel/head.S:1180: Warning: Use of 'ld8' violates RAW dependency 'CR[PTA]' (data)
arch/ia64/kernel/head.S:1180: Warning: Only the first path encountering the conflict is reported
arch/ia64/kernel/head.S:1178: Warning: This is the location of the conflicting usage
:
arch/ia64/kernel/head.S:1213: Warning: Use of 'ldf.fill.nta' violates RAW dependency 'CR[PTA]' (data)
arch/ia64/kernel/head.S:1213: Warning: Only the first path encountering the conflict is reported
arch/ia64/kernel/head.S:1178: Warning: This is the location of the conflicting usage
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes the following compiler warning messages.
CC arch/ia64/kernel/irq_ia64.o
arch/ia64/kernel/irq_ia64.c: In function 'create_irq':
arch/ia64/kernel/irq_ia64.c:343: warning: 'domain.bits[0u]' may be used uninitialized in this function
arch/ia64/kernel/irq_ia64.c: In function 'assign_irq_vector':
arch/ia64/kernel/irq_ia64.c:203: warning: 'domain.bits[0u]' may be used uninitialized in this function
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I tried to upgrade an IA32 chroot on my IA64 to a new glibc with TLS.
It kept dying because set_thread_area was returning -ESRCH
(bugs.debian.org/451939).
I instrumented arch/ia64/ia32/sys_ia32.c:get_free_idx() and ended up
seeing output like
[pid] idx desc->a desc->b
-----------------------------
[2710] 0 -> c6b0ffff 40dff31b
[2710] 1 -> 0 0
[2710] 2 -> 0 0
[2710] 0 -> c6b0ffff 40dff31b
[2710] 1 -> c6b0ffff 40dff31b
[2710] 2 -> 0 0
[2711] 0 -> c6b0ffff 40dff31b
[2711] 1 -> c6b0ffff 40dff31b
[2711] 2 -> 48c0ffff 40dff317
which suggested to me that TLS pointers were surviving exec() calls,
leading to GDT pointers filling up and the eventual failure of
get_free_idx().
I think the solution is flushing the tls array on exec.
Signed-Off-By: Ian Wienand <ianw@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ia64 oops message doesn't include the kernel version, which
makes it hard to automatically categorize oops messages scraped
from mailing lists and bug databases.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Improve performance of memory allocations on ia64 by avoiding a global TLB
purge to purge a single page from the file cache. This happens whenever we
evict a page from the buffer cache to make room for some other allocation.
Test case: Run 'find /usr -type f | xargs cat > /dev/null' in the
background to fill the buffer cache, then run something that uses memory,
e.g. 'gmake -j50 install'. Instrumentation showed that the number of
global TLB purges went from a few millions down to about 170 over a 12
hours run of the above.
The performance impact is particularly noticeable under virtualization,
because a virtual TLB is generally both larger and slower to purge than
a physical one.
Signed-off-by: Christophe de Dinechin <ddd@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Convert ia64's ia32 support from nopage to fault.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes some redundant code in the function setup_sigcontext().
The registers ar.ccv,b7,r14,ar.csd,ar.ssd,r2-r3 and r16-r31 are not
restored in restore_sigcontext() when (flags & IA64_SC_FLAG_IN_SYSCALL) is
true. So we don't need to zero those variables in setup_sigcontext().
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ACPI tables follow a tree structure in memory.
The root of the tree is the RSDP (Root System Description Pointer).
To find the RSDP, the OS searches for the signature "RSD PTR "
in well known physical memory locations. Then the OS computes
a table checksum to verify that the signature is really part
of a valid table header.
Some systems have a proper signature but an invalid checksum;
followed elsewhere by a proper signature with valid checksum.
http://bugzilla.kernel.org/show_bug.cgi?id=9444
The Linux RSDP scanning code bailed out on those systems
and as a result they booted with ACPI disabled.
Fix this by deleting the Linux RSDP scanning code and
plugging in the ACPICA RSDP scanning code.
Signed-off-by: Len Brown <len.brown@intel.com>
.. as it it used only during early boot.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
arch/ia64/kernel/acpi.c | 2 +-
arch/x86/kernel/acpi/boot.c | 4 ++--
drivers/acpi/osl.c | 3 ++-
3 files changed, 5 insertions(+), 4 deletions(-)
Signed-off-by: Len Brown <len.brown@intel.com>
If "CPEI Processor Override" bit is not set in "Platform Interrupt
Source Flags" in "Platform Interrupt Sources Structure" in ACPI MADT,
the target processor of CPEI is restricted to a specific CPU. Because
of this, the delivery mode for CPEI should be IOSAPIC_FIXED.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Restore regs->ccr_iip before kreturn probe handler runs. In this way, if
probe handler does unwind, unwind can correctly get the stack trace.
Fixes: http://sourceware.org/bugzilla/show_bug.cgi?id=5051
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
'!' has a higher priority than '&', so as was
this won't test the first bit, but rather evaluates to false for any non-zero
lsapic->lapic_flags.
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Macro efi_md_size is defined in efi.c, and here we apply it throughout
efi.c.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Rename _bss to __bss_start as on other architectures. That makes it
possible to use the <linux/sections.h> instead of own declarations. Also
add __bss_stop because that symbol exists on other architectures.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When initializing pci_controller->node to point to the closest node we need
to take into consideration that a PIC PCI Bridge ASIC can be connected to a
headless/memless node just like the TIOCP and TIOCE Bridge ASICs
Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make some IOSAPIC functions static and remove one that is unused.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Not all the return value of __copy_from_user and
__put_user is checked.This patch fixed it.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
With the unionfs patch applied I get
ERROR: "copy_page" [fs/unionfs/unionfs.ko] undefined!
the other architectures (some, at least) export copy_page() so I guess ia64
should also do so.
To do this we need to move the copy_page() functions out of lib.a and into
built-in.o and add the EXPORT_SYMBOL().
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Don't assume that this file has execute permissions. For example, patch(1)
loses that information.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
/opt/crosstool/gcc-3.4.5-glibc-2.3.6/ia64-unknown-linux-gnu/lib/gcc/ia64-unknown-linux-gnu/3.4.5/../../../../ia64-unknown-linux-gnu/bin/ld: section .data.patch [a000000000000500 -> a000000000000507] overlaps section .dynamic [a0000000000003c8 -> a000000000000507]
This only appears to be a problem with strangely configured
cross-compilation ... native compilers don't have this issue.
But in the interests of helping others at least compile for
ia64, this can go in. -Tony
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
i386 and x86-64 registers System RAM as IORESOURCE_MEM | IORESOURCE_BUSY.
But ia64 registers it as IORESOURCE_MEM only.
In addition, memory hotplug code registers new memory as IORESOURCE_MEM too.
This difference causes a failure of memory unplug of x86-64. This patch
fixes it.
This patch adds IORESOURCE_BUSY to avoid potential overlap mapping by PCI
device.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Luck, Tony" <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On Altix (sn2) machines the "Error parsing MADT" message is
misleading because the lack of IOSAPIC entries is expected.
Since I am sure someone will ask, I have been told that
the chance of this changing anytime soon is close to nil.
Signed-off-by: George Beshers <gbeshers@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Newer Itanium versions have added additional processor feature set
bits. This patch prints all the implemented feature set bits. Some
bit descriptions have not been made public. For those bits, a generic
"Feature set X bit Y" message is printed. Bits that are not implemented
will no longer be printed.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that redirect hit bit in I/O SAPIC RTE is set even
when it must be disabled (e.g. nointroute boot option is set, CPU
hotplug is enabled or percpu vector is enabled).
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently, XPC's heartbeat timer function runs on whatever CPU modprobe/insmod
ran on when XPC was started. To avoid the heartbeat from being delayed for
long periods the timer function must run on CPU 0.
N.B. Altix doesn't currently allow cpu0 to be taken offline, so this is
safe for now. This code must be revised when offline of cpu0 is enabled.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Clean up /proc/interrupts output on the system that has 10 or more
CPUs.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When the CPE handler encounters too many CPEs (such as a solid single
bit memory error), it sets up a polling timer and disables the CPE
interrupt (to avoid excessive overhead logging the stream of single
bit errors). disable_irq_nosync() calls chip->disable() to provide
a chipset specifiec interface for disabling the interrupt. This patch
adds the Altix specific support to disable and re-enable the CPE interrupt.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
No need to print "McKinley Errata 9 workaround not needed; disabling it"
on every non-McKinley Itanium, which at this point is almost all of them.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
If you build the kernel `in-place' then do a git update, git
complains about arch/ia64/kernel/gate.lds being modified and
untracked.
Add that (generated) file to a .gitignore file.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is a section mismatch when building CONFIG_FLATMEM=y kernels
that also have CONFIG_HOTPLUG_CPU=y
WARNING: vmlinux.o(.text+0x5a902): Section mismatch: reference to \
.init.text:__alloc_bootmem (between 'per_cpu_init' and 'count_pages')
The issue occurs because per_cpu_init() in mm/contig.c is
marked __cpuinit (which is #define'd to nothing on a hot
plug cpu configuration) call __alloc_bootmem() (which is
an __init function). The usage is actually safe because
the __alloc_bootmem() is inside an "if (first_time)" test
so that the call is only made while it is still legal to
do so.
But the warning is irritating. Move the allocation to
find_memory().
Signed-off-by: Tony Luck <tony.luck@intel.com>
Not sizeof(ptr) ... we meant to say sizeof(*ptr).
Also moved the memset to the error path (the normal path overwrites
every field in the structure anyway) -Tony
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The sizeof a pointer is constant, we want the sizeof what is pointed to.
Zero out 'sizeof(*efi_systab)' bytes of the efi_system_table_t pointer
'efi_systab' instead.
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Tony Luck <tony.luck@intel.com>
New sanity checks in sysctl_check_table() complain about a couple
of mode 0755 that should be 0555 in the perfmon code:
sysctl table check failed: /kernel .1 Writable sysctl directory
sysctl table check failed: /kernel/perfmon Writable sysctl directory
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that pci_enable_msi() fails on ia64 platform. The cause of
this problem is incorrect return value of ia64_setup_msi_irq(). It must
return 0 on success, instead of irq number.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Update sn2_defconfig to select 64KB page size, as well as include new
config options.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Clean up the process for presenting the "physical id" field in
/proc/cpuinfo.
- remove global smp_num_cpucores, as it is mostly useless
- remove check_for_logical_procs(), since we do the same
functionality in identify_siblings()
- reflow logic in identify_siblings(). If an older CPU
does not implement PAL_LOGICAL_TO_PHYSICAL, we may still
be able to get useful information from SAL_PHYSICAL_ID_INFO
- in identify_siblings(), threads/cores are a property of
the CPU, not the platform
- remove useless printk's about multi-core / thread
capability in identify_siblings(), as that information
is readily available in /proc/cpuinfo, and printing for
the BSP only adds little value
- smp_num_siblings is now meaningful if any CPU in the
system supports threads, not just the BSP
- expose "physical id" field, even on CPUs that are not
multi-core / multi-threaded (as long as we have a valid
value). Now we know what sockets Madisons live in too.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When gcc uses --build-id by default, the gate.lds.S linker script runs afoul
of the new note section and produces a bad DSO image. This fixes it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some versions of ld with --build-id support will crash when using the flag
with a linker script that discards notes. This bites ia64's check-segrel.lds.
The bug is easy to avoid.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
vmcore_find_descriptor_size() is only called by
reserve_elfcorehdr(), which is in __init, so it seems to me that
vmcore_find_descriptor_size() should be there too.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes the following section mismatches:
<-- snip -->
...
WARNING: vmlinux.o(.text+0x5b5c2): Section mismatch: reference to .init.text:memmap_init_zone (between 'memmap_init' and 'virtual_memmap_init')
WARNING: vmlinux.o(.text+0x5b842): Section mismatch: reference to .init.text:memmap_init_zone (between 'virtual_memmap_init' and 'ia64_mmu_init')
...
<-- snip -->
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'sg' of git://git.kernel.dk/linux-2.6-block:
Add CONFIG_DEBUG_SG sg validation
Change table chaining layout
Update arch/ to use sg helpers
Update swiotlb to use sg helpers
Update net/ to use sg helpers
Update fs/ to use sg helpers
[SG] Update drivers to use sg helpers
[SG] Update crypto/ to sg helpers
[SG] Update block layer to use sg helpers
[SG] Add helpers for manipulating SG entries
Add the BSS to the resource tree just as kernel text and kernel data are in
the resource tree. The main reason behind this is to avoid crashkernel
reservation in that area.
While it's not strictly necessary to have the BSS in the resource tree (the
actual collision detection is done in the reserve_bootmem() function before),
the usage of the BSS resource should be presented to the user in /proc/iomem
just as Kernel data and Kernel code.
Note: The patch currently is only implemented for x86 and ia64 (because
efi_initialize_iomem_resources() has the same signature on i386 and ia64).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most of these fixes were already submitted for old kernel versions, and were
approved, but for some reason they never made it into the releases.
Because this is a consolidation of a couple old missed patches, it touches both
Kconfigs and documentation texts.
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Quoting Randy:
"It seems sad that this patch sources Kconfig.marker, a 7-line file,
20-something times. Yes, you (we) don't want to put those 7 lines into
20-something different files, so sourcing is the right thing.
However, what you did for avr32 seems more on the right track to me: make
_one_ Instrumentation support menu that includes PROFILING, OPROFILE, KPROBES,
and MARKERS and then use (source) that in all of the arches."
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adapts IA64 to use the generic parse_crashkernel() function instead
of its own parsing for the crashkernel command line.
Because the total amount of System RAM must be known when calling this
function, efi_memmap_init() is modified to return its accumulated total_memory
variable.
Also, the crashkernel handling is moved in an own function in
arch/ia64/kernel/setup.c to make the code more readable.
[kamalesh@linux.vnet.ibm.com: build fix]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the easiest things to isolate is the pid printed in kernel log.
There was a patch, that made this for arch-independent code, this one makes
so for arch/xxx files.
It took some time to cross-compile it, but hopefully these are all the
printks in arch code.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.
The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In pre-cgroup cpusets, a few config files enabled cpusets by default.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch uses vm_get_page_prot() to setup vma->vm_page_prot.
Though inside vm_get_page_prot() the protection flags is AND with
(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED), it does not hurt correct code.
Signed-off-by: Coly Li <coyli@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found these while looking at printk uses.
Add missing newlines to dev_<level> uses
Add missing KERN_<level> prefixes to multiline dev_<level>s
Fixed a wierd->weird spelling typo
Added a newline to a printk
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Smart <James.Smart@Emulex.Com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On platforms that copy sys_tz into the vdso (currently only x86_64, soon to
include powerpc), it is possible for the vdso to get out of sync if a user
calls (admittedly unusual) settimeofday(NULL, ptr).
This patch adds a hook for architectures that set
CONFIG_GENERIC_TIME_VSYSCALL to ensure when sys_tz is updated they can also
updatee their copy in the vdso.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/ia64/kernel/machine_kexec.c: In function `arch_crash_save_vmcoreinfo':
arch/ia64/kernel/machine_kexec.c:131: error: `pgdat_list' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:131: error: (Each undeclared identifier is reported only once
arch/ia64/kernel/machine_kexec.c:131: error: for each function it appears in.)
arch/ia64/kernel/machine_kexec.c:134: error: `node_memblk' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:135: error: `NR_NODE_MEMBLKS' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:136: error: invalid application of `sizeof' to incomplete type `node_memblk_s'
arch/ia64/kernel/machine_kexec.c:137: error: dereferencing pointer to incomplete type
arch/ia64/kernel/machine_kexec.c:138: error: dereferencing pointer to incomplete type
make[1]: *** [arch/ia64/kernel/machine_kexec.o] Error 1
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. Old vmcoreinfo macros
were defined as generic names SYMBOL/SIZE/OFFSET /LENGTH/CONFIG, and it is
impossible to grep for them. So these names should be changed. This
discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch set frees the restriction that makedumpfile users should install a
vmlinux file (including the debugging information) into each system.
makedumpfile command is the dump filtering feature for kdump. It creates a
small dumpfile by filtering unnecessary pages for the analysis. To
distinguish unnecessary pages, it needs a vmlinux file including the debugging
information. These days, the debugging package becomes a huge file, and it is
hard to install it into each system.
To solve the problem, kdump developers discussed it at lkml and kexec-ml. As
the result, we reached the conclusion that necessary information for dump
filtering (called "vmcoreinfo") should be embedded into the first kernel file
and it should be accessed through /proc/vmcore during the second kernel.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)
Dan Aloni created the patch set for the above implementation.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)
And I updated it for multi architectures and memory models.
(http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It makes more sense to make instrumentation support experimental on a
case-by-case basis.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
list_del() hardly can fail, so checking for return value is pointless
(and current code always return 0).
Nobody really cared that return value anyway.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE in the coredump code which
allows for more flexibility in the note type for the state of 'extended
floating point' implementations in coredumps. New note types can now be
added with an appropriate #define.
This does #define ELF_CORE_XFPREG_TYPE to be NT_PRXFPREG in all
current users so there's are no change in behaviour.
This will let us use different note types on powerpc for the Altivec/VMX
state that some PowerPC cpus have (G4, PPC970, POWER6) and for the SPE
(signal processing extension) state that some embedded PowerPC cpus from
Freescale have.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sg list elements might not be continuous.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
d5a7430ddc missed a spot where we
use cpu_sibling_map and cpu_core_map. These don't exist on a
uni-processor build. Wrap #ifdef CONFIG_SMP ... #endif around it.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
kbuild: introduce ccflags-y, asflags-y and ldflags-y
kbuild: enable 'make CPPFLAGS=...' to add additional options to CPP
kbuild: enable use of AFLAGS and CFLAGS on commandline
kbuild: enable 'make AFLAGS=...' to add additional options to AS
kbuild: fix AFLAGS use in h8300 and m68knommu
kbuild: check for wrong use of CFLAGS
kbuild: enable 'make CFLAGS=...' to add additional options to CC
kbuild: fix up CFLAGS usage
kbuild: make modpost detect unterminated device id lists
kbuild: call export_report from the Makefile
kbuild: move Kai Germaschewski to CREDITS
kconfig/menuconfig: distinguish between selected-by-another options and comments
kconfig: tristate choices with mixed tristate and boolean values
include/linux/Kbuild: remove duplicate entries
kbuild: kill backward compatibility checks
kbuild: kill EXTRA_ARFLAGS
kbuild: fix documentation in makefiles.txt
kbuild: call make once for all targets when O=.. is used
kbuild: pass -g to assembler under CONFIG_DEBUG_INFO
kbuild: update _shipped files for kconfig syntax cleanup
...
Fix up conflicts in arch/um/sys-{x86_64,i386}/Makefile manually.
This cleans up the formatting in the vDSO linker script, mostly just the
use of whitespace. It's intended to approximate the kernel standard
conventions for indenting C, treating elements of the linker script about
like initialized variable definitions.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce architecture dependent kretprobe blacklists to prohibit users
from inserting return probes on the function in which kprobes can be
inserted but kretprobes can not.
This patch also removes "__kprobes" mark from "__switch_to" on x86_64 and
registers "__switch_to" to the blacklist on x86-64, because that mark is to
prohibit user from inserting only kretprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans up them. This is against 2.6.23-rc6-mm1.
- fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
- For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(),
which returns -EINVAL.
- removed remove_pages() only used in powerpc.
- removed no-op remove_memory() in i386, sh, sparc64, x86_64.
- only powerpc returns -ENOSYS at memory hot remove(no-op). changes it
to return -EINVAL.
Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Logic.
- set all pages in [start,end) as isolated migration-type.
by this, all free pages in the range will be not-for-use.
- Migrate all LRU pages in the range.
- Test all pages in the range's refcnt is zero or not.
Todo:
- allocate migration destination page from better area.
- confirm page_count(page)== 0 && PageReserved(page) page is safe to be freed..
(I don't like this kind of page but..
- Find out pages which cannot be migrated.
- more running tests.
- Use reclaim for unplugging other memory type area.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently mobility grouping works at the MAX_ORDER_NR_PAGES level. This makes
sense for the majority of users where this is also the huge page size.
However, on platforms like ia64 where the huge page size is runtime
configurable it is desirable to group at a lower order. On x86_64 and
occasionally on x86, the hugepage size may not always be MAX_ORDER_NR_PAGES.
This patch groups pages together based on the value of HUGETLB_PAGE_ORDER. It
uses a compile-time constant if possible and a variable where the huge page
size is runtime configurable.
It is assumed that grouping should be done at the lowest sensible order and
that the user would not want to override this. If this is not true,
page_block order could be forced to a variable initialised via a boot-time
kernel parameter.
One potential issue with this patch is that IA64 now parses hugepagesz with
early_param() instead of __setup(). __setup() is called after the memory
allocator has been initialised and the pageblock bitmaps already setup. In
tests on one IA64 there did not seem to be any problem with using
early_param() and in fact may be more correct as it guarantees the parameter
is handled before the parsing of hugepages=.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current ia64 kernel flushes icache by lazy_mmu_prot_update() *after*
set_pte(). This is too late. This patch removes lazy_mmu_prot_update and
add modfied set_pte() for flushing if necessary.
This patch flush icache of a page when
new pte has exec bit.
&& new pte has present bit
&& new pte is user's page.
&& (old *ptep is not present
|| new pte's pfn is not same to old *ptep's ptn)
&& new pte's page has no Pg_arch_1 bit.
Pg_arch_1 is set when a page is cache consistent.
I think this condition checks are much easier to understand than considering
"Where sync_icache_dcache() should be inserted ?".
pte_user() for ia64 was removed by http://lkml.org/lkml/2007/6/12/67 as
clean-up. So, I added it again.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The checks for node_online in the uncached allocator are made to make sure
that memory is available on these nodes. Thus switch all the checks to use
N_HIGH_MEMORY and to N_ONLINE.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Bob Picco <bob.picco@hp.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have had complaints where a threaded application is left in a bad state
after one of it's threads is killed when we hit a VM: out_of_memory
condition.
Killing just one of the process threads can leave the application in a bad
state, whereas killing the entire process group would allow for the
application to restart, or be otherwise handled, and makes it very obvious
that something has gone wrong.
This change allows the entire process group to be taken down, rather
than just the one thread.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Equip IA64 sparsemem with a virtual memmap. This is similar to the existing
CONFIG_VIRTUAL_MEM_MAP functionality for DISCONTIGMEM. It uses a PAGE_SIZE
mapping.
This is provided as a minimally intrusive solution. We split the 128TB
VMALLOC area into two 64TB areas and use one for the virtual memmap.
This should replace CONFIG_VIRTUAL_MEM_MAP long term.
[apw@shadowen.org: convert to new helper based initialisation]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly
from startup and CPU HOTPLUG functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This option is true if a low-level driver can support sg
chaining. This will be removed eventually when all the drivers are
converted to support sg chaining. q->max_phys_segments is set to
SCSI_MAX_SG_SEGMENTS if false.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The variable CPPFLAGS is a wellknown variable and the usage by
kbuild may result in unexpected behaviour.
This patch replace use of CPPFLAGS with KBUILD_CPPFLAGS all over the
tree and enabling one to use:
make CPPFLAGS=...
to specify additional CPP commandline options.
Patch was tested on following architectures:
alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k, s390
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Update defonfig file for sn2 to match recent changes in config options.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The variable CFLAGS is a wellknown variable and the usage by
kbuild may result in unexpected behaviour.
On top of that several people over time has asked for a way to
pass in additional flags to gcc.
This patch replace use of CFLAGS with KBUILD_CFLAGS all over the
tree and enabling one to use:
make CFLAGS=...
to specify additional gcc commandline options.
One usecase is when trying to find gcc bugs but other
use cases has been requested too.
Patch was tested on following architectures:
alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k
Test was simple to do a defconfig build, apply the patch and check
that nothing got rebuild.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits)
PM: merge device power-management source files
sysfs: add copyrights
kobject: update the copyrights
kset: add some kerneldoc to help describe what these strange things are
Driver core: rename ktype_edd and ktype_efivar
Driver core: rename ktype_driver
Driver core: rename ktype_device
Driver core: rename ktype_class
driver core: remove subsystem_init()
sysfs: move sysfs file poll implementation to sysfs_open_dirent
sysfs: implement sysfs_open_dirent
sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir
sysfs: make sysfs_root a regular directory dirent
sysfs: open code sysfs_attach_dentry()
sysfs: make s_elem an anonymous union
sysfs: make bin attr open get active reference of parent too
sysfs: kill unnecessary NULL pointer check in sysfs_release()
sysfs: kill unnecessary sysfs_get() in open paths
sysfs: reposition sysfs_dirent->s_mode.
sysfs: kill sysfs_update_file()
...
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Don't take semaphore in cpufreq_quick_get()
[CPUFREQ] Support different families in fid/did to frequency conversion
[CPUFREQ] cpufreq_stats: misc cpuinit section annotations
[CPUFREQ] implement !CONFIG_CPU_FREQ stub for cpufreq_unregister_notifier()
[CPUFREQ] mark hotplug notifier callback as __cpuinit
[CPUFREQ] Only check for transition latency on problematic governors (kconfig fix)
[CPUFREQ] allow ondemand and conservative cpufreq governors to be used as default
[CPUFREQ] move policy's governor initialisation out of low-level drivers into cpufreq core
[CPUFREQ] Longhaul - Add support for PM133 northbridge
[CPUFREQ] x86: use num_online_nodes to get physical cpus numbers for
Fix the problem that kdump on INIT hung up if kdump kernel image is
not configured.
The kdump_init_notifier() on monarch CPU stops its operation at
DIE_INIT_MONARCH_LEAVE time if the kdump kernel image is not
configured. On the other hand, kdump_init_notifier() on non-monarch
CPUs get into spin because they don't know the fact the monarch stops
its operation. This is the cause of this problem. To fix this problem,
we need to check the kdump kernel image at the top of the
kdump_init_notifier() function.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that kdump on INIT causes a kernel panic if kdump
kernel image is not configured. The cause of this problem is
machine_kexec_on_init() is using printk in INIT context. It should
use ia64_mca_printk() instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The use of vector in ia64_machine_kexec() seems spurious,
and removing it simplifies the code slightly.
As suggested by Alex Williamson <alex.williamson@hp.com>
Cc: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Additional testing uncovered a situation where the MCA recovery code could
hang due to a race condition.
According to the SAL spec, SAL sends a rendezvous interrupt to all but the first
CPU that goes into MCA. This includes other CPUs that go into MCA at the same
time. Those other CPUs will go into the linux MCA handler (rather than the
slave loop) with the rendezvous interrupt pending. When all the CPUs have
completed MCA processing and the last monarch completes, freeing all the CPUs,
the CPUs with the pended rendezvous interrupt then go into the
ia64_mca_rendez_int_handler(). In ia64_mca_rendez_int_handler() the CPUs
get marked as rendezvoused, but then leave the handler (due to no MCA).
That leaves the CPUs marked as rendezvoused _before_ the next MCA event.
When the next MCA hits, the monarch will mistakenly believe that all the CPUs
are rendezvoused when they are not, opening up a window where a CPU can get
stuck in the slave loop.
This patch avoids leaving CPUs marked as rendezvoused when they are not.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While testing the MCA recovery code, noticed that some machines would have a
five second delay rendezvousing cpus. What was happening is that
ia64_wait_for_slaves() would check to see if all the slave CPUs had
rendezvoused. If any had not, it would wait 1 millisecond then check again.
If any CPUs had still not rendezvoused, it would wait 5 seconds before
checking again.
On some configs the rendezvous takes more than 1 millisecond, causing the code
to wait the full 5 seconds, even though the last CPU rendezvoused after only
a few milliseconds.
The fix is to check every 1 millisecond to see if all the cpus have
rendezvoused. After 5 seconds the code concludes the CPUs will never
rendezvous (same as before).
The MCA code is, by definition, not performance critical, but a needless
delay of 5 seconds is senseless. The 5 seconds also adds up quickly
when running the error injection code in a loop.
This patch both simplifies the code and removes the needless delay.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This driver for HPQ5001 devices installs a global ACPI OpRegion handler.
AML methods can use this OpRegion to call native firmware entry points.
ACPI does not define a mechanism for AML methods to call native firmware
interfaces such as PAL or SAL. This OpRegion handler adds such a mechanism.
After the handler is installed, an AML method can call native firmware by
storing the arguments and firmware entry point to specific offsets in the
OpRegion. When AML reads the "return value" offset from the OpRegion, this
handler loads up the arguments, makes the firmware call, and returns the
result.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It does no longer require the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.
Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Because it is dead code and not referenced by anybody else (that file cannot
be built modular).
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* palinfo.c:
palinfo_cpu_notifier is a CPU hotplug notifier_block, and can be
marked __cpuinitdata, and the callback function palinfo_cpu_callback()
itself can be marked __cpuinit. create_palinfo_proc_entries() is only
called from __cpuinit callback or general __init code, therefore a
candidate for __cpuinit itself. remove_palinfo_proc_entries() is only
called from __cpuinit callback or general __exit code, therefore a
candidate for __cpuexit.
* salinfo.c:
The CPU hotplug notifier_block can be __cpuinitdata. The callback
salinfo_cpu_callback() is incorrectly marked __devinit -- it must
be __cpuinit instead.
* topology.c:
cache_sysfs_init() is only called at device_initcall() time so marking
it as __cpuinit is wrong and wasteful. It should be unconditionally
__init. Also cleanup reference to hotplug notifier callback function
from this function and replace with cache_add_dev(), which could also
enable us to use other tricks to replace __cpuinit{data} annotations,
as recently discussed on this list.
cache_shared_cpu_map_setup() is only ever called from __cpuinit-marked
functions hence both its definitions (SMP or !SMP) are candidates for
__cpuinit itself. Also all_cpu_cache_info can be __cpuinitdata because
only referenced from __cpuinit code.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Changing the global CPPFLAGS is not the recommended way
to add additional include dirs.
Changed to use EXTRA_CFLAGS.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Jes Sorensen <jes@sgi.com>
If scsi_add_host returned an error, the host would never be freed.
We need to call scsi_host_put() if an error happens.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (867 commits)
[SKY2]: status polling loop (post merge)
[NET]: Fix NAPI completion handling in some drivers.
[TCP]: Limit processing lost_retrans loop to work-to-do cases
[TCP]: Fix lost_retrans loop vs fastpath problems
[TCP]: No need to re-count fackets_out/sacked_out at RTO
[TCP]: Extract tcp_match_queue_to_sack from sacktag code
[TCP]: Kill almost unused variable pcount from sacktag
[TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L
[TCP]: Add bytes_acked (ABC) clearing to FRTO too
[IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2
[NETFILTER]: x_tables: add missing ip6t_modulename aliases
[NETFILTER]: nf_conntrack_tcp: fix connection reopening
[QETH]: fix qeth_main.c
[NETLINK]: fib_frontend build fixes
[IPv6]: Export userland ND options through netlink (RDNSS support)
[9P]: build fix with !CONFIG_SYSCTL
[NET]: Fix dev_put() and dev_hold() comments
[NET]: make netlink user -> kernel interface synchronious
[NET]: unify netlink kernel socket recognition
[NET]: cleanup 3rd argument in netlink_sendskb
...
Fix up conflicts manually in Documentation/feature-removal-schedule.txt
and my new least favourite crap, the "mod_devicetable" support in the
files include/linux/mod_devicetable.h and scripts/mod/file2alias.c.
(The latter files seem to be explicitly _designed_ to get conflicts when
different subsystems work with them - that have an absolutely horrid
lack of subsystem separation!)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the headers to include/asm-x86 and fixup the
header install make rules
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Every user of the network device notifiers is either a protocol
stack or a pseudo device. If a protocol stack that does not have
support for multiple network namespaces receives an event for a
device that is not in the initial network namespace it quite possibly
can get confused and do the wrong thing.
To avoid problems until all of the protocol stacks are converted
this patch modifies all netdev event handlers to ignore events on
devices that are not in the initial network namespace.
As the rest of the code is made network namespace aware these
checks can be removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
When PTRACE_SYSCALL was used and then PTRACE_DETACH is used, the
TIF_SYSCALL_TRACE flag is left set on the formerly-traced task. This
means that when a new tracer comes along and does PTRACE_ATTACH, it's
possible he gets a syscall tracing stop even though he's never used
PTRACE_SYSCALL. This happens if the task was in the middle of a system
call when the second PTRACE_ATTACH was done. The symptom is an
unexpected SIGTRAP when the tracer thinks that only SIGSTOP should have
been provoked by his ptrace calls so far.
A few machines already fixed this in ptrace_disable (i386, ia64, m68k).
But all other machines do not, and still have this bug. On x86_64, this
constitutes a regression in IA32 compatibility support.
Since all machines now use TIF_SYSCALL_TRACE for this, I put the
clearing of TIF_SYSCALL_TRACE in the generic ptrace_detach code rather
than adding it to every other machine's ptrace_disable.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After my last patch we have a new header file for HP simulator use.
Here's code to use it for stuff that used to have `extern' statements
inline in the code. Functionality should not change with this patch.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch cleans up the `enable early console for SKI' patch
(471e7a4484), and
1. potentially allows the gensparse_defconfig to work again.
(there are other problems running a generic kernel on Ski)
2. fixes the `console registered twice' problem.
3. Cleans up the code by moving the `extern hpsim_cons' declaration to
a new asm/hpsim.h file.
Thanks to Jes for comments.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When dumping memory via sysrq-m it is possible to take a bogus NMI watchdog
or softlockup watchdog because the dump can take a long time on big memory
systems.
Occasionally tickle the watchdog when doing the dump.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add additional support for CPU disable on SN platforms.
Correctly setup the smp_affinity mask for I/O error IRQs.
Restrict the use of the feature to Altix 4000 and 450 systems
running with a CPU disable capable PROM, and do not allow disabling
of CPU 0.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
vmalloc() returns a void pointer - no need to cast it.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
For hugepage mappings, the file offset, like the address and size, needs to
be aligned to the size of a hugepage.
In commit 68589bc353, the check for this was
moved into prepare_hugepage_range() along with the address and size checks.
But since BenH's rework of the get_unmapped_area() paths leading up to
commit 4b1d89290b, prepare_hugepage_range()
is only called for MAP_FIXED mappings, not for other mappings. This means
we're no longer ever checking for an aligned offset - I've confirmed that
mmap() will (apparently) succeed with a misaligned offset on both powerpc
and i386 at least.
This patch restores the check, removing it from prepare_hugepage_range()
and putting it back into hugetlbfs_file_mmap(). I'm putting it there,
rather than in the get_unmapped_area() path so it only needs to go in one
place, than separately in the half-dozen or so arch-specific
implementations of hugetlb_get_unmapped_area().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The pending interrupts can be remaining at boot up time on some
platform. This will cause spurious interrupts when interrupt is
enabled for the first time. This patch clears IVR at the CPU
initialization to eliminate such spurious interrupts.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix handling for spurious interrupts not being mapped to any IRQs.
Currently, spurious interrupts that are not mapped to any IRQs are
handled as IRQ 15 (== IA64_SPURIOUS_VECTOR). But it is not proper
because vector != irq. We need special handlings for such spurious
interrupts not being mapped to any IRQs.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When using Ski to debug early startup, it's a bit of a pain not to
have printk.
This patch enables the simulated console very early.
It may be worth conditionalising on the command line... but this is
enough for now.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The "ri" field in the processor status register only has defined
values of 0, 1, 2. Do not let ptrace set this to 3. As with
other reserved fields in registers we silently discard the value.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is a bug in the ia64_do_page_fault code that can cause a failure
to grow the register backing store, or any mapping that is marked as
VM_GROWSUP if the mapping is the highest mapped area of memory.
When the address accessed is below the first mapping the previous mapping
is returned as NULL, and this case is handled. However, when the address
accessed is above the highest mapping the vma returned is NULL, this
case is not handled correctly, and it fails to spot that this access
might require an existing mapping to grow upwards.
Signed-off-by: Andrew Burgess <andrew@transitive.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The core cpufreq code doesn't appear to understand returning -EAGAIN
for the get() function of the cpufreq_driver. If PAL_GET_PSTATE returns
-1, such as when running on Xen, scaling_cur_freq is happy to return
4294967285 kHz (ie. (unsigned)-11). The other drivers appear to return
0 for a failure, and doing so gives me the max frequency from
scaling_cur_frequency and "<unknown>" from cpuinfo_cur_frequency. I
believe that's the desired behavior.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If the interrupt has been disabled, don't call the force_interrupt provider.
Doing so can result in an infinite runaway interrupt loop.
Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The slab allocator was changed in 2.6.23 to default to SLUB. However,
the config files in arch/ia64/configs still use SLAB. Switch them to SLUB.
Added same change to arch/ia64/defconfig ... Tony
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Explicitly put the unwind section into its own program-header. This
used to be unnecessary (probably because binutils did it for us), but
with current binutils (e.g., v2.17.50.20070804) we won't get
the PT_IA_64_UNWIND header without this patch which will break
unwinding in a debugger and simulators such as Ski.
Signed-off-by: David Mosberger-Tang <dmosberger@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add NOTES to linker script such that the kernel can be built with
recent versions of binutils. Without this patch, final link fails
with this error:
ld: .tmp_vmlinux1: section `.text' can't be allocated in segment 0
ld: final link failed: Bad value
This error is due to the fact that the --build-id option is used
with newer linkers to include a .notes section on the kernel, but
without the NOTES macro, that section won't be included in the kernel
which then leads to the above error message.
Signed-off-by: David Mosberger-Tang <dmosberger@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a dummy nop at the end of _start() to maintain the invariant that
the return-pointer (rp) always point to the calling function. This
makes unwinding stop at the last frame, as it should.
Signed-off-by: David Mosberger-Tang <dmosberger@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use local_vector_to_irq() instead of looping through all NR_IRQS.
This avoids registering the CPE handler on multiple irqs. Only
register if the irq is valid. If no valid irq is found, print an
error message and set up polling.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/Kconfig failed to include kernel/Kconfig.preempt that meant it
did not support PREEMPT_VOLUNTARY and PREEMPT_BKL (inadvertently).
This was recently noticed when the newly-added PREEMPT_NOTIFIERS in
Kconfig.preempt that was "select"ed from drivers/kvm/Kconfig (therefore)
started giving bogus warnings ('select' used by config symbol 'KVM' refers
to undefined symbol 'PREEMPT_NOTIFIERS') on ia64 builds.
So let's remove the open-coded definition of CONFIG_PREEMPT in
arch/ia64/Kconfig and replace it with just including Kconfig.preempt
instead, like the other archs do.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add base support for implementing platform_irq_to_vector(), and
then use it on SN2.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While sending interrupts to a cpu to repeatedly wake a thread, on occasion
that thread will take a full timer tick cycle (4002 usec in my case)
to wakeup.
The problem concerns a race condition in the code around the safe_halt()
call in the default_idle() routine. Setting 'nohalt' on the kernel
command line causes the long wakeups to disappear.
void
default_idle (void)
{
local_irq_enable();
while (!need_resched()) {
--> if (can_do_pal_halt)
--> safe_halt();
else
A timer tick could arrive between the check for !need_resched and the
actual call to safe_halt() (which does a pal call to PAL_HALT_LIGHT).
By the time the timer tick completes, a thread that might now need to run
could get held up for as long as a timer tick waiting for the halted cpu.
I'm proposing that we disable irq's and check need_resched again before
calling safe_halt(). Does anyone see any problem with this approach?
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] ITC: Reduce rating for ITC clock if ITCs are drifty
[IA64] SN2: Fix up sn2_rtc clock
[IA64] Fix wrong access to irq_desc[] in iosapic_register_intr().
[IA64] Fix possible race in destroy_and_reserve_irq()
[IA64] Fix registered interrupt check
[IA64] Remove a few duplicate includes
[IA64] Allow smp_call_function_single() to current cpu
[IA64] fix a few section mismatch warnings
Make sure to reduce the rating of the ITC clock if ITCs are drifty. If they
are drifting then we have not synchronized the ITC values, nor are we doing
the jitter compensation (useless since drift may increase the differentials
arbitrarily).
Without this patch it is possible that the ITC clock becomes selected as
the system clock on systems with drifty ITCs which will result in
nanosleep hanging.
One can still select the itc clock manually on such systems via
clocksource=itc
(Produces nice hangs on SGI Altix.)
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If the sn2_rtc clock is present then it is a must have since sn2_rtc
provides a synchronized time source on Altix systems. So elevate
the priority to 450. Otherwise the ITC would take precendence. Altix
systems currently do not boot because the ITC clocksource is broken. It
seems to assume that ITCs are synchronized and as a result nanosleep
hangs (may be fixed in a different patch).
While we are at it: Remove the sn2_mc definition. The sn2_rtc has a fixed
address. No point in reading the address from memory. Removing it avoids
touching one cacheline.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In error path we must unlock irq_desc[irq].lock before we change
'irq'.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove unused TIF_NOTIFY_RESUME flag for all processor architectures. The
flag was not used excecpt on IA-64 where the patch replaces it with
TIF_PERFMON_WORK.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, destroy_and_reserve_irq() sets irq_status[irq] UNUSED using
clear_irq_vector() and sets irq_status[irq] RSVD using reserve_irq().
But there is a race window because vector_lock is once released between
them. This patch fixes this race window.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that interrupts are not initialized correctly at PCI
hotplug or driver reloading time.
By vector domain change, the iosapic_rte_info structure was changed to
be on the iosapic_intr_info[irq].rtes list even after the interrupts
are unregistered. So iosapic_intr_info[irq].rtes list must not be
checked to see if there are registered interrupts (RTEs) on the
irq. We must check iosapic_intr_info[irq].count counter instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes a few duplicate includes from arch/ia64/
Acked-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This removes the requirement for callers to get_cpu() to check in simple
cases. i386 and x86_64 already received a similar treatment.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the following section mismatch warnings:
WARNING: vmlinux.o(.text+0x41902): Section mismatch: reference to .init.text:__alloc_bootmem (between 'ia64_mca_cpu_init' and 'ia64_do_tlb_purge')
WARNING: vmlinux.o(.text+0x49222): Section mismatch: reference to .init.text:__alloc_bootmem (between 'register_intr' and 'iosapic_register_intr')
WARNING: vmlinux.o(.text+0x62beb2): Section mismatch: reference to .init.text:__alloc_bootmem_node (between 'hubdev_init_node' and 'cnodeid_get_geoid')
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (28 commits)
[SCSI] mpt fusion: Changes in mptctl.c for logging support
[SCSI] mpt fusion: Changes in mptfc.c mptlan.c mptsas.c and mptspi.c for logging support
[SCSI] mpt fusion: Changes in mptscsih.c for logging support
[SCSI] mpt fusion: Changes in mptbase.c for logging support
[SCSI] mpt fusion: logging support in Kconfig, Makefile, mptbase.h and addition of mptdebug.h
[SCSI] libsas: Fix potential NULL dereference in sas_smp_get_phy_events()
[SCSI] bsg: Fix build for CONFIG_BLOCK=n
[SCSI] aacraid: fix Sunrise Lake reset handling
[SCSI] aacraid: add SCSI SYNCHONIZE_CACHE range checking
[SCSI] add easyRAID to the no report luns blacklist
[SCSI] advansys: lindent and other large, uninteresting changes
[SCSI] aic79xx, aic7xxx: Fix incorrect width setting
[SCSI] qla2xxx: fix to honor ignored parameters in sysfs attributes
[SCSI] aacraid: draw line in sand, sundry cleanup and version update
[SCSI] iscsi_tcp: Turn off bounce buffers
[SCSI] libiscsi: fix cmd seqeunce number checking
[SCSI] iscsi_tcp, ib_iser Enable module refcounting for iscsi host template
[SCSI] libiscsi: make sure session is not blocked when removing host
[SCSI] libsas: Remove PCI dependencies
[SCSI] simscsi: convert to use the data buffer accessors
...
When comparing a pointer, it's clearer to compare it to NULL than to 0.
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Signed-off-by: Tony Luck <tony.luck@intel.com>
b716395e2b added code to handle
a compatability issue with 32bit quota tools, but the new compat
routines are only needed when CONFIG_COMPAT=y (and with this set
to 'n' there are compilation problems since some new typedefs are
not visible).
Reported by Doug Chapman. Fix tuned by a cast of thousands (Andi,
Andreas, Arthur, HPA, Willy)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Forgot to adjust this one with the acpi autoloading patches
in commit 8c8eb78f67
Acked-by: Myron Stowe <myron.stowe@hp.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix wrong return value in parse_vector_domain().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ia64's acpi_gsi_to_irq() function assumes irq == vector. But in
fact irq can be different from vector. This patch fix this wrong
assumption.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add some sanity checks into __bind_irq_vector().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
u32* volatile cyclone_timer means volatile auto pointer to u32,
which is clearly not what had been intended (we never even take
the address of that variable, let alone pass it to something that
could change it behind our back). u32 volatile * is what the
authors apparently wanted to say, but in reality we don't need that
qualifier there at all - it's (properly) only passed to iomem helpers
which takes care of that stuff just fine.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Nail two more simple section mismatch errors
[IA64] fix section mismatch warnings
[IA64] rename partial_page
[IA64] Ensure that machvec is set up takes place before serial console
[IA64] vector-domain - fix vector_table
[IA64] vector-domain - handle assign_irq_vector(AUTO_ASSIGN)
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
pcibios_setup (between 'pci_setup' and 'quirk_mellanox_tavor')
setup_profiling_timer (between 'write_profile' and 'delayed_put_task_struct')
Signed-off-by: Tony Luck <tony.luck@intel.com>
In 741f98fe29 Sam added full
checking across the entire vmlinux image. This flushed out
a dozen new section mismatch warnings. Start the whack-a-mole
game again to stomp them out.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Jens has added a partial_page thing in splice whcih conflicts with the ia64
one. Rename ia64 out of the way. (ia64 chose poorly).
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Parse the machvec command line option outside of the early_param()
so that ia64_mv is set before any console intialisation that
may result from early_param parsing.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This change fixes a panic when assign_irq_vector(irq) is called with
irq = AUTO_ASSIGN.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As it was a synonym for (CONFIG_ACPI && CONFIG_X86),
the ifdefs for it were more clutter than they were worth.
For ia64, just add a few stubs in anticipation of future
S3 or S4 support.
Signed-off-by: Len Brown <len.brown@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Prevent people from directly including <asm/rwsem.h>.
[IA64] remove time interpolator
[IA64] Convert to generic timekeeping/clocksource
[IA64] refresh some config files for 64K pagesize
[IA64] Delete iosapic_free_rte()
[IA64] fallocate system call
[IA64] Enable percpu vector domain for IA64_DIG
[IA64] Enable percpu vector domain for IA64_GENERIC
[IA64] Support irq migration across domain
[IA64] Add support for vector domain
[IA64] Add mapping table between irq and vector
[IA64] Check if irq is sharable
[IA64] Fix invalid irq vector assumption for iosapic
[IA64] Use dynamic irq for iosapic interrupts
[IA64] Use per iosapic lock for indirect iosapic register access
[IA64] Cleanup lock order in iosapic_register_intr
[IA64] Remove duplicated members in iosapic_rte_info
[IA64] Remove block structure for locking in iosapic.c
This is a merge of Peter Keilty's initial patch (which was
revived by Bob Picco) for this with Hidetoshi Seto's fixes
and scaling improvements.
Acked-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Update arch/ia64/defconfig: select 64K pagesize
Same for arch/ia64/configs/tiger_defconfig + CONFIG_COMPAT=n
Signed-off-by: Tony Luck <tony.luck@intel.com>
> arch/ia64/kernel/iosapic.c:597: warning: 'iosapic_free_rte' defined but not used
>
> This isn't spurious, the only call to iosapic_free_rte() has been removed, but there
> is still a call to iosapic_alloc_rte() ... which means we must have a memory leak.
I did it on purpose (and gave the warning a miss...) and I consider
iosapic_free_rte() is no longer needed.
I decided to remain iosapic_rte_info to keep gsi-to-irq binding
after device disable. Indeed it needs some extra memory, but it
is only "sizeof(iosapic_rte_info) * <the number of removed devices>"
bytes and has no memory leak becasue re-enabled devices use the
iosapic_rte_info which they used before disabling.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
sys_fallocate for ia64. This uses an empty slot #1303 erroneously
marked as reserved for move_pages (which had already been allocated
as syscall #1276)
Signed-Off-By: Dave Chinner <dgc@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since Ingo's recent scheduler rewrite which was merged as commit
0437e109e1 sched_cacheflush is unused.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
the old mm into the new mm.
We create the new mm before the binfmt code runs, and place the new stack at
the very top of the address space. Once the binfmt code runs and figures out
where the stack should be, we move it downwards.
It is a bit peculiar in that we have one task with two mm's, one of which is
inactive.
[a.p.zijlstra@chello.nl: limit stack size]
Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Hugh Dickins <hugh@veritas.com>
[bunk@stusta.de: unexport bprm_mm_init]
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently most of the per cpu data, which is accessed by different cpus,
has a ____cacheline_aligned_in_smp attribute. Move all this data to the
new per cpu shared data section: .data.percpu.shared_aligned.
This will seperate the percpu data which is referenced frequently by other
cpus from the local only percpu data.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
per cpu data section contains two types of data. One set which is
exclusively accessed by the local cpu and the other set which is per cpu,
but also shared by remote cpus. In the current kernel, these two sets are
not clearely separated out. This can potentially cause the same data
cacheline shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.
One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end. Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.
This patch:
Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. And as such, this differentiates the local
only data and remotely accessed data cleanly.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I realise jprobes are a razor-blades-included type of interface, but that
doesn't mean we can't try and make them safer to use. This guy I know once
wrote code like this:
struct jprobe jp = { .kp.symbol_name = "foo", .entry = "jprobe_foo" };
And then his kernel exploded. Oops.
This patch adds an arch hook, arch_deref_entry_point() (I don't like it
either) which takes the void * in a struct jprobe, and gives back the text
address that it represents.
We can then use that in register_jprobe() to check that the entry point we're
passed is actually in the kernel text, rather than just some random value.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer. This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).
[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Clean away some code inside some non-existent CONFIG ifdefs
[IA64] ar.itc access must really be after xtime_lock.sequence has been read
[IA64] correctly count CPU objects in the ia64/sn hwperf interface
[IA64] arbitary speed tty ioctl support
[IA64] use machvec=dig on hpzx1 platforms
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based on usage and testing over the past couple of years, kprobes on
i386, ia64, powerpc and x86_64 is no longer EXPERIMENTAL.
This is a follow-up to Robert P.J. Day's patch making "Instrumentation
support" non-EXPERIMENTAL:
http://marc.info/?l=linux-kernel&m=118396955423812&w=2
Arch maintainers for sparc64, avr32 and s390 need to take a similar call.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the kernelcore= parameter for x86.
Once all patches are applied, a new command-line parameter exist and a new
sysctl. This patch adds the necessary documentation.
From: Yasunori Goto <y-goto@jp.fujitsu.com>
When "kernelcore" boot option is specified, kernel can't boot up on ia64
because of an infinite loop. In addition, the parsing code can be handled
in an architecture-independent manner.
This patch uses common code to handle the kernelcore= parameter. It is
only available to architectures that support arch-independent zone-sizing
(i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will
ignore the boot parameter.
[bunk@stusta.de: make cmdline_parse_kernelcore() static]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add per-CPU vector domain support for IA64_DIG. It is enabled by
adding the "vector=percpu" boot option.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add per-CPU vector domain support for IA64_GENERIC. It is enabled by
adding the "vector=percpu" boot option.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add support for IRQ migration across vector domain.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add fundamental support for multiple vector domain. There still exists
only one vector domain even with this patch. IRQ migration across
domain is not supported yet by this patch.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add mapping tables between irqs and vectors, and its management code.
This is necessary for supporting multiple vector domain because 1:1
mapping between irq and vector will be changed to n:1.
The irq == vector relationship between irqs and vectors is explicitly
remained for percpu interrupts, platform interrupts, isa IRQs and
vectors assigned using assign_irq_vector() because some programs might
depend on it.
And I should consider the following problem.
When pci drivers enabled/disabled devices dynamically, its irq number
is changed to the different one. Therefore, suspend/resume code may
happen problem.
To fix this problem, I bound gsi to irq.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Need to check if irq is sharable amoung handlers when searching
sharable IOSAPIC irq.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Many of IOSAPIC codes depends on the flollowing assumptions, but these
would become invalid when multiple vector domain will be supported in
the future.
- 1:1 mapping between IRQ and vector
- IRQ == vector
To fix those invalid assumptions, this patch changes iosapic_intr_info[]
to be indexed by irq number instead of vector.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cleanup order of irq_desc.lock and iosapic_lock in
iosapic_register_intr() and iosapic_unregister_intr().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove duplicated members in iosapic_rte_info in iosapic.c. This patch
has no functional changes.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove unnecessary indent between spin_lock() and spin_unlock() in
iosapic.c. This has no functional changes.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
OpenVZ Linux kernel team has discovered the problem with 32bit quota tools
working on 64bit architectures. In 2.6.10 kernel sys32_quotactl() function
was replaced by sys_quotactl() with the comment "sys_quotactl seems to be
32/64bit clean, enable it for 32bit" However this isn't right. Look at
if_dqblk structure:
struct if_dqblk {
__u64 dqb_bhardlimit;
__u64 dqb_bsoftlimit;
__u64 dqb_curspace;
__u64 dqb_ihardlimit;
__u64 dqb_isoftlimit;
__u64 dqb_curinodes;
__u64 dqb_btime;
__u64 dqb_itime;
__u32 dqb_valid;
};
For 32 bit quota tools sizeof(if_dqblk) == 0x44.
But for 64 bit kernel its size is 0x48, 'cause of alignment!
Thus we got a problem. Attached patch reintroduce sys32_quotactl() function,
that handles this and related situations.
[michal.k.k.piotrowski@gmail.com: build fix]
[akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is using mmap()'s randomization functionality in such a way that
it maps the main executable of (specially compiled/linked -pie/-fpie)
ET_DYN binaries onto a random address (in cases in which mmap() is allowed
to perform a randomization).
Origin of this patch is in exec-shield
(http://people.redhat.com/mingo/exec-shield/)
[jkosina@suse.cz: pie randomization: fix BAD_ADDR macro]
Signed-off-by: Jan Kratochvil <honza@jikos.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Beacuse SERIAL_PORT_DFNS is removed from include/asm-i386/serial.h and
include/asm-x86_64/serial.h. the serial8250_ports need to be probed late in
serial initializing stage. the console_init=>serial8250_console_init=>
register_console=>serial8250_console_setup will return -ENDEV, and console
ttyS0 can not be enabled at that time. need to wait till uart_add_one_port in
drivers/serial/serial_core.c to call register_console to get console ttyS0.
that is too late.
Make early_uart to use early_param, so uart console can be used earlier. Make
it to be bootconsole with CON_BOOT flag, so can use console handover feature.
and it will switch to corresponding normal serial console automatically.
new command line will be:
console=uart8250,io,0x3f8,9600n8
console=uart8250,mmio,0xff5e0000,115200n8
or
earlycon=uart8250,io,0x3f8,9600n8
earlycon=uart8250,mmio,0xff5e0000,115200n8
it will print in very early stage:
Early serial console at I/O port 0x3f8 (options '9600n8')
console [uart0] enabled
later for console it will print:
console handover: boot [uart0] -> real [ttyS0]
Signed-off-by: <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P.J. Day has a script that finds places in the code that
use non-existent CONFIG variables. It complained of two uses in
ia64 specific code: CONFIG_IA64_SDV and CONFIG_KDB (both used in
the hp/sim code).
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ".acq" semantics of the load only apply w.r.t. other data access.
Reading the clock (ar.itc) isn't a data access so strange things can
happen here. Specifically the read of ar.itc can be launched as soon
as the read of xtime_lock.sequence is ISSUED. Since this may cache
miss, and that might cause a thread switch, and there may be cache
contention for the line containing xtime_lock, it may be a long time
before the actual value is returned, so the ar.itc value may be very
stale.
Move the consumption of r28 up before the read of ar.itc to make sure
that we really have got the current value of xtime_lock.sequence
before look at ar.itc.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Correctly count CPU objects for SGI ia64/sn hwperf interface
Signed-off-by: Mark Goodwin <markgw@sgi.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On HP zx1 machines, the 'machvec=dig' parameter is needed for the
kdump kernel to avoid problems with the HP sba iommu. The problem
is that during the boot of the kdump kernel, the iommu is re-initialized,
so in-flight DMA from improperly shutdown drivers causes an IOTLB
miss which leads to an MCA. With kdump, the idea is to get into the
kdump kernel with as little code as we can, so shutting down drivers
properly is not an option.
The workaround is to add 'machvec=dig' to the kdump kernel boot
parameters. This makes the kdump kernel avoid using the sba iommu
altogether, leaving the IOTLB intact. Any ongoing DMA falls
harmlessly outside the kdump kernel. After the kdump kernel reboots,
all devices will have been shutdown properly and DMA stopped.
This patch pushes that functionality into the sba iommu
initialization code, so that users won't have to find the obscure
documentation telling them about 'machvec=dig'.
This patch only affects HP platforms. It still includes one
extern declaration in the file, because no applicable header file
exists.
Signed-off-by: Terry Loftin <terry.loftin@hp.com>
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Support multiple CPUs going through OS_MCA
[IA64] silence GCC ia64 unused variable warnings
[IA64] prevent MCA when performing MMIO mmap to PCI config space
[IA64] add sn_register_pmi_handler oemcall
[IA64] Stop bit for brl instruction
[IA64] SN: Correct ROM resource length for BIOS copy
[IA64] Don't set psr.ic and psr.i simultaneously
The PCI syscalls are built on every architecture except X86, but only
a few have ever hooked them up. Use a new Kconfig symbol to save a
couple of kB on the architectures that have never used the syscalls.
Tested on x86 and ia64 only.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Linux does not gracefully deal with multiple processors going
through OS_MCA aa part of the same MCA event. The first cpu
into OS_MCA grabs the ia64_mca_serialize lock. Subsequent
cpus wait for that lock, preventing them from reporting in as
rendezvoused. The first cpu waits 5 seconds then complains
that all the cpus have not rendezvoused. The first cpu then
handles its MCA and frees up all the rendezvoused cpus and
releases the ia64_mca_serialize lock. One of the subsequent
cpus going thought OS_MCA then gets the ia64_mca_serialize
lock, waits another 5 seconds and then complains that none of
the other cpus have rendezvoused.
This patch allows multiple CPUs to gracefully go through OS_MCA.
The first CPU into ia64_mca_handler() grabs a mca_count lock.
Subsequent CPUs into ia64_mca_handler() are added to a list of cpus
that need to go through OS_MCA (a bit set in mca_cpu), and report
in as rendezvoused, and but spin waiting their turn.
The first CPU sees everyone rendezvous, handles his MCA, wakes up
one of the other CPUs waiting to process their MCA (by clearing
one mca_cpu bit), and then waits for the other cpus to complete
their MCA handling. The next CPU handles his MCA and the process
repeats until all the CPUs have handled their MCA. When the last
CPU has handled it's MCA, it sets monarch_cpu to -1, releasing all
the CPUs.
In testing this works more reliably and faster.
Thanks to Keith Owens for suggesting numerous improvements
to this code.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tell GCC to stop spewing out unnecessary warnings for unused variables
passed to functions as pointers for ia64 files.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Example memory map (HP rx7640 with 'default' acpiconfig setting, VGA disabled):
0x00000000 - 0x3FFFBFFF supports only WB (cacheable) access
If a user attempts to perform an MMIO mmap (using the PCIIOC_MMAP_IS_MEM ioctl)
to PCI config space (like mmap'ing and accessing memory at 0xA0000),
we will MCA because the kernel will attempt to use a mapping with the UC
attribute.
So check the memory attribute in kern_mmap and the EFI memmap. If WC is
requested, and WC or UC access is supported for the region, allow it.
Otherwise, use the same attribute the kernel uses.
Updates documentation and test cases as well.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
SDM says that brl instruction must be followed by a stop bit.
Fix instance in BRL_COND_FSYS_BUBBLE_DOWN where it isn't.
Signed-off-by: Christian Kandeler <christian.kandeler@hob.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On SN systems, when setting the IORESOURCE_ROM_BIOS_COPY resource flag,
the resource length should be set to the actual size of the ROM image
so that a call to pci_map_rom() returns the correct size.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It's not a good idea to use "ssm psr.ic | psr.i" to simultaneously
enable interrupts and interrupt state collection, the two bits can
take effect asynchronously, so it is possible for an interrupt to
be serviced while psr.ic is still zero.
Signed-off-by: Tony Luck <tony.luck@intel.com>
the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.
this code is fundamental fragile: the boot-time migration cost detector
doesnt really work on systems that had large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty undeterministic as well.
(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)
under CFS the same purpose of cache affinity can be achieved without
any special cache-hot special-case: tasks are sorted in the 'timeline'
tree and the SMP balancer picks tasks from the left side of the
tree, thus the most cache-cold task is balanced automatically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This one changes the SN2 specific PCI drivers to use ioremap() for
obtaining the real address to access for the PCI registers instead of
manually calculating them with __IA64_UNCACHED_OFFSET.
The patch should have no real change when running on a normal Linux
kernel, but when running as a paravirtualized it is needed.
Signed-off-by: Jes Sorenson <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Montecito behaves slightly differently than previous processors,
resulting in the MCA due to a failed PIO read to sometimes surfacing
outside the nofault code. Adding an additional or and stop bits
ensures the MCA surfaces in the nofault code.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove duplicate header include line from arch/ia64/kernel/time.c.
Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Both rp_loc and pfs_loc can be in the register stack area _or_ they can
be in the memory stack area, the latter occurs when a struct pt_regs is
pushed. Correct the validation check on these fields to check for both
stack areas. Not allowing for memory stack locations means no
backtrace past ia64_leave_kernel, or any other code that uses
PT_REGS_UNWIND_INFO.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Replacing (n & (n-1)) in the context of power of 2 checks
with is_power_of_2
Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Section mismatch: reference to .init.text:acpi_find_rsdp
(between 'acpi_get_sysname' and 'acpi_request_vector')
acpi_get_sysname() needs to call the __init function acpi_find_rsdp, but it
doesn't have the __init attribute itself, hence the warning. Luckily it is
only called from machvec_init() which has __init attribute, so the fix
is to define acpi_get_sysname() as __init too.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Silly bug in _PDC data setup. Haven't seen any real side-effects of this one
yet. But, needs fixing regardless.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] fix kmalloc(0) in arch/ia64/pci/pci.c
[IA64] Only unwind non-running tasks.
[IA64] Improve unwind checking.
[IA64] Yet another section mismatch warning
[IA64] Fix bogus messages about system calls not implemented.
Hiroyuki Kamezawa reported the problem that pci_acpi_scan_root() of
ia64 might call kmalloc_node() with zero size.
Currently ia64's pci_acpi_scan_root() assumes that _CRS method of root
bridge has at least one resource window. But, the root bridges that
has no resource window must be taken into account.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Unwinding a running task has proven problematic.
In one instance, the running task was attempting to unwind itself and
received an interrupt between when get_wchan allocated local variables on
the stack and when unw_init_from_blocked_task was called which resulted
in unw_init_frame_info to place this tasks task_struct pointer over the
switch stack's ar_bspstore entry.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds some sanity checks to keep register and memory stack
pointers in the unw_frame_info structure within the tasks stack address
range.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
reference to .init.data: from .text between 'sn_cpu_init' (at offset 0x1411) and 'nasid_slice_to_cpuid'
reference to .init.data: from .text between 'sn_cpu_init' (at offset 0x1420) and 'nasid_slice_to_cpuid'
The offending .init.data object is shub_1_1_found which should be declared
in __cpuinitdata, not in __initdata
Signed-off-by: Tony Luck <tony.luck@intel.com>
Get rid of the notifier list and call the kprobes code directly
if compiled in. This mirrors the changes that recently went
into powerpc, s390 and sparc64.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Building with GCC 4.2, I get the following error:
CC arch/ia64/kernel/mca.o
arch/ia64/kernel/mca.c:275: error: __ksymtab_ia64_mlogbuf_finish causes a
section type conflict
This is because ia64_mlogbuf_finish is both declared static and exported.
Fix by removing the export (which is unneeded now).
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Previous spelling patch from Simon Arlott broke one spot that
didn't need fixing (reported by Simon within 35 minutes of the
patch ... but not until after I'd applied to GIT and pushed :-(
Signed-off-by: Tony Luck <tony.luck@intel.com>
The current implementation of kdump on INIT events would enter
kdump processing on DIE_INIT_MONARCH_ENTER and DIE_INIT_SLAVE_ENTER
events. Thus, the monarch cpu would go ahead and boot up the kdump
On SN shub2 systems, this out-of-sync situation causes some slave
cpus on different nodes to enter POD.
This patch moves kdump entry points to DIE_INIT_MONARCH_LEAVE and
DIE_INIT_SLAVE_LEAVE. It also sets kdump_in_progress variable in
the DIE_INIT_MONARCH_PROCESS event to not dump all active stack
traces to the console in the case of kdump.
I have tested this patch on an SN machine and a HP RX2600.
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Quicklist support for IA64
[IA64] fix Kprobes reentrancy
[IA64] SN: validate smp_affinity mask on intr redirect
[IA64] drivers/char/snsc_event.c:206: warning: unused variable `p'
[IA64] mca.c:121: warning: 'cpe_poll_timer' defined but not used
[IA64] Fix - Section mismatch: reference to .init.data:mvec_name
[IA64] more warning cleanups
[IA64] Wire up epoll_pwait and utimensat
[IA64] Fix warnings resulting from type-checking in dev_dbg()
[IA64] typo s/kenrel/kernel/
IA64 is the origin of the quicklist implementation. So cut out the pieces
that are now in core code and modify the functions called.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In case of reentrance i.e when a probe handler calls a functions which
inturn has a probe, we save a previous kprobe information and just single
step the reentrant probe without calling the actual probe handler. During
this reentracy period, if an interrupt occurs and if probe happens to
trigger in the inturrupt path, then we were corrupting the previous kprobe(
as we were overriding the previous kprobe info) info their by crashing the
system. This patch fixes this issues by having a an array of previous
kprobe info struct(with the array size of 2).
This similar technique is not needed on i386 and x86_64 because by default
interrupts are turn off in the break/int3 exception handler.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On SN, only allow one bit to be set in the smp_affinty mask when
redirecting an interrupt. Currently setting multiple bits is allowed, but
only the first bit is used in determining the CPU to redirect to. This has
caused confusion among some customers.
[akpm@linux-foundation.org: fixes]
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When auditing syscalls that send signals, log the pid and security
context for each target process. Optimize the data collection by
adding a counter for signal-related rules, and avoiding allocating an
aux struct unless we have more than one target process. For process
groups, collect pid/context data in blocks of 16. Move the
audit_signal_info() hook up in check_kill_permission() so we audit
attempts where permission is denied.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only shows up while building sim_defconfig because CONFIG_ACPI=n
there, and all of the uses of cpe_poll_timer are inside #ifdef CONFIG_ACPI.
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/sn/kernel/xpc_partition.c:578: warning: long unsigned int format, different type arg (arg 3)
arch/ia64/sn/kernel/xpnet.c:349: warning: int format, different type arg (arg 7)
arch/ia64/sn/kernel/xpnet.c:349: warning: int format, different type arg (arg 8)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Lots of places where we passed a "struct pci_device *" rather than
a "struct device *". One place where we used a "%s" in the format,
but forgot to provide an argument.
Acked-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Seems more than just deprecated, we can't build using SA_INTERUPT.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] wire up pselect, ppoll
[IA64] Add TIF_RESTORE_SIGMASK
[IA64] unwind did not work for processes born with CLONE_STOPPED
[IA64] Optional method to purge the TLB on SN systems
[IA64] SPIN_LOCK_UNLOCKED macro cleanup in arch/ia64
[IA64-SN2][KJ] mmtimer.c-kzalloc
[IA64] fix stack alignment for ia32 signal handlers
[IA64] - Altix: hotplug after intr redirect can crash system
[IA64] save and restore cpus_allowed in cpu_idle_wait
[IA64] Removal of percpu TR cleanup in kexec code
[IA64] Fix some section mismatch errors
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
sound: convert "sound" subdirectory to UTF-8
MAINTAINERS: Add cxacru website/mailing list
include files: convert "include" subdirectory to UTF-8
general: convert "kernel" subdirectory to UTF-8
documentation: convert the Documentation directory to UTF-8
Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
remove broken URLs from net drivers' output
Magic number prefix consistency change to Documentation/magic-number.txt
trivial: s/i_sem /i_mutex/
fix file specification in comments
drivers/base/platform.c: fix small typo in doc
misc doc and kconfig typos
Remove obsolete fat_cvf help text
Fix occurrences of "the the "
Fix minor typoes in kernel/module.c
Kconfig: Remove reference to external mqueue library
Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
Correct comments in genrtc.c to refer to correct /proc file.
Fix more "deprecated" spellos.
Fix "deprecated" typoes.
...
Fix trivial comment conflict in kernel/relay.c.
This finally renames the thread_info field in task structure to stack, so that
the assumptions about this field are gone and archs have more freedom about
placing the thread_info structure.
Nonbroken archs which have a proper thread pointer can do the access to both
current thread and task structure via a single pointer.
It'll allow for a few more cleanups of the fork code, from which e.g. ia64
could benefit.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
[akpm@linux-foundation.org: build fix]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).
[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minor problem for mainstream. Big problem for checkpoint/restore,
because all the stopped/traced processes are born in this state,
hence they cannot be checkpointed again due to failing unwind.
The problem was identified as assumption in kernel unwind library
that top level frame is different of syscall frame. It is the case
unless process was born with CLONE_STOPPED.
Author: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-Off-By: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds an optional method for purging the TLB on SN IA64 systems.
The change should not affect any non-SN system.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes the setup of the alignment of the signal frame, so that all
signal handlers are run with a properly aligned stack frame.
The current code "over-aligns" the stack pointer so that the stack frame
is effectively always mis-aligned by 4 bytes. But what we really want
is that on function entry ((sp + 4) & 15) == 0, which matches what would
happen if the stack were aligned before a "call" instruction.
i386 and x86_64 are already fixed by d347f37227
Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch provides a debugfs knob to turn kprobes on/off
o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
not (default enabled)
o Echoing 0 to this file will disarm all installed probes
o Any new probe registration when disabled will register the probe but
not arm it. A message will be printed out in such a case.
o When a value 1 is echoed to the file, all probes (including ones
registered in the intervening period) will be enabled
o Unregistration will happen irrespective of whether probes are globally
enabled or not.
o Update Documentation/kprobes.txt to reflect these changes. While there
also update the doc to make it current.
We are also looking at providing sysrq key support to tie to the disabling
feature provided by this patch.
[akpm@linux-foundation.org: Use bool like a bool!]
[akpm@linux-foundation.org: add printk facility levels]
[cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390]
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- consolidate duplicate code in all arch_prepare_kretprobe instances
into common code
- replace various odd helpers that use hlist_for_each_entry to get
the first elemenet of a list with either a hlist_for_each_entry_save
or an opencoded access to the first element in the caller
- inline add_rp_inst into it's only remaining caller
- use kretprobe_inst_table_head instead of opencoding it
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We used to warn unless the EFI system table major revision was exactly 1.
But EFI 2.00 firmware is starting to appear, and the 2.00 changes don't
affect anything in Linux.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In certain cases like when the real return address can't be found or when
the number of tracked calls to a kretprobed function is less than the
number of returns, we may not be able to find the correct return address
after processing a kretprobe. Currently we just do a BUG_ON, but no
information is provided about the actual failing kretprobe.
Print out details of the kretprobe before calling BUG().
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the size of the per-cpu region reserved to save crash notes is
set by the per-architecture value MAX_NOTE_BYTES. Which in turn is
currently set to 1024 on all supported architectures.
While testing ia64 I recently discovered that this value is in fact too
small. The particular setup I was using actually needs 1172 bytes. This
lead to very tedious failure mode where the tail of one elf note would
overwrite the head of another if they ended up being alocated sequentially
by kmalloc, which was often the case.
It seems to me that a far better approach is to caclculate the size that
the area needs to be. This patch does just that.
If a simpler stop-gap patch for ia64 to be squeezed into 2.6.21(.X) is
needed then this should be as easy as making MAX_NOTE_BYTES larger in
arch/asm-ia64/kexec.h. Perhaps 2048 would be a good choice. However, I
think that the approach in this patch is a much more robust idea.
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the die notifier handling to common code. Previous
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)
arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use SLAB_PANIC and delete duplicated panic().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is to fix many section mismatches of code related to memory hotplug.
I checked compile with memory hotplug on/off on ia64 and x86-64 box.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When redirecting a device interrupt on SN, not all links
between platform specific structures are being updated.
This can result in a system crash if an interrupt
redirection is followed by an unplug of that device.
The complete fix also requires a prom update. Though,
this patch is backward compatable and not dependent on
the prom patch.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Save and Restore the task's cpus_allowed mask, across the set_cpus_allowed()
in cpu_idle_wait(). Without this, we will endup corrupting task's cpu affinity.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The kexec code wasn't in the kernel when Ken Chen created
this patch (00b65985fb) to
change the mapping of percpu area from a TR to a TC.
Zou Nanhai spotted this extra piece, and Simon Horman
concurred.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Section mismatch: reference to ...
.init.text:prefill_possible_map from .text between 'setup_per_cpu_areas' and 'cpu_init'
.init.text:iosapic_override_isa_irq from .text between 'iosapic_init' and 'iosapic_remove'
Signed-off-by: Tony Luck <tony.luck@intel.com>
Handle MAP_FIXED in ia64 arch_get_unmapped_area and
hugetlb_get_unmapped_area(), just call prepare_hugepage_range in the later and
is_hugepage_only_range() in the former.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we add a new flag so that we can distinguish between the first page and the
tail pages then we can avoid to use page->private in the first page.
page->private == page for the first page, so there is no real information in
there.
Freeing up page->private makes the use of compound pages more transparent.
They become more usable like real pages. Right now we have to be careful f.e.
if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
can then no longer use the private field. This is one of the issues that
cause us not to support debugging for page size slabs in SLAB.
Having page->private available for SLUB would allow more meta information in
the page struct. I can probably avoid the 16 bit ints that I have in there
right now.
Also if page->private is available then a compound page may be equipped with
buffer heads. This may free up the way for filesystems to support larger
blocks than page size.
We add PageTail as an alias of PageReclaim. Compound pages cannot currently
be reclaimed. Because of the alias one needs to check PageCompound first.
The RFC for the this approach was discussed at
http://marc.info/?t=117574302800001&r=1&w=2
[nacc@us.ibm.com: fix hugetlbfs]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
it at some point in their setup routine, and then the generic code sets up the
reverse mapping from the msi_desc back to the irq.
set_irq_msi() should do both connections, making it the one and only call
required to connect an irq with it's MSI desc and vice versa.
The arch code MUST call set_irq_msi(), and it must do so only once it's sure
it's not going to fail the irq allocation.
Given that there's no need for the arch to return the irq anymore, the return
value from the arch setup routine just becomes 0 for success and anything else
for failure.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Allows architectures to advertise that they support MSI rather than listing
each architecture as a PCI_MSI dependency.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.
In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.
My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:
arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c
I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.
Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
[PATCH] scatterlist.h needs types.h
http://lkml.org/lkml/2007/3/01/141
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fixes for various arch compilation problems:
(*) Missing module exports.
(*) Variable name collision when rxkad and af_rxrpc both built in
(rxrpc_debug).
(*) Large constant representation problem (AFS_UUID_TO_UNIX_TIME).
(*) Configuration dependencies.
(*) printk() format warnings.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
To clearly state the intent of copying from linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now to convert the last one, skb->data, that will allow many simplifications
and removal of some of the offset helpers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)
Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One less thing for drivers writers to worry about.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On a SGI Altix TIOCP based PCI bus we need to include the ATE_PIO attribute
bit if we're mapping a 32bit MSI address.
Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
My patch: git commit=95235ca2c20ac0b31a8eb39e2d599bcc3e9c9a10 introduced a bug
in IA64 cpuinfo output.
Patch changed the proc_freq from 1HZ resolution to 1KHz resolution, but left
format string unchanged at " %lu.%06lu". Below is the fix.
Thanks to Bjorn for catching this.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Example memory map (from HP sx1000 with VGA enabled):
0x00000 - 0x9FFFF supports only WB (cacheable) access
0xA0000 - 0xBFFFF supports only UC (uncacheable) access
0xC0000 - 0xFFFFF supports only WB (cacheable) access
Some versions of X map the entire 0x00000-0xFFFFF area at once. With the
example above, this mmap must fail because there's no memory attribute that's
safe for the entire area.
Prior to this patch, we performed the mmap with a UC mapping. When X
accessed the WB memory at 0xC0000, it caused an MCA. The crash can happen
when mapping 0xC0000 from either /dev/mem or a /sys/.../legacy_mem file.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Allow cacheable mmaps of legacy_mem if WB access is supported for the region.
The "legacy_mem" file often contains a shadow option ROM, and some versions of
X depend on this.
Tim Yamin <plasm@roo.me.uk> reported that this change fixes X on a Dell
PowerEdge 3250.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Example memory map (from HP sx1000 with VGA enabled):
0x00000 - 0x9FFFF supports only WB (cacheable) access
0xA0000 - 0xBFFFF supports only UC (uncacheable) access
0xC0000 - 0xFFFFF supports only WB (cacheable) access
pci_read_rom() indirectly uses ioremap(0xC0000) to read the shadow VGA option
ROM. ioremap() used to default to a 16MB or 64MB UC kernel identity mapping,
which would cause an MCA when reading 0xC0000 since only WB is supported there.
X uses reads the option ROM to initialize devices. A smaller test case is:
# echo 1 > /sys/bus/pci/devices/0000:aa:03.0/rom
# cp /sys/bus/pci/devices/0000:aa:03.0/rom x
To avoid this, we can use the same ioremap_page_range() strategy that most
architectures use for all ioremaps. These page table mappings come out of the
vmalloc area. On ia64, these are in region 5 (0xA... addresses) and typically
use 16KB or 64KB mappings instead of 16MB or 64MB mappings. The smaller
mappings give more flexibility to use the correct attributes.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
No functional change, just use the same names as i386.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Skip clock calibration if cpu being brought online is exactly the same
speed, stepping, etc., as the previous cpu. This significantly reduces
the time to boot very large systems.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64 expects following vm layout:
== low memory
[register-stack grows up]
[memory-stack grows down]
== high memory
But the code assigns the base of the register stack at the
maximum stack size offset from the fixed address where the
stack *might* start. Stack randomization will result in the
memory stack starting at a lower address than this, and if the
user has set a low stack limit with "ulimit -s", then you can
end up with the register stack above the memory stack (or if
you were very unlucky right on top of it!).
Fix: Calculate the base address for the register stack starting
from the actual address of the memory stack.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The following 'if' statement in ia64_setup_msi_irq() always fails even
if create_irq() returns <0 value, because variable 'irq' is defined as
unsigned int. It would cause invalid memory access.
irq = create_irq();
if (irq < 0)
return irq;
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
So I think the right solution is to simply make pci_enable_device just
flip enable bits and move the rest of the work someplace else.
However a thorough cleanup is a little extreme for this point in the
release cycle, so I think a quick hack that makes the code not stomp the
irq when msi irq's are enabled should be the first fix. Then we can
later make the code not change the irqs at all.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: IA64: fix %ll build warnings
ACPI: IA64: fix allnoconfig build
ACPI: Only use IPI on known broken machines (AMD, Dothan/BaniasPentium M)
ACPI: ibm-acpi: allow module to load when acpi notifiers can't be set (v2)
ACPI: parse 2nd MADT by default
ACPICA: revert "acpi_serialize" changes
sony-laptop: MAINTAINERS fix entry, add L: and W:
ACPI: resolve HP nx6125 S3 immediate wakeup regression
ACPI: Add support to parse 2nd MADT
In sn_io_slot_fixup(), the parent is re-set from the bus to
io(port|mem)_resource because the address is changed in a way that it's not
child of the bus any more.
However, only the root is set but not the parent/child/sibling relationship in
the resource tree which causes 'cat /proc/iomem' to stop after this memory
area. Depding on the poition in the tree the iomem may be nearly completely
empty.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Acked-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When booting an SN system without specifing a console
(i.e., no "console=" on boot line), the system will hang during
boot at the point where /sbin/init is run.
The problem is that vga_console_iobase is not converted to a
virtual address before storing in io_space[0].mmio_base.
The conversion was happening in sn_scan_pcdp(), but not in
setup_vga_console().
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Clearly should be checking for "val == DIE_INIT_SLAVE_ENTER".
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If a system consists of mixed processor types, kmalloc()
can be called before the per-cpu data page is initialized.
If the slab contains sufficient memory, then kmalloc() works
ok. However, if the slabs are empty, slab calls the memory
allocator. This requires per-cpu data (NODE_DATA()) & the
cpu dies.
Also noted by Russ Anderson who had a very similar patch.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We have seen bad_pte_print when testing crashdump on an SN machine in
recent 2.6.20 kernel. There are tons of bad pte print (pfn < max_low_pfn)
reports when the crash kernel boots up, all those reported bad pages
are inside initmem range; That is because if the crash kernel code and
data happens to be at the beginning of the 1st node. build_node_maps in
discontig.c will bypass reserved regions with filter_rsvd_memory. Since
min_low_pfn is calculated in build_node_map, so in this case, min_low_pfn
will be greater than kernel code and data.
Because pages inside initmem are freed and reused later, we saw
pfn_valid check fail on those pages.
I think this theoretically happen on a normal kernel. When I check
min_low_pfn and max_low_pfn calculation in contig.c and discontig.c.
I found more issues than this.
1. min_low_pfn and max_low_pfn calculation is inconsistent between
contig.c and discontig.c,
min_low_pfn is calculated as the first page number of boot memmap in
contig.c (Why? Though this may work at the most of the time, I don't
think it is the right logic). It is calculated as the lowest physical
memory page number bypass reserved regions in discontig.c.
max_low_pfn is calculated include reserved regions in contig.c. It is
calculated exclude reserved regions in discontig.c.
2. If kernel code and data region is happen to be at the begin or the
end of physical memory, when min_low_pfn and max_low_pfn calculation is
bypassed kernel code and data, pages in initmem will report bad.
3. initrd is also in reserved regions, if it is at the begin or at the
end of physical memory, kernel will refuse to reuse the memory. Because
the virt_addr_valid check in free_initrd_mem.
So it is better to fix and clean up those issues.
Calculate min_low_pfn and max_low_pfn in a consistent way.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Acked-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The evils of Kconfig's select bite us once again...
ia64/Kconfig selects ACPI, which depends on PM.
But select ignores dependencies, allnoconfig
chooses CONFIG_PM=n, and thus the menu of sub-options
under ACPI vanish, which breaks the build.
Manually select PM along with ACPI for now.
Some day, we should delete them both, or fix select.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
In sn_io_slot_fixup(), the parent is re-set from the bus to
io(port|mem)_resource because the address is changed in a way that it's not
child of the bus any more.
However, only the root is set but not the parent/child/sibling relationship
in the resource tree which causes 'cat /proc/iomem' to stop after this
memory area. Depding on the poition in the tree the iomem may be nearly
completely empty.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: John Keller <jpk@sgi.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bring defconfig, tiger_defconfig and zx1_defconfig up to date. Also
sprinkle KEXEC and KDUMP combinations around liberally so that my
usual regression test builds will see all combinations:
tiger_defconfig gets KEXEC=y, CRASH_DUMP=n
zx1_defconfig gets KEXEC=n, CRASH_DUMP=y
defconfig gets KEXEC=y, CRASH_DUMP=y
others remain at KEXEC=n, CRASH_DUMP=n
Signed-off-by: Tony Luck <tomy.luck@intel.com>
kdump_find_rsvd_region() is only called by
reserve_memory() which is in __init, so it seems that
kdump_find_rsvd_region() should also be in there.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ptrace misses clearing the syscall trace flag.
The increased syscall overhead is retained after the trace is finished.
This case happens when strace is terminated by force.
Signed-off-by: Akiyama, Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Grammatical fixes (s/freezed/frozen/)
Make some variables static
Change a C++ "//" comment to "/* ... */"
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Similar to memory error recovery, when a cache error is consumed
by a user process terminate the user instead of crashing the system.
Signed-off-by: Russ Anderson (rja@sgi.com)
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Jack Steiner noticed that duplicate TLB DTC entries do not cause a
linux panic. See discussion:
http://www.gelato.unsw.edu.au/archives/linux-ia64/0307/6108.html
The current TLB recovery code is recovering from the duplicate itr.d
dropins, masking the underlying problem. This change modifies
the MCA recovery code to look for the TLB check signature of the
duplicate TLB entry and panic in that case.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
On 1.6GHz Montectio Tiger4, the following performance data is measured with
kernel built with defconfig which has NUMA configured:
Fastest sys_getcpu: 502 itc counts.
Fastest fsys_getcpu: 28 itc counts.
fsys_getcpu performance is largly impacted by whether data (node_to_cpu_map
etc) is in cache. It can take fsys_getcpu up to ~150 itc counts in cold
cache case.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
efi_initialize_iomem_resources() is declared in both include/linux/efi.h
and arch/ia64/kernel/setup.c. This patch removes the latter.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Berhhard Walle noted that on his HP rx8640 he ended up with saved_max_pfn
smaller than the highest address of system ram in /proc/iomem and proposed
a patch to base the address on the unrounded and unfiltered EFI memory
map address. Simon Horman and Magnus Damm suggested that the whole test
be moved earlier in the function. This is the combination of both of
these patches.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes boot failure because irq_desc->mask() is NULL.
- Added mask/unmask functions to ia64's irq desc function table.
- rename hw_interrupt_type to irq_chip. hw_interrupt_type is old name.
- Tony: Added same change to arch/ia64/sn/kernel/irq.c as pointed out
by Eric Biederman ... mask/unmask functions there can be no-op.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The address where the ELF core header is stored is passed to the secondary
kernel as a kernel command line option. The memory area for this header is
also marked as a separate EFI memory descriptor on ia64.
The separate EFI memory descriptor is at the moment of the type
EFI_UNUSABLE_MEMORY. With such a type the secondary kernel skips over the
entire memory granule (config option, 16M or 64M) when detecting memory.
If we are lucky we will just lose some memory, but if we happen to have
data in the same granule (such as an initramfs image), then this data will
never get mapped and the kernel bombs out when trying to access it.
So this is an attempt to fix this by changing the EFI memory descriptor
type into EFI_LOADER_DATA. This type is the same type used for the kernel
data and for initramfs. In the secondary kernel we then handle the ELF
core header data the same way as we handle the initramfs image.
This patch contains the kernel changes to make this happen. Pretty
straightforward, we reserve the area in reserve_memory(). The address for
the area comes from the kernel command line and the size comes from the
specialized EFI parsing function vmcore_find_descriptor_size().
The kexec-tools-testing code for this can be found here:
http://lists.osdl.org/pipermail/fastboot/2007-February/005983.html
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Simon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Perfmon associates vmalloc()ed memory with a file descriptor, and installs
a vma mapping that memory. Unfortunately, the vm_file field is not filled
in, so processes with mappings to that memory do not prevent the file from
being closed and the memory freed. This results in use-after-free bugs and
multiple freeing of pages, etc.
I saw this bug on an Altix on SLES9. Haven't reproduced upstream but it
looks like the same issue is there.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add VERIFY_WRITE check in the beginning like compat_sys_getdents() (EINVAL vs
EFAULT).
Signed-off-by: Alexandr Andreev <aandreev@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Always build ia64 xor.o because multiple config options now depend on it.
Necessary to build .20-mm* on ia64 when, e.g., CONFIG_ASYNC_TX_DMA is
defined. Don't know if '_ASYNC_TX_DMA makes sense on ia64. If not, maybe
Kconfig should preclude it.
Could have defined a Kconfig option that defaults to true if MD_RAID456 ||
ASYNC_TX_DMA to control building of xor.o, but xor.o is only 848 bytes and
this IS ia64...
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Bob Picco <bob.picco@hp.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make saved_max_pfn point to max_pfn of entire system.
Without this patch is so that vmcore is zero length on ia64. This is
because saved_max_pfn was wrongly being set to the max_pfn of the crash
kernel's address space, rather than the max_pfg on the physical memory of
the machine - the whole purpose of vmcore is to access physical memory that
is not part of the crash kernel's addresss space.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Sort-Of-Acked-By: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
To provide compatibilty with SN kernels that do and do not
have ACPI IO support, the SN PROM must build different
versions of some ACPI tables based on which kernel is booting.
As such, the tables may have to change at kernel boot time.
By default, prior to kernel boot, the PROM builds an empty
DSDT (header only) and no SSDTs. If an ACPI capable kernel
boots, the kernel will notify the PROM, at platform setup time,
and the PROM will build full DSDT and SSDT tables.
With the latest changes to acpi_table_init(), the table lengths
are saved, and when our PROM changes them, the changes are not seen,
and the kernel will crash on boot. Because of issues with kexec support,
we are not able to create the tables prior to acpi_table_init().
As a result, we are making a second call to acpi_table_init() to
process the rebuilt DSDT and SSDTs.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch replaces all instances of "set_native_irq_info(irq, mask)"
with "irq_desc[irq].affinity = mask". The latter form is clearer
uses fewer abstractions, and makes access to this field uniform
accross different architectures.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SN code to initialize the Hub/TIO infrastructure needs to
execute before bus scanning. This was previously done with
an early call to acpi_bus_register_driver(). But now that
ACPI is using the Linux driver model, a driver cannot be registered
that early. Make changes to have the init routines invoked via
calls to acpi_get_devices().
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
Documentation/kernel-docs.txt update.
arch/cris: typo in KERN_INFO
Storage class should be before const qualifier
kernel/printk.c: comment fix
update I/O sched Kconfig help texts - CFQ is now default, not AS.
Remove duplicate listing of Cris arch from README
kbuild: more doc. cleanups
doc: make doc. for maxcpus= more visible
drivers/net/eexpress.c: remove duplicate comment
add a help text for BLK_DEV_GENERIC
correct a dead URL in the IP_MULTICAST help text
fix the BAYCOM_SER_HDX help text
fix SCSI_SCAN_ASYNC help text
trivial documentation patch for platform.txt
Fix typos concerning hierarchy
Fix comment typo "spin_lock_irqrestore".
Fix misspellings of "agressive".
drivers/scsi/a100u2w.c: trivial typo patch
Correct trivial typo in log2.h.
Remove useless FIND_FIRST_BIT() macro from cardbus.c.
...
Fix "spin_lock_irqrestore" to "spin_unlock_irqrestore."
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
acpi_boot_init() is making a bad check on the return
status from acpi_table_parse(). acpi_table_parse() now
returns zero on success, one on failure.
Signed-off-by: Aaron Young <ayoung@sgi.com>
Signed-off-by: Len Brown <len.brown@intel.com>
If an ATA drive uses legacy mode, ata driver will choose 14 and 15
as the fixed irq number. On ia64 platform, such numbers are GSI and
should be converted to irq vector.
Below patch against kernel 2.6.20 fixes it.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.
I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.
So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This convters the sysctl ctl_tables to use C99 initializers. While I was
looking at it I discovered it was using a portion of the sysctl binary
addresses space under CTL_KERN KERN_OSTYPE which was completely inappropriate.
So I completely removed all of the sysctl binary names, to remove and avoid
the ABI conflict.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By not using the enumeration in sysctl.h (or even understanding it) the SN
platform placed their arch specific xpc directory on top of CTL_KERN and only
because they didn't have 4 entries in their xpc directory got lucky and didn't
break glibc.
This is totally irresponsible. So this patch entirely removes sys_sysctl
support from their sysctl code. Hopefully they don't have ascii name
conflicts as well.
And now that they have no ABI numbers add them to the end instead of the
sysctl list instead of the head so nothing else will be overridden.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove in-source externs, linux/init.h is included in all cases.
This is a fixups for "Dynamic kernel command-line" patch.
It also includes some uml __init fixups so that we can __initdata also its
command_line.
Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1. Rename saved_command_line into boot_command_line.
2. Set command_line as __initdata.
[akpm@osdl.org: move some declarations to the right place]
Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Part of long forgotten patch
http://groups.google.com/group/fa.linux.kernel/msg/e98e941ce1cf29f6?dmode=source
Since then, m32r grabbed two copies.
Leave s390 copy because of important absence of CONFIG_VT, but remove
references to non-existent timerlist_lock. ia64 also loses timerlist_lock.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix-rmmod-read-write-races-in-proc-entries.patch doesn't want dynamically
allocated ->proc_fops, because it will set it to NULL at module unload time.
Regardless of module status, switch to statically allocated ->proc_fops which
leads to simpler code without wrappers.
AFAICS, also fix the following bug: "sn_force_interrupt" proc entry set
->write for itself, but was created with 0444 permissions. Change to 0644.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed that almost all architectures implemented exactly the same
sys32_sysinfo... except parisc, where a bug was to be found in handling of
the uptime. So let's remove a whole whack of code for fun and profit.
Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it
would be the best tested.
This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but
instead extracting out the common code from sys_sysinfo.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update all arch/*/kernel/vmlinux.lds.S to not include space for initramfs
when CONFIG_BLK_DEV_INITRAMFS is not selected. This saves another 4 kbytes
on most platfoms (some reserve PAGE_SIZE for initramfs).
Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ZONE_DMA less operation for IA64 SGI platform
Disable ZONE_DMA for SGI SN2. All memory is addressable by all devices and we
do not need any special memory pool.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch simply defines CONFIG_ZONE_DMA for all arches. We later do special
things with CONFIG_ZONE_DMA after the VM and an arch are prepared to work
without ZONE_DMA.
CONFIG_ZONE_DMA can be defined in two ways depending on how an architecture
handles ISA DMA.
First if CONFIG_GENERIC_ISA_DMA is set by the arch then we know that the arch
needs ZONE_DMA because ISA DMA devices are supported. We can catch this in
mm/Kconfig and do not need to modify arch code.
Second, arches may use ZONE_DMA in an unknown way. We set CONFIG_ZONE_DMA for
all arches that do not set CONFIG_GENERIC_ISA_DMA in order to insure backwards
compatibility. The arches may later undefine ZONE_DMA if their arch code has
been verified to not depend on ZONE_DMA.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Function is unnecessary now. We can use the summing features of the ZVCs to
get the values we need.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The arch hooks arch_setup_msi_irq and arch_teardown_msi_irq are now
responsible for allocating and freeing the linux irq in addition to
setting up the the linux irq to work with the interrupt.
arch_setup_msi_irq now takes a pci_device and a msi_desc and returns
an irq.
With this change in place this code should be useable by all platforms
except those that won't let the OS touch the hardware like ppc RTAS.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We need to be able to get from an irq number to a struct msi_desc.
The msi_desc array in msi.c had several short comings the big one was
that it could not be used outside of msi.c. Using irq_data in struct
irq_desc almost worked except on some architectures irq_data needs to
be used for something else.
So this patch adds a msi_desc pointer to irq_desc, adds the appropriate
wrappers and changes all of the msi code to use them.
The dynamic_irq_init/cleanup code was tweaked to ensure the new
field is left in a well defined state.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (140 commits)
ACPICA: reduce table header messages to fit within 80 columns
asus-laptop: merge with ACPICA table update
ACPI: bay: Convert ACPI Bay driver to be compatible with sysfs update.
ACPI: bay: new driver is EXPERIMENTAL
ACPI: bay: make drive_bays static
ACPI: bay: make bay a platform driver
ACPI: bay: remove prototype procfs code
ACPI: bay: delete unused variable
ACPI: bay: new driver adding removable drive bay support
ACPI: dock: check if parent is on dock
ACPICA: fix gcc build warnings
Altix: Add ACPI SSDT PCI device support (hotplug)
Altix: ACPI SSDT PCI device support
ACPICA: reduce conflicts with Altix patch series
ACPI_NUMA: fix HP IA64 simulator issue with extended memory domain
ACPI: fix HP RX2600 IA64 boot
ACPI: build fix for IBM x440 - CONFIG_X86_SUMMIT
ACPICA: Update version to 20070126
ACPICA: Fix for incorrect parameter passed to AcpiTbDeleteTable during table load.
ACPICA: Update copyright to 2007.
...
Instead of pinning per-cpu TLB into a DTR, use DTC. This will free up
one TLB entry for application, or even kernel if access pattern to
per-cpu data area has high temporal locality.
Since per-cpu is mapped at the top of region 7 address, we just need to
add special case in alt_dtlb_miss. The physical address of per-cpu data
is already conveniently stored in IA64_KR(PER_CPU_DATA). Latency for
alt_dtlb_miss is not affected as we can hide all the latency. It was
measured that alt_dtlb_miss handler has 23 cycles latency before and
after the patch.
The performance effect is massive for applications that put lots of tlb
pressure on CPU. Workload environment like database online transaction
processing or application uses tera-byte of memory would benefit the most.
Measurement with industry standard database benchmark shown an upward
of 1.6% gain. While smaller workloads like cpu, java also showing small
improvement.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It's not efficient to use a per-cpu variable just to store
how many physical stack register a cpu has. Ever since the
incarnation of ia64 up till upcoming Montecito processor, that
variable has "glued" to 96. Having a variable in memory means
that the kernel is burning an extra cacheline access on every
syscall and kernel exit path. Such "static" value is better
served with the instruction patching utility exists today.
Convert ia64_phys_stacked_size_p8 into dynamic insn patching.
This also has a pleasant side effect of eliminating access to
per-cpu area while psr.ic=0 in the kernel exit path. (fixable
for per-cpu DTC work, but why bother?)
There are some concerns with the default value that the instruc-
tion encoded in the kernel image. It shouldn't be concerned.
The reasons are:
(1) cpu_init() is called at CPU initialization. In there, we
find out physical stack register size from PAL and patch
two instructions in kernel exit code. The code in question
can not be executed before the patching is done.
(2) current implementation stores zero in ia64_phys_stacked_size_p8,
and that's what the current kernel exit path loads the value with.
With the new code, it is equivalent that we store reg size 96
in ia64_phys_stacked_size_p8, thus creating a better safety net.
Given (1) above can never fail, having (2) is just a bonus.
All in all, this patch allow one less memory reference in the kernel
exit path, thus reducing syscall and interrupt return latency; and
avoid polluting potential useful data in the CPU cache.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes
- marking I-cache clean of pages DMAed to now only done for IA64
- broken multiple inclusion in include/asm-x86_64/swiotlb.h
- missing call to mark_clean in swiotlb_sync_sg()
- a (perhaps only theoretical) issue in swiotlb_dma_supported() when
io_tlb_end is exactly at the end of memory
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
getcpu system call returns cpu# and node# on which this system call and
its caller are running. This patch hooks up its implementation on IA64.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Eliminate arch specific memory_present call ia64 NUMA by utilizing
sparse_memory_present_with_active_regions.
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On the ia64 architecture only this patch upgrades show_mem() for sparse
memory to be the same as it was for discontig memory. It has been shown to
work on NUMA and flatmem architectures.
Signed-off-by: George Beshers <gbeshers@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add missing exports to allow several drivers to be built as module with
CONFIG_IA64_HP_ZX1_SWIOTLB.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Occasionally the FSYS_RETURN patch list can have an odd length, causing other
data structures to get out of alignment. In OpenVZ it is odd and we get
misaligned kernel image, which does not boot.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While pursuing and unrelated issue with 64Mb granules I noticed a problem
related to inconsistent use of add_active_range. There doesn't appear any
reason to me why FLATMEM versus DISCONTIG_MEM should register memory to
add_active_range with different code. So I've changed the code into a
common implementation.
The other subtle issue fixed by this patch was calling add_active_range in
count_node_pages before granule aligning is performed. We were lucky with
16MB granules but not so with 64MB granules. count_node_pages has reserved
regions filtered out and as a consequence linked kernel text and data
aren't covered by calls to count_node_pages. So linked kernel regions
wasn't reported to add_active_regions. This resulted in free_initmem
causing numerous bad_page reports. This won't occur with this patch
because now all known memory regions are reported by
register_active_ranges.
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Don't force CONFIG_SWIOTLB on when not actually needed (i.e. HP_ZX1 and
SGI_SN2).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When we offline a CPU, migrate_irqs() tries to determine whether the
affinity bits of the IRQ descriptor match any of the remaining online
CPUs. If not, it fixes up the interrupt to point somewhere else.
Unfortunately, if an IRQ is unregistered the IRQ descriptor may still
have affinity to the CPU being offlined, but the no_irq_chip handler
doesn't provide a set_affinity function. This causes us to hit the
WARN_ON in migrate_irqs().
The easiest solution seems to be setting all the bits in the affinity
mask when the last interrupt is removed from the vector. I hit this on
an older kernel with Xen/ia64 using driver domains (so it probably needs
more testing on upstream). Xen essentially uses the bind/unbind
interface in sysfs to unregister a device from a driver and thus
unregister the interrupt.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
All IA64 systems except IA64_HP_SIM include ACPI and PCI.
So prevent IA64 Kconfigs that try to do irritating things like building
PCI without building ACPI.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes a NULL-pointer dereference in ia64_machine_kexec().
The variable ia64_kimage is set in machine_kexec_prepare() which is
called from sys_kexec_load(). If kdump wasn't configured before,
ia64_kimage is NULL. machine_kdump_on_init() passes ia64_kimage() to
machine_kexec() which assumes a valid value.
The patch also adds a few sanity checks for the image to simplify
debugging of similar problems in future.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I encountered one problem when running ptrace test case the situation
is this: traced process's syscall parameter needs to be accessed, but
for sys_clone system call with clone_flag (CLONE_VFORK | CLONE_VM |
SIGCHLD) parameter. This syscall's parameter accessing result is wrong.
The reason is that vforked child process mm point is the same, but
tgid is different. Without this patch find_thread_for_addr will return
vforked process if vforked process is also stopped, but not the thread
which calls vfork syscall.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some patches have turned up on xen-devel recently to convert strcpy()
to safer alternatives and so forth. While reviewing those patches
I noticed that the features string building could be cleaned up.
This patch uses snprintf() instead of strcpy() and direct character
pointer manipulation. It makes the features string building safe and
gets rid of the special case for features output in show_cpuinfo()
Additionally I removed the (int) cast of ARRAY_SIZE, which seems to
serve no purpose.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As is pointed out in
http://www.gelato.org/community/view_linear.php?id=1_1036&from=authors&value=Ian%20Wienand#1_1039,
if single step on break instruction, the break fault has higher
priority than the single-step trap. When the break fault handler
is entered, it advances the IP by 1 instruction so break instruction
single-stepping is skipped, actually it is next instruction which
is single stepped.
This patch modifies this, it adds TIF_SINGLESTEP bit for thread
flags, and generate a fake sigtrap when single stepping break
instruction. Test case in attachment can verify this. Any comments
is welcome.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This moves the ia64 implementation of machine_shutdown() from
machine_kexec.c to process.c, which is in keeping with the implelmentation
on other architectures, and seems like a much more appropriate home for it.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove the Remove inline declaration of efi_get_pal_addr() as it is
declared in linux/efi.h.
Signed-Off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
linux/uaccess.h was being included, but it seems that
really the following includes are needed.
asm/page.h: for __va() and PAGE_SHIFT
asm/uaccess.h: for copy_to_user()
I guess that linux/uaccess.h pulls in both asm/page.h and asm/uaccess.h.
I notices this while backporting the code to xen's linux-2.6.16.33,
which does not have linux/uaccess.h. I'm posting it as I think it is a
correct, though somewhat cosmetic fix.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix a typo in the saved_max_pfn description in contig.c
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Set saved_max_pfn when discontig memory is in use.
This sets up saved_max_pfn when disctontig memory is in use.
This mirrors the code for contig memory.
This patch does not entirely solve the problem of making vmcore work,
however it does appear to be neccessary. Please consider applying.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Kexec support for 2.6.20 on ia64 does not build properly using a config
made up by CONFIG_SMP=n and CONFIG_HOTPLUG_CPU=n:
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The SN Altix platform does not conform to the IOSAPIC IRQ routing model.
Add code in acpi_unregister_gsi() to check if (acpi_irq_model ==
ACPI_IRQ_MODEL_PLATFORM) and return.
Due to an oversight, this code was not added previously when
similar code was added to acpi_register_gsi().
http://marc.theaimsgroup.com/?l=linux-acpi&m=116680983430121&w=2
Signed-off-by: John Keller <jpk@sgi.com>
Acked-by: Len Brown <lenb@kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes up ia64 kexec support for HP rx2620 hardware. It does
this by skipping migration of already disabled irqs. This is most likely a
problem on other ia64 platforms as well, but I've only been able to
reproduce it on one machine so far.
The full story is that handle_bad_irq() gets invoked before starting the
new kernel without this patch. This seems to happen when fixup_irqs()
calls generic_handle_irq() on already migrated (and disabled) irqs. So by
avoiding migration of disabled irqs we stay away of handle_bad_irq().
The code has been tested on three different ia64 machines, all with good
results. It is possible to trigger the same bug by offlining a processor
using echo 0 > /sys/devices/system/cpu/cpuX/online.
More detailed information is available in the following mail thread:
http://lists.osdl.org/pipermail/fastboot/2007-January/thread.html#5774
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Zou, Nanhai <nanhai.zou@intel.com>
Acked-by: Jay Lan <jlan@sgi.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add SN platform support for running with an ACPI
capable PROM that defines PCI devices in SSDT
tables. There is a SSDT table for every occupied
slot on a root bus, containing info for every
PPB and/or device on the bus. The SSDTs will be
dynamically loaded/unloaded at hotplug enable/disable.
Platform specific information that is currently
passed via a SAL call, will now be passed via the
Vendor resource in the ACPI Device object(s) defined
in each SSDT.
Signed-off-by: John Keller <jpk@sgi.com>
Cc: Greg KH <greg@kroah.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
ACPI 3.0 incorporated the SRAT spec, upping the table version to 2,
and extending the size of the proximity domain from 1-byte to 4-bytes.
This extension was into a reserved field that firmware should
set to 0, but the HP simulator had non-zero values there
resulting in unexpected huge numbers.
So mask the domain down to 8-bits for now.
A more general fix will be to check the table version
supplied by firmware and get paranoid about reserved fields.
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This kernel driver patch provides sysfs interface for user application to
call pal_mc_error_inject() procedure.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch has kenrel configuration changes for the MC Error Injection
Tool.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
Revert "ACPI: ibm-acpi: make non-generic bay support optional"
ACPI: update MAINTAINERS
ACPI: schedule obsolete features for deletion
ACPI: delete two spurious ACPI messages
ACPI: rename cstate_entry_s to cstate_entry
ACPI: ec: enable printk on cmdline use
ACPI: Altix: ACPI _PRT support
Fix an oops experienced on the Cell architecture when init-time functions,
early_*(), are called at runtime. It alters the call paths to make sure
that the callers explicitly say whether the call is being made on behalf of
a hotplug even, or happening at boot-time.
It has been compile tested on ppc64, ia64, s390, i386 and x86_64.
Acked-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide ACPI _PRT support for SN Altix systems.
The SN Altix platform does not conform to the
IOSAPIC IRQ routing model, so a new acpi_irq_model
(ACPI_IRQ_MODEL_PLATFORM) has been defined. The SN
platform specific code sets acpi_irq_model to
this new value, and keys off of it in acpi_register_gsi()
to avoid the iosapic code path.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (68 commits)
ACPI: replace kmalloc+memset with kzalloc
ACPI: Add support for acpi_load_table/acpi_unload_table_id
fbdev: update after backlight argument change
ACPI: video: Add dev argument for backlight_device_register
ACPI: Implement acpi_video_get_next_level()
ACPI: Kconfig - depend on PM rather than selecting it
ACPI: fix NULL check in drivers/acpi/osl.c
ACPI: make drivers/acpi/ec.c:ec_ecdt static
ACPI: prevent processor module from loading on failures
ACPI: fix single linked list manipulation
ACPI: ibm_acpi: allow clean removal
ACPI: fix git automerge failure
ACPI: ibm_acpi: respond to workqueue update
ACPI: dock: add uevent to indicate change in device status
ACPI: ec: Lindent once again
ACPI: ec: Change #define to enums there possible.
ACPI: ec: Style changes.
ACPI: ec: Acquire Global Lock under EC mutex.
ACPI: ec: Drop udelay() from poll mode. Loop by reading status field instead.
ACPI: ec: Rename gpe_bit to gpe
...
Fernando Lopez-Lezcano reported frequent scheduling latencies and audio
xruns starting at the 2.6.18-rt kernel, and those problems persisted all
until current -rt kernels. The latencies were serious and unjustified by
system load, often in the milliseconds range.
After a patient and heroic multi-month effort of Fernando, where he
tested dozens of kernels, tried various configs, boot options,
test-patches of mine and provided latency traces of those incidents, the
following 'smoking gun' trace was captured by him:
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (try_to_wake_up)
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (c01262ba 0 0)
IRQ_19-1479 1D..1 0us : resched_task (try_to_wake_up)
IRQ_19-1479 1D..1 0us : __spin_unlock_irqrestore (try_to_wake_up)
...
<idle>-0 1...1 11us!: default_idle (cpu_idle)
...
<idle>-0 0Dn.1 602us : smp_apic_timer_interrupt (c0103baf 1 0)
...
<...>-5856 0D..2 618us : __switch_to (__schedule)
<...>-5856 0D..2 618us : __schedule <<idle>-0> (20 162)
<...>-5856 0D..2 619us : __spin_unlock_irq (__schedule)
<...>-5856 0...1 619us : trace_stop_sched_switched (__schedule)
<...>-5856 0D..1 619us : trace_stop_sched_switched <<...>-5856> (37 0)
what is visible in this trace is that CPU#1 ran try_to_wake_up() for
PID:5856, it placed PID:5856 on CPU#0's runqueue and ran resched_task()
for CPU#0. But it decided to not send an IPI that no CPU - due to
TS_POLLING. But CPU#0 never woke up after its NEED_RESCHED bit was set,
and only rescheduled to PID:5856 upon the next lapic timer IRQ. The
result was a 600+ usecs latency and a missed wakeup!
the bug turned out to be an idle-wakeup bug introduced into the mainline
kernel this summer via an optimization in the x86_64 tree:
commit 495ab9c045
Author: Andi Kleen <ak@suse.de>
Date: Mon Jun 26 13:59:11 2006 +0200
[PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
During some profiling I noticed that default_idle causes a lot of
memory traffic. I think that is caused by the atomic operations
to clear/set the polling flag in thread_info. There is actually
no reason to make this atomic - only the idle thread does it
to itself, other CPUs only read it. So I moved it into ti->status.
the problem is this type of change:
if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
- clear_thread_flag(TIF_POLLING_NRFLAG);
+ current_thread_info()->status &= ~TS_POLLING;
smp_mb__after_clear_bit();
while (!need_resched()) {
local_irq_disable();
this changes clear_thread_flag() to an explicit clearing of TS_POLLING.
clear_thread_flag() is defined as:
clear_bit(flag, &ti->flags);
and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms:
static inline void clear_bit(int nr, volatile unsigned long * addr)
{
__asm__ __volatile__( LOCK_PREFIX
"btrl %1,%0"
hence smp_mb__after_clear_bit() is defined as a simple compile barrier:
#define smp_mb__after_clear_bit() barrier()
but the explicit TS_POLLING clearing introduced by the patch:
+ current_thread_info()->status &= ~TS_POLLING;
is not an atomic op! So the clearing of the TS_POLLING bit is freely
reorderable with the reading of the NEED_RESCHED bit - and both now
reside in different memory addresses.
CPU idle wakeup very much depends on ordered memory ops, the clearing of
the TS_POLLING flag must always be done before we test need_resched()
and hit the idle instruction(s). [Symmetrically, the wakeup code needs
to set NEED_RESCHED before it tests the TS_POLLING flag, so memory
ordering is paramount.]
Fernando's dual-core Athlon64 system has a sufficiently advanced memory
ordering model so that it triggered this scenario very often.
( And it also turned out that the reason why these latencies never
triggered on my testsystems is that i routinely use idle=poll, which
was the only idle variant not affected by this bug. )
The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to
act as an absolute barrier between the TS_POLLING write and the
NEED_RESCHED read. This affects almost all idling methods (default,
ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Run this:
#!/bin/sh
for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
echo "De-casting $f..."
perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
done
And then go through and reinstate those cases where code is casting pointers
to non-pointers.
And then drop a few hunks which conflicted with outstanding work.
Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On IA64 there exists some special instructions which
always need to be executed regradless of qp bits, such
as com.crel.unc, tbit.trel.unc etc.
This patch clears qp bits when inserting kprobe trap code
and disables probepoint on slot 1 for these special
instructions.
Signed-off-by: bibo,mao <bibo.mao@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Because slot 1 of one instr bundle crosses border of two consecutive
8-bytes, kprobe on slot 1 is disabled. This patch enables kprobe on
slot1, it only replaces higher 8-bytes of the instruction bundle and
changes the exception code to ignore the low 12 bits of the break
number (which is across the border in the lower 8-bytes of the bundle).
For those instructions which must execute regardless qp bits,
kprobe on slot 1 is still disabled.
Signed-off-by: bibo,mao <bibo.mao@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Stephane thought he saw a problem here (but was just confused
by the return value from ia64_pal_get_brand_info()). But we
should be more defensive here in case an prototype PAL for
a future processor doesn't implement this PAL call.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch eliminates a potential deadlock that is possible when XPC
disconnects a channel to a partition that has gone down. This deadlock will
occur if at least one of the kthreads created by XPC for the purpose of making
callouts to the channel's registerer is detained in the registerer and will
not be returning back to XPC until some registerer request occurs on the now
downed partition. The potential for a deadlock is removed by ensuring that
there always is a kthread available to make the channel disconnecting callout
to the registerer.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Improve the scalability of the fpswa code that rate-limits
logging of messages.
There are 2 distinctly different problems in this code.
1) If prctl is used to disable logging, last_time is never
updated. The result is that fpu_swa_count is zeroed out on
EVERY fp fault. This causes a very very hot cache line.
The fix reduces the wallclock time of a 1024p FP exception test
from 28734 sec to 19 sec!!!
2) On VERY large systems, excessive messages are logged because
multiple cpus can each reset or increment fpu_swa_count at
about the same time. The result is that hundreds of messages
are logged each second. The fixes reduces the logging rate
to ~1 per second.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This warning only shows up with CONFIG_VIRTUAL_MEM_MAP=y and
CONFIG_FLATMEM=y.
There is only one caller left for register_active_ranges() from the
contig.c code ... so it doesn't need to pick up the node number, the
node number is always zero.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* Make NORET_TYPE and ATTRIB_NORET in line with the
declaration for other architectures
* Add parameter names
Signed-Off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There seems to be a value in both allowing the kernel to determine
the base offset of the crashkernel automatically and allowing
users's to sepcify it.
The old behaviour on ia64, which is still the current behaviour on
most architectures is for the user to always specify the address.
Recently ia64 was changed so that it is always automatically determined.
With this patch the kernel automatically determines the offset if
the supplied value is 0, otherwise it uses the value provided.
This should probably be backed by a documentation change.
Signed-Off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Actually, on reflection I think that there is a good case for
keeping the options separate. I am thinking particularly of people
who want a very small crashdump kernel and thus don't want to compile
in kexec.
The patch below should fix things up so that all valid combinations of
KEXEC, CRASH_DUMP and VMCORE compile cleanly - VMCORE depends on
CRASH_DUMP which is why I said valid combinations. In a nutshell
it just untangles unrelated code and switches around a few defines.
Please note that it creats a new file, arch/ia64/kernel/crash_dump.c
This is in keeping with the i386 implementation.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is an SN specific patch.
Architectually, cpu_init is always called twice on cpu 0
and thus resulted in two SN_SAL_SET_CPU_NUMBER calls.
This was harmless in production kernel; however, it can
cause problem on booting up a crashdump kernel at Altix.
Here is the patch that detects the second sn_cpu_init
call and skips the second call to SN_SAL_SET_CPU_NUMBER.
Signed-Off-By: Jay Lan <jlan@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This facility provides three entry points:
ilog2() Log base 2 of unsigned long
ilog2_u32() Log base 2 of u32
ilog2_u64() Log base 2 of u64
These facilities can either be used inside functions on dynamic data:
int do_something(long q)
{
...;
y = ilog2(x)
...;
}
Or can be used to statically initialise global variables with constant values:
unsigned n = ilog2(27);
When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant. They treat negative numbers as
unsigned.
When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.
[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] replace kmalloc+memset with kzalloc
[IA64] resolve name clash by renaming is_available_memory()
[IA64] Need export for csum_ipv6_magic
[IA64] Fix DISCONTIGMEM without VIRTUAL_MEM_MAP
[PATCH] Add support for type argument in PAL_GET_PSTATE
[IA64] tidy up return value of ip_fast_csum
[IA64] implement csum_ipv6_magic for ia64.
[IA64] More Itanium PAL spec updates
[IA64] Update processor_info features
[IA64] Add se bit to Processor State Parameter structure
[IA64] Add dp bit to cache and bus check structs
[IA64] SN: Correctly update smp_affinty mask
[IA64] sparse cleanups
[IA64] IA64 Kexec/kdump
Replace kmalloc+memset with kzalloc
Signed-off-by: Yan Burman <burman.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is a name clash with ia64 arch code in Andrew's tree. Rename
is_avialable_memory() to is_memory_available() to avoid the clash.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Now we have our own highly optimized assembly code version of
this routine (Thanks Ken!) we should export it so that it can
be used.
Signed-off-by: Tony Luck <tony.luck@intel.com>
PAL_GET_PSTATE accepts a type argument to return different kinds of
frequency information.
Refer: Intel ItaniumArchitecture Software Developer's Manual -
Volume 2: System Architecture, Revision 2.2
(http://developer.intel.com/design/itanium/manuals/245318.htm)
Add the support for type argument and use Instantaneous frequency
in the acpi driver.
Also fix a bug, where in return value of PAL_GET_PSTATE was getting compared
with 'control' bits instead of 'status' bits.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While working on implementing csum_ipv6_magic, I noticed that current
version of ip_fast_csum will potentially return bits above "unsigned
short" as 1. While no harm is done right now because all call sites
will chop off the upper bits when it uses the return value. However,
this is still dangerous and buggy. Here is a patch to enforce that the
function really returns unsigned short in the native register format.
The fix is free as there are plenty open slot to add one more asm instruction.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The asm version is 4.4 times faster than the generic C version and
10X smaller in code size.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On Altix systems, the /proc/irq/nn/smp_affinity mask is not being setup
at device iniitalization, or updated after an interrupt redirection.
This patch resolves those issues.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Changes and updates.
1. Remove fake rendz path and related code according to discuss with Khalid Aziz.
2. fc.i offset fix in relocate_kernel.S.
3. iospic shutdown code eoi and mask race fix from Fujitsu.
4. Warm boot hook in machine_kexec to SN SAL code from Jack Steiner.
5. Send slave to SAL slave loop patch from Jay Lan.
6. Kdump on non-recoverable MCA event patch from Jay Lan
7. Use CTL_UNNUMBERED in kdump_on_init sysctl.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.
the compiler can skip truly unused functions just fine:
text data bss dec hex filename
1624412 728710 3674856 6027978 5bfaca vmlinux.before
1624412 728710 3674856 6027978 5bfaca vmlinux.after
[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we are unregistering a kprobe-booster, we can't release its
instruction buffer immediately on the preemptive kernel, because some
processes might be preempted on the buffer. The freeze_processes() and
thaw_processes() functions can clean most of processes up from the buffer.
There are still some non-frozen threads who have the PF_NOFREEZE flag. If
those threads are sleeping (not preempted) at the known place outside the
buffer, we can ensure safety of freeing.
However, the processing of this check routine takes a long time. So, this
patch introduces the garbage collection mechanism of insn_slot. It also
introduces the "dirty" flag to free_insn_slot because of efficiency.
The "clean" instruction slots (dirty flag is cleared) are released
immediately. But the "dirty" slots which are used by boosted kprobes, are
marked as garbages. collect_garbage_slots() will be invoked to release
"dirty" slots if there are more than INSNS_PER_PAGE garbage slots or if
there are no unused slots.
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "bibo,mao" <bibo.mao@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Yumiko Sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi Oshima <soshima@redhat.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define elf_addr_t in linux/elf.h. The size of the type is determined using
ELF_CLASS. This allows us to remove the defines that today are spread all
over .c and .h files.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Daniel Jacobowitz <drow@false.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Following up with the work on shared page table done by Dave McCracken. This
set of patch target shared page table for hugetlb memory only.
The shared page table is particular useful in the situation of large number of
independent processes sharing large shared memory segments. In the normal
page case, the amount of memory saved from process' page table is quite
significant. For hugetlb, the saving on page table memory is not the primary
objective (as hugetlb itself already cuts down page table overhead
significantly), instead, the purpose of using shared page table on hugetlb is
to allow faster TLB refill and smaller cache pollution upon TLB miss.
With PT sharing, pte entries are shared among hundreds of processes, the cache
consumption used by all the page table is smaller and in return, application
gets much higher cache hit ratio. One other effect is that cache hit ratio
with hardware page walker hitting on pte in cache will be higher and this
helps to reduce tlb miss latency. These two effects contribute to higher
application performance.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Dave McCracken <dmccr@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the 'no_control' field in the cpu struct to a more positive
and better term 'hotpluggable'. And change(/cleanup) the logic accordingly.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The recent change to convert the is_enabled flag in the PCI device to an
atomic count broke the IA64 compilation.
As pcibios_disable_device is only ever called if the reference count
is zero, convert the if to a BUG_ON.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix up arch-specific work items where possible to use the new work_struct and
delayed_work structs.
Three places that enqueue bits of their stack and then return have been marked
with #error as this is not permitted.
Signed-Off-By: David Howells <dhowells@redhat.com>
* sanitize prototypes, annotate
* ntohs -> shift in checksum calculations
* kill access_ok() in csum_partial_copy_from_user
* collapse do_csum_partial_copy_from_user
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support a shadowed ROM when running with an ACPI capable PROM.
Define a new dev.resource flag IORESOURCE_ROM_BIOS_COPY to
describe the case of a BIOS shadowed ROM, which can then
be used to avoid pci_map_rom() making an unneeded call to
pci_enable_rom().
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
First phase in introducing ACPI support to SN.
In this phase, when running with an ACPI capable PROM,
the DSDT will define the root busses and all SN nodes
(SGIHUB, SGITIO). An ACPI bus driver will be registered
for the node devices, with the acpi_pci_root_driver being
used for the root busses. An ACPI vendor descriptor is
now used to pass platform specific information for both
nodes and busses, eliminating the need for the current
SAL calls. Also, with ACPI support, SN fixup code is no longer
needed to initiate the PCI bus scans, as the acpi_pci_root_driver
does that.
However, to maintain backward compatibility with non-ACPI capable
PROMs, none of the current 'fixup' code can been deleted, though
much restructuring has been done. For example, the bulk of the code
in io_common.c is relocated code that is now common regardless
of what PROM is running, while io_acpi_init.c and io_init.c contain
routines specific to an ACPI or non ACPI capable PROM respectively.
A new pci bus fixup platform vector has been created to provide
a hook for invoking platform specific bus fixup from pcibios_fixup_bus().
The size of io_space[] has been increased to support systems with
large IO configurations.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The pci_generic_prep_mwi() code does everything that pcibios_prep_mwi()
does on ia64. All we need to do is be sure that pci_cache_line_size
is set appropriately, and we can delete pcibios_prep_mwi().
Using SMP_CACHE_BYTES as the default was wrong on uniprocessor machines
as it is only 8 bytes. The default in the generic code of L1_CACHE_BYTES
is at least as good.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix various .c/.h typos in comments (no code changes).
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The HP_SIMSCSI driver can't be built as a module (unhealthy dependencies on
things that shouldn't really be exported).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use generic_handle_irq() to handle mixed-type irq handling.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
`typename' is going away and is usually uninitialised anwyay.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When called to do a transfer that has a start offset within the cache
line which is uneven between source and destination and a length which
terminates the source of the copy exactly on a cache line, one extra
line gets copied into a temporary buffer. This is normally not an issue
since the buffer is a kernel buffer and only the requested information
gets copied into the user buffer.
The problem arises when the source ends at the very last physical page
of memory. That last cache line does not exist and results in the SHUB
chip raising an MCA.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
(David:)
If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
because the given file offset is not hugepage aligned - then do_mmap_pgoff
will go to the unmap_and_free_vma backout path.
But at this stage the vma hasn't been marked as hugepage, and the backout path
will call unmap_region() on it. That will eventually call down to the
non-hugepage version of unmap_page_range(). On ppc64, at least, that will
cause serious problems if there are any existing hugepage pagetable entries in
the vicinity - for example if there are any other hugepage mappings under the
same PUD. unmap_page_range() will trigger a bad_pud() on the hugepage pud
entries. I suspect this will also cause bad problems on ia64, though I don't
have a machine to test it on.
(Hugh:)
prepare_hugepage_range() should check file offset alignment when it checks
virtual address and length, to stop MAP_FIXED with a bad huge offset from
unmapping before it fails further down. PowerPC should apply the same
prepare_hugepage_range alignment checks as ia64 and all the others do.
Then none of the alignment checks in hugetlbfs_file_mmap are required (nor
is the check for too small a mapping); but even so, move up setting of
VM_HUGETLB and add a comment to warn of what David Gibson discovered - if
hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
when unwinding from error will go the non-huge way, which may cause bad
behaviour on architectures (powerpc and ia64) which segregate their huge
mappings into a separate region of the address space.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix MSPEC driver to build for non SN2 enabled configs as the driver should
work in cached and uncached modes (no fetchop) on these systems. In
addition make MSPEC select IA64_UNCACHED_ALLOCATOR, which is required for
it and move it to arch/ia64/Kconfig to avoid warnings on non ia64
architectures running allmodconfig. Once the Kconfig code is fixed, we can
move it back.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Cc: Fernando Luis Vzquez Cao <fernando@oss.ntt.co.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When ACPI && NUMA, pxm_to_node is used and it exists in drivers/acpi/numa.c
Tony said:
The patch makes sense ... if you pick both of "ACPI" and "NUMA", then you
need (and should automatically be given) ACPI_NUMA too.
The only open question is whether there is a better way of getting there.
Perhaps with less configuration options in the first place? We are heading
towards a future where so many systems will be NUMA that there would seem to
be little benefit in keeping ACPI_NUMA separate from ACPI ... but perhaps
we aren't quite there yet.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Correct definition of handle_IPI
[IA64] move SAL_CACHE_FLUSH check later in boot
[IA64] MCA recovery: Montecito support
[IA64] cpu-hotplug: Fixing confliction between CPU hot-add and IPI
[IA64] don't double >> PAGE_SHIFT pointer for /dev/kmem access
The declaration of handle_IPI in arch/ia64/kernel/smp.c was changed but
not the definition of this function. Remove struct pt_regs from
handle_IPI().
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The check to see if the firmware drops interrupts during a
SAL_CACHE_FLUSH is done to early in the boot. SAL_CACHE_FLUSH expects
to be able to make PAL calls in virtual mode, on some cell based
machines a fault occurs causing a MCA. This patch moves the check
after mmu_context_init so the TLB and VHPT are properly setup.
Signed-off-by Troy Heber <troy.heber@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The information in MCA records is filled in slightly differently on
Montecito than on Madison/McKinley. Usually, the cache check and bus
check target identifiers have the same address. On Montecito the
cache check and bus check target identifiers can be different if
a corrected error (ie SBE or unconsumed poison data) was encountered and
then an uncorrected error (ie DBE) was consumed. In that case, the
cache check target identifier is the physical address of the DBE (that
caused the MCA to surface) while the bus check target identifier is the
physical address of the SBE. This patch correctly finds the target
identifier that triggered the MCA.
If there are multiple valid cache target identifiers in the same
error record then use the one with the lowest cache level.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6:
PCI: Remove quirk_via_abnormal_poweroff
PCI: reset pci device state to unknown state for resume
PCI: x86-64: mmconfig missing printk levels
PCI: fix pci_fixup_video as it blows up on sparc64
acpiphp: fix latch status
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.
This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts much of the original pci_fixup_video change and makes it
work for all arches that need it.
fixed, and tested on x86, x86_64 and IA64 dig.
Signed-off-by: Eiichiro Oiwa <eiichiro.oiwa.nm@hitachi.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Count the number of "resched" interrupts that each cpu receives.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The current sn2_defconfig is obsolete, in particular the recent ATA
changes are a pain, so here's a patch to get things in sync ... at
least for a day or two :)
Update sn_defconfig to match current community kernel tree.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reformat to fit in 80 columns. Fix a couple typos. Remove
a couple unused labels.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linux maps PAL instructions with an ITR, but uses a DTC for PAL data.
Section 11.10.2.1.3, "Making PAL Procedures Calls in Physical or Virtual
Mode," of the SDM (rev 2.2), says we must therefore make all PAL calls
with PSR.ic = 1 so that Linux can handle any TLB faults.
PAL_CALL_IC_OFF is currently unused, and as long as we use the ITR + DTC
strategy, we can't use it. So remove it. I also removed the code in
ia64_pal_call_static() that conditionally cleared PSR.ic.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Allow pending IPIs to interrupt a timer interrupt that is looping
in the do_timer() "while" loop in timer_interrupt(). (Interrupts are
allowed at only 1 spot in the code).
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Missed one piece of ia64 fallout from the global IRQ patch
7d12e780e0
Perfmon interrupt handler needs to use get_irq_regs() too.
Acked-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Arch-independent zone-sizing is using indices instead of symbolic names to
offset within an array related to zones (max_zone_pfns). The unintended
impact is that ZONE_DMA and ZONE_NORMAL is initialised on powerpc instead
of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set. As a result, the
the machine fails to boot but will boot with CONFIG_HIGHMEM turned off.
The following patch properly initialises the max_zone_pfns[] array and uses
symbolic names instead of indices in each architecture using
arch-independent zone-sizing. Two users have successfully booted their
powerpcs with it (one an ibook G4). It has also been boot tested on x86,
x86_64, ppc64 and ia64. Please merge for 2.6.19-rc2.
Credit to Benjamin Herrenschmidt for identifying the bug and rolling the
first fix. Additional credit to Johannes Berg and Andreas Schwab for
reporting the problem and testing on powerpc.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cast to (void *) in request_irq() argument is stupid and
only hides problems...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the new typedef for interrupt handler function pointers rather than
actually spelling out the full thing each time. This was scripted with the
following small shell script:
#!/bin/sh
egrep -nHrl -e 'irqreturn_t[ ]*[(][*]' $* |
while read i
do
echo $i
perl -pi -e 's/irqreturn_t\s*[(]\s*[*]\s*([_a-zA-Z0-9]*)\s*[)]\s*[(]\s*int\s*,\s*void\s*[*]\s*[)]/irq_handler_t \1/g' $i || exit $?
done
Signed-Off-By: David Howells <dhowells@redhat.com>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
This is just a few makefile tweaks and some file renames.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently we attempt to predict how many irqs we will be able to allocate with
msi using pci_vector_resources and some complicated accounting, and then we
only allow each device as many irqs as we think are available on average.
Only the s2io driver even takes advantage of this feature all other drivers
have a fixed number of irqs they need and bail if they can't get them.
pci_vector_resources is inaccurate if anyone ever frees an irq. The whole
implmentation is racy. The current irq limit policy does not appear to make
sense with current drivers. So I have simplified things. We can revisit this
we we need a more sophisticated policy.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Many files include the filename at the beginning, serveral used a wrong one.
Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The patch below corrects multiple occurances of "the the"
typos across several files, both in source comments and KConfig files.
There is no actual code changed, only text. Note this only affects the /arch
directory, and I believe I could find many more elsewhere. :)
Signed-off-by: Adrian Bunk <bunk@stusta.de>
These patches make the kernel pass 64-bit inode numbers internally when
communicating to userspace, even on a 32-bit system. They are required
because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
for example. The 64-bit inode numbers are then propagated to userspace
automatically where the arch supports it.
Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
number returned by stat64() or getdents64() to differentiate files, and
failing because the 64-bit inode number space was compressed to 32-bits, and
so overlaps occur.
This patch:
Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
inode number so that 64-bit inode numbers can be passed back to userspace.
The stat functions then returns the full 64-bit inode number where
available and where possible. If it is not possible to represent the inode
number supplied by the filesystem in the field provided by userspace, then
error EOVERFLOW will be issued.
Similarly, the getdents/readdir functions now pass the full 64-bit inode
number to userspace where possible, returning EOVERFLOW instead when a
directory entry is encountered that can't be properly represented.
Note that this means that some inodes will not be stat'able on a 32-bit
system with old libraries where they were before - but it does mean that
there will be no ambiguity over what a 32-bit inode number refers to.
Note similarly that directory scans may be cut short with an error on a
32-bit system with old libraries where the scan would work before for the
same reasons.
It is judged unlikely that this situation will occur because modern glibc
uses 64-bit capable versions of stat and getdents class functions
exclusively, and that older systems are unlikely to encounter
unrepresentable inode numbers anyway.
[akpm: alpha build fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last in-kernel user of errno is gone, so we should remove the definition
and everything referring to it. This also removes the now-unused lib/execve.c
file that was introduced earlier.
Also remove every trace of __KERNEL_SYSCALLS__ that still remained in the
kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some architectures provide an execve function that does not set errno, but
instead returns the result code directly. Rename these to kernel_execve to
get the right semantics there. Moreover, there is no reasone for any of these
architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
remove these right away.
[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace references to system_utsname to the per-process uts namespace
where appropriate. This includes things like uname.
Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c
[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c. This
avoids all arches having to be updated. Compiles and boots on s390.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a nsproxy structure to the task struct. Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.
The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.
[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpumask: ensure that node_to_cpumask() is available to modules for all
supported combinations of architecture and CONFIG_NUMA.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kprobe_flush_task() possibly calls kfree function during holding
kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
incur spinlock deadlock. This patch moves kfree function out scope of
kretprobe_lock.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Whitespace is used to indent, this patch cleans up these sentences by
kernel coding style.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In an effort to make kprobe modules more portable, here is a patch that:
o Introduces the "symbol_name" field to struct kprobe.
The symbol->address resolution now happens in the kernel in an
architecture agnostic manner. 64-bit powerpc users no longer have
to specify the ".symbols"
o Introduces the "offset" field to struct kprobe to allow a user to
specify an offset into a symbol.
o The legacy mechanism of specifying the kprobe.addr is still supported.
However, if both the kprobe.addr and kprobe.symbol_name are specified,
probe registration fails with an -EINVAL.
o The symbol resolution code uses kallsyms_lookup_name(). So
CONFIG_KPROBES now depends on CONFIG_KALLSYMS
o Apparantly kprobe modules were the only legitimate out-of-tree user of
the kallsyms_lookup_name() EXPORT. Now that the symbol resolution
happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig
o Modify tcp_probe.c that uses the kprobe interface so as to make it
work on multiple platforms (in its earlier form, the code wouldn't
work, say, on powerpc)
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As part of an SMP cleanliness pass over UML, I consted a bunch of
structures in order to not have to document their locking. One of these
structures was a struct tty_operations. In order to const it in UML
without introducing compiler complaints, the declaration of
tty_set_operations needs to be changed, and then all of its callers need to
be fixed.
This patch declares all struct tty_operations in the tree as const. In all
cases, they are static and used only as input to tty_set_operations. As an
extra check, I ran an i386 allyesconfig build which produced no extra
warnings.
53 drivers are affected. I checked the history of a bunch of them, and in
most cases, there have been only a handful of maintenance changes in the
last six months. serial_core.c was the busiest one that I looked at.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
asm/serial.h is supposed to contain the definitions for the architecture
specific 8250 ports for the 8250 driver. It may also define BASE_BAUD,
but this is the base baud for the architecture specific ports _only_.
Therefore, nothing other than the 8250 driver should be including this
header file. In order to move towards this goal, here is a patch which
removes some of the more obvious incorrect includes of the file.
Acked-by: Paul Fulghum <paulkf@microgate.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
So we can kill wall_jiffies completely.
This is just a cleanup and logically should not change any real behavior
except for one thing: RTC updating code in (old) ppc and xtensa use a
condition "jiffies - wall_jiffies == 1". This condition is never met so I
suppose it is just a bug. I just remove that condition only instead of
kill the whole "if" block.
[heiko.carstens@de.ibm.com: s390 build fix and cleanup]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In cases where the acpi memory-add event does not containe the pxm (node)
infomation allow the driver to look up node info based on the address. The
acpi_get_node call returns -1 if it can't decode the pxm info, this causes
add_memory to panic. acpi_get_node would have to decode the resource from the
handle (a lenghty proposition). This seems to be the cleanist point to
interject the hook.
[kamezawa.hiroyu@jp.fujitsu.com: build fixes]
[y-goto@jp.fujitsu.com: build fixes]
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
timer interrupt handler with this change.
Currently update_times() calculates ticks by "jiffies - wall_jiffies", but
callers of do_timer() should know how many ticks to update. Passing ticks
get rid of this redundant calculation. Also there are another redundancy
pointed out by Martin Schwidefsky.
This cleanup make a barrier added by
5aee405c66 needless. So this patch removes
it.
As a bonus, this cleanup make wall_jiffies can be removed easily, since now
wall_jiffies is always synced with jiffies. (This patch does not really
remove wall_jiffies. It would be another cleanup patch)
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().
Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.
Eric's original description:
There are a lot of places in the kernel where we test for init
because we give it special properties. Most significantly init
must not die. This results in code all over the kernel test
->pid == 1.
Introduce is_init to capture this case.
With multiple pid spaces for all of the cases affected we are
looking for only the first process on the system, not some other
process that has pid == 1.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make PROT_WRITE imply PROT_READ for a number of architectures which don't
support write only in hardware.
While looking at this, I noticed that some architectures which do not
support write only mappings already take the exact same approach. For
example, in arch/alpha/mm/fault.c:
"
if (cause < 0) {
if (!(vma->vm_flags & VM_EXEC))
goto bad_area;
} else if (!cause) {
/* Allow reads even for write-only mappings */
if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
goto bad_area;
} else {
if (!(vma->vm_flags & VM_WRITE))
goto bad_area;
}
"
Thus, this patch brings other architectures which do not support write only
mappings in-line and consistent with the rest. I've verified the patch on
ia64, x86_64 and x86.
Additional discussion:
Several architectures, including x86, can not support write-only mappings.
The pte for x86 reserves a single bit for protection and its two states are
read only or read/write. Thus, write only is not supported in h/w.
Currently, if i 'mmap' a page write-only, the first read attempt on that page
creates a page fault and will SEGV. That check is enforced in
arch/blah/mm/fault.c. However, if i first write that page it will fault in
and the pte will be set to read/write. Thus, any subsequent reads to the page
will succeed. It is this inconsistency in behavior that this patch is
attempting to address. Furthermore, if the page is swapped out, and then
brought back the first read will also cause a SEGV. Thus, any arbitrary read
on a page can potentially result in a SEGV.
According to the SuSv3 spec, "if the application requests only PROT_WRITE, the
implementation may also allow read access." Also as mentioned, some
archtectures, such as alpha, shown above already take the approach that i am
suggesting.
The counter-argument to this raised by Arjan, is that the kernel is enforcing
the write only mapping the best it can given the h/w limitations. This is
true, however Alan Cox, and myself would argue that the inconsitency in
behavior, that is applications can sometimes work/sometimes fails is highly
undesireable. If you read through the thread, i think people, came to an
agreement on the last patch i posted, as nobody has objected to it...
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ifa_local, ifa_address, ifa_mask, ifa_broadcast and ifa_anycast are
net-endian. Annotated them and variables that are inferred to be
net-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] minor reformatting to vmlinux.lds.S
[IA64] CMC/CPE: Reverse the order of fetching log and checking poll threshold
[IA64] PAL calls need physical mode, stacked
[IA64] ar.fpsr not set on MCA/INIT kernel entry
[IA64] printing support for MCA/INIT
[IA64] trim output of show_mem()
[IA64] show_mem() printk levels
[IA64] Make gp value point to Region 5 in mca handler
Revert "[IA64] Unwire set/get_robust_list"
[IA64] Implement futex primitives
[IA64-SGI] Do not request DMA memory for BTE
[IA64] Move perfmon tables from thread_struct to pfm_context
[IA64] Add interface so modules can discover whether multithreading is on.
[IA64] kprobes: fixup the pagefault exception caused by probehandlers
[IA64] kprobe opcode 16 bytes alignment on IA64
[IA64] esi-support
[IA64] Add "model name" to /proc/cpuinfo
Since sys_sysctl is deprecated start allow it to be compiled out. This
should catch any remaining user space code that cares, and paves the way
for further sysctl cleanups.
[akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix build error introduced by 3212fe1594
Non-NUMA case should be handled.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch prevents pcibios_disable_device() from disabling interrupts
of devices which is not enabled.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Minor reformatting to vmlinux.lds.S to make it 80-column usable,
in accordance with Linux coding style.
Signed-off-by: Al Stone <ahs3@fc.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch reverses the order of fetching log from SAL and
checking poll threshold. This will fix following trivial issues:
- If SAL_GET_SATE_INFO is unbelievably slow (due to huge system
or just its silly implementation) and if it takes more than
1/5 sec, CMCI/CPEI will never switch to CMCP/CPEP.
- Assuming terrible flood of interrupt (continuous corrected
errors let all CPUs enter to handler at once and bind them
in it), CPUs will be serialized by IA64_LOG_LOCK(*).
Now we check the poll threshold after the lock and log fetch,
so we need to call SAL_GET_STATE_INFO (num_online_cpus() + 4)
times in the worst case.
if we can check the threshold before the lock, we can shut up
interrupts quickly without waiting preceding log fetches, and
the number of times will be reduced to (num_online_cpus()) in
the same situation.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When entering the kernel due to an MCA or INIT, ar.fpsr (ar40)
was not getting set to the kernel default value (remaining
at the user value). The effect depends on the user setting
of ar.fpsr. In the test case, the effect was addresses
printing with strange hex values.
Setting ar.fpsr in ia64_set_kernel_registers sets it for both
the MCA and INIT paths. The user value of ar.fpsr is correctly
saved (in ia64_state_save) and restored (in ia64_state_restore).
Below is an example of output with very strange hex values.
Anyone know the value of hex 'g'? :-)
Processes interrupted by INIT - 0 (cpu 14 task 0xdfffg55g7a4c6gA)
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Printing message to console from MCA/INIT handler is useful,
however doing oops_in_progress = 1 in them exactly makes
something in kernel wrong. Especially it sounds ugly if
system goes wrong after returning from recoverable MCA.
This patch adds ia64_mca_printk() function that collects
messages into temporary-not-so-large message buffer during
in MCA/INIT environment and print them out later, after
returning to normal context or when handlers determine to
down the system.
Also this print function is exported for use in extensional
MCA handler. It would be useful to describe detail about
recovery.
NOTE:
I don't think it is sane thing if temporary message buffer
is enlarged enough to hold whole stack dumps from INIT, so
buffering is disabled during stack dump from INIT-monarch
(= default_monarch_init_process). please fix it in future.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cut the number of lines of memory info output per node from five
to one line.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use the default sysrq printk level for printing show_mem() output both
for disconfig and contig versions. This is consistent with the printk
level used on other architectures (well ia32 at least).
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
MCA dispatch code take physical address of GP passed from SAL, then call
DATA_PA_TO_VA twice on GP before call into C code. The first time is
in ia64_set_kernel_register, the second time is in VIRTUAL_MODE_ENTER.
The gp is changed to a virtual address in region 7 because DATA_PA_TO_VA
is implemented by dep instruction.
However when notify blocks were called from MCA handler code, because
notify blocks are supported by callback function pointers, gp value
value was switched to region 5 again.
The patch set gp register to kernel gp of region 5 at entry of MCA
dispatch.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This reverts commit 2636255488.
Jakub Jelinek provided the missing futex_atomic_cmpxchg_inatomic()
function, so now it should be safe to re-enable these syscalls.
Signed-off-by: Tony Luck <tony.luck@intel.com>
The GFP_DMA option usually does nothing on SN2 since all of memory is in thei
DMA zone and the BTE has always been capable of addressing all of memory.
So there is no need to get memory from a restricted range of memory (which
is what GFP_DMA is for).
Remove useless __GFP_DMA option.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch renders thread_struct->pmcs[] and thread_struct->pmds[]
OBSOLETE. The actual table is moved to pfm_context structure which
saves space in thread_struct (in turn saving space in task_struct
which frees up more space for kernel stacks).
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add is_multithreading_enabled() to check whether multi-threading
is enabled independently of which cpu is currently online
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If the user-specified kprobe handler causes the page fault when accessing
user space address, fixup this fault since do_page_fault() should not
continue as the kprobe handler are run with preemption disabled.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On IA64 instruction opcode must be 16 bytes alignment, in kprobe structure
there is one element to save original instruction, currently saved opcode
is not statically allocated in kprobe structure, that can not assure
16 bytes alignment. This patch dynamically allocated kprobe instruction
opcode to assure 16 bytes alignment.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If we're going to implement smp_call_function_single() on three architecture
with the same prototype then it should have a declaration in a
non-arch-specific header file.
Move it into <linux/smp.h>.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In many places we will need to use the same combination of flags. Specify
a single GFP_THISNODE definition for ease of use in gfp.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The uncached allocator manages per node pools. Specify __GFP_THISNODE in
order to force allocation on the indicated node or fail. The uncached
allocator has already logic to deal with failing allocations.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make ZONE_DMA32 optional
- Add #ifdefs around ZONE_DMA32 specific code and definitions.
- Add CONFIG_ZONE_DMA32 config option and use that for x86_64
that alone needs this zone.
- Remove the use of CONFIG_DMA_IS_DMA32 and CONFIG_DMA_IS_NORMAL
for ia64 and fix up the way per node ZVCs are calculated.
- Fall back to prior GFP_ZONEMASK of 0x03 if there is no
DMA32 zone.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Assume that a cpu is *physically* offlined at boot time...
Because smpboot.c::smp_boot_cpu_map() canoot find cpu's sapicid,
numa.c::build_cpu_to_node_map() cannot build cpu<->node map for
offlined cpu.
For such cpus, cpu_to_node map should be fixed at cpu-hot-add.
This mapping should be done before cpu onlining.
This patch also handles cpu hotremove case.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Problem description:
We have additional_cpus= option for allocating possible_cpus. But nid
for possible cpus are not fixed at boot time. cpus which is offlined at
boot or cpus which is not on SRAT is not tied to its node. This will
cause panic at cpu onlining.
Usually, pxm_to_nid() mapping is fixed at boot time by SRAT.
But, unfortunately, some system (my system!) do not include
full SRAT table for possible cpus. (Then, I use
additiona_cpus= option.)
For such possible cpus, pxm<->nid should be fixed at
hot-add. We now have acpi_map_pxm_to_node() which is also
used at boot. It's suitable here.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Take default arch/*/kernel/audit.c to lib/, have those with special
needs (== biarch) define AUDIT_ARCH in their Kconfig.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The sn_cpu_init() is required for cpu initialization on SN platforms.
Change __init to __cpuinit so that the function is not freed with init code/data.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The SN PROM uses the register stack in the slave loop. The contents
must be preserved for the OS to return to the slave loop via offlining
a cpu or for kexec. A 'flushrs" is needed to force the stack to be written
to memory prior to changing bspstore.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The syscalls set/get_robust_list must not be wired up until
futex_atomic_cmpxchg_inatomic is implemented. Otherwise the kernel will
hang in handle_futex_death.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix a bug in sys_perfmonctl() whereby it was not correctly
decrementing the file descriptor reference count.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This prevents cross-region mappings on IA64 and SPARC which could lead
to system crash. They were correctly trapped for normal mmap() calls,
but not for the kernel internal calls generated by executable loading.
This code just moves the architecture-specific cross-region checks into
an arch-specific "arch_mmap_check()" macro, and defines that for the
architectures that needed it (ia64, sparc and sparc64).
Architectures that don't have any special requirements can just ignore
the new cross-region check, since the mmap() code will just notice on
its own when the macro isn't defined.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Cleaned up to not affect architectures that don't need it ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Increase default nodes shift to 10, nr_cpus to 1024
[IA64] remove redundant local_irq_save() calls from sn_sal.h
[IA64] panic if topology_init kzalloc fails
[IA64-SGI] Silent data corruption caused by XPC V2.
Change both the NODES_SHIFT and the NR_CPUS so that even big machines
can boot all nodes and processors with a generic kernel.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There really is no sense trying to continue if the kzalloc of sysfs_cpus[]
fails in ia64 topology_init. The code calling into here doesn't check
errors very well, and one ends up with a nonobvious boot failure that
wastes peoples time debugging.
See for example the lkml thread at:
http://lkml.org/lkml/2006/3/2/215
Since the system is totally dead when this kzalloc fails, not having yet
even booted, might as well announce one's death boldly and plainly.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ACPI 3.0 appended a variable length UID string to the LAPIC structure
as part of support for > 256 processors. So the BAD_MADT_ENTRY() sanity
check can no longer compare for equality with a fixed structure length.
Signed-off-by: Alexey Y Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Previously the message was "Fatal exception: panic_on_oops", as introduced
in a recent patch whith removed a somewhat dangerous call to ssleep() in
the panic_on_oops path. However, Paul Mackerras suggested that this was
somewhat confusing, leadind people to believe that it was panic_on_oops
that was the root cause of the fatal exception. On his suggestion, this
patch changes the message to simply "Fatal exception". A suitable oops
message should already have been displayed.
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jack Steiner identified a problem where XPC can cause a silent
data corruption. On module load, the placement may cause the
xpc_remote_copy_buffer to span two physical pages. DMA transfers are
done to the start virtual address translated to physical.
This patch changes the buffer from a statically allocated buffer to a
kmalloc'd buffer. Dean Nelson reviewed this before posting. I have
tested it in the configuration that was showing the memory corruption
and verified it works. I also added a BUG_ON statement to help catch
this if a similar situation is encountered.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Conflicts:
arch/ia64/hp/sim/simscsi.c
Stylistic differences in two separate fixes for buffer->request_buffer
problem.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The uncached allocator has a function, uncached_get_new_chunk(), that needs
to be serialized on a per node basis. It also has a global variable,
allocated_granules, which should be defined on a per node basis and protected
by that serialization. Additionally, all error returns from functions called
(like ia64_pal_mc_drain()) should be handled appropriately.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Acked-by: Jes Sorenson <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
contig.c (FLATMEM) requires the same optimization as in discontig.c for show_mem
when VIRTUAL_MEM_MAP is in use. Otherwise FLATMEM has softlockup timeouts.
This was boot tested for memory configuration: SPARSEMEM,
DISCONTIG+VIRTUAL_MEM_MAP, FLATMEM, FLATMEM+VIRTUAL_MEM_MAP and
FLATMEM+VIRTUAL_MEM_MAP with largest memory gap less than LARGE_GAP by
using boot parameter "mem=".
This was boot tested and "echo m >/proc/sysrq-trigger" output evaluated for
: FLATMEM, FLATMEM+VIRTUAL_MEM_MAP, DISCONTIGMEM+VIRTUAL_MEM_MAP and
SPARSEMEM.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Assure that vmem_map's high endpoint is MAX_ORDER aligned. Not doing so violates
the buddy allocator algorithm. Also anyone using mem=XXX on boot line and
not aligned to MAX_ORDER requires this patch in order to satisfy buddy
allocator. vmem_map always starts at pfn 0. The potentially large MAX_ORDER
on ia64 (due to hugetlbfs) requires that the end of vmem_map be aligned
to MAX_ORDER_NR_PAGES.
This was boot tested for: FLATMEM, FLATMEM+VIRTUAL_MEM_MAP,
DISCONTIGMEM+VIRTUAL_MEM_MAP and SPARSEMEM.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
CONFIG_MD_RAID5 became CONFIG_MD_RAID456 in drivers/md/Kconfig. Make
the same change in arch/ia64
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Aron Griffis <aron@hp.com>
Acked-by: Jes Sorenson <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I think ia64_switch_mode_phys and ia64_switch_mode_virt
does not need to alloc an empty frame.
An empty frame is required by loadrs but flushrs
does not need that.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We found an issue in pal.S.
According to the software runtime SPEC,
The caller's output registers do not need to be preserved for
caller. The callee may reuse input registers for any other
purpose within the procedure.
in ia64_pal_call_phys_stacked,
input registers are copied to output registers before call
into ia64_switch_mode_phys, then used to call into PAL. This
assumes output registers are preserved in ia64_switch_mode_phys,
which may not be true.
In this particular case, ia64_switch_mode_phys alloc a null frame
, and mask off psr.i.
If an interrupt comes at this small window,
or an MCA comes inside the procedure, output registers
maybe changed,
then the pal call may got some staled input registers.
This patch moves the copies from input to output
after ia64_switch_mode_phys to follow the software
runtime convention.
It also removed some unused labels in
ia64_pal_call_phys_stacked.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix some sparse warnings on ia64. Large constants that should be long
instead of int. Use NULL instead of 0. Add some missing __iomem
casts. Replace a non-C99 structure assignment.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The latest toolchains can produce a new ELF section in DSOs and
dynamically-linked executables. The new section ".gnu.hash" replaces
".hash", and allows for more efficient runtime symbol lookups by the
dynamic linker. The new ld option --hash-style={sysv|gnu|both} controls
whether to produce the old ".hash", the new ".gnu.hash", or both. In some
new systems such as Fedora Core 6, gcc by default passes --hash-style=gnu
to the linker, so that a standard invocation of "gcc -shared" results in
producing a DSO with only ".gnu.hash". The new ".gnu.hash" sections need
to be dealt with the same way as ".hash" sections in all respects; only the
dynamic linker cares about their contents. To work with older dynamic
linkers (i.e. preexisting releases of glibc), a binary must have the old
".hash" section. The --hash-style=both option produces binaries that a new
dynamic linker can use more efficiently, but an old dynamic linker can
still handle.
The new section runs afoul of the custom linker scripts used to build vDSO
images for the kernel. On ia64, the failure mode for this is a boot-time
panic because the vDSO's PT_IA_64_UNWIND segment winds up ill-formed.
This patch addresses the problem in two ways.
First, it mentions ".gnu.hash" in all the linker scripts alongside ".hash".
This produces correct vDSO images with --hash-style=sysv (or old tools),
with --hash-style=gnu, or with --hash-style=both.
Second, it passes the --hash-style=sysv option when building the vDSO
images, so that ".gnu.hash" is not actually produced. This is the most
conservative choice for compatibility with any old userland. There is some
concern that some ancient glibc builds (though not any known old production
system) might choke on --hash-style=both binaries. The optimizations
provided by the new style of hash section do not really matter for a DSO
with a tiny number of symbols, as the vDSO has. If someone wants to use
=gnu or =both for their vDSO builds and worry less about that
compatibility, just change the option and the linker script changes will
make any choice work fine.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The SGI IOC4 IDE device always shares an interrupt with other devices which
are part of IOC4. As such, IDEPCI_SHARE_IRQ should always be enabled when
BLK_DEV_SGIIOC4 is enabled.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use hotplug version of register_cpu_notifier in late init functions.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is part of an effort to unify the panic_on_oops behaviour across
all architectures that implement it.
It was pointed out to me by Andi Kleen that if an oops has occured in
interrupt context, then calling sleep() in the oops path will only cause a
panic, and that it would be really better for it not to be in the path at
all.
This patch removes the ssleep() call and reworks the console message
accordinly. I have a slght concern that the resulting console message is
too long, feedback welcome.
For powerpc it also unifies the 32bit and 64bit behaviour.
Fror x86_64, this patch only updates the console message, as ssleep() is
already not present.
Signed-off-by: Horms <horms@verge.net.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kprobe inserts breakpoint instruction in probepoint and then jumps to
instruction slot when breakpoint is hit, the instruction slot icache must
be consistent with dcache. Here is the patch which invalidates instruction
slot icache area.
Without this patch, in some machines there will be fault when executing
instruction slot where icache content is inconsistent with dcache.
Signed-off-by: bibo,mao <bibo.mao@intel.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/ia64/hp/sim/simscsi.c: In function `simscsi_sg_readwrite':
arch/ia64/hp/sim/simscsi.c:154: error: structure has no member named `buffer'
arch/ia64/hp/sim/simscsi.c: In function `simscsi_fillresult':
arch/ia64/hp/sim/simscsi.c:247: error: structure has no member named `buffer'
hch said:
>Just change it to access the request_buffer member instead. buffer
>and request_buffer have been synonymous 99% of the time, and a driver
>never even wants to access buffer.
Signed-off-by: Tony Luck <tony.luck@intel.com>
/proc/pal/*/version_info is a bit confusing. HP firmware, at least,
reports 07.31 instead of 0.7.31. Also, the comment is out of place;
it's an internal detail about the implementation of ia64_pal_version.
Since the 2.2 revision of the SDM still states that PAL_VERSION can
be called in virtual mode, correct the comment to be more accurate.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On Thu, Jul 27, 2006 at 01:03:24AM -0700, Andrew Morton wrote:
> arch/ia64/hp/sim/simscsi.c: In function `simscsi_sg_readwrite':
> arch/ia64/hp/sim/simscsi.c:154: error: structure has no member named `buffer'
> arch/ia64/hp/sim/simscsi.c: In function `simscsi_fillresult':
> arch/ia64/hp/sim/simscsi.c:247: error: structure has no member named `buffer'
> arch/ia64/hp/sim/simscsi.c: At top level:
> arch/ia64/hp/sim/simscsi.c:87: warning: 'simscsi_setup' defined but not used
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Newer ARMs have a 40 bit physical address space, but mapping physical
memory above 4G needs a special page table format which we (currently?) do
not use for userspace mappings, so what happens instead is that mapping an
address >= 4G will happily discard the upper bits and wrap.
There is a valid_mmap_phys_addr_range() arch hook where we could check for
>= 4G addresses and deny the mapping, but this hook takes an unsigned long
address:
static inline int valid_mmap_phys_addr_range(unsigned long addr, size_t size);
And drivers/char/mem.c:mmap_mem() calls it like this:
static int mmap_mem(struct file * file, struct vm_area_struct * vma)
{
size_t size = vma->vm_end - vma->vm_start;
if (!valid_mmap_phys_addr_range(vma->vm_pgoff << PAGE_SHIFT, size))
So that's not much help either.
This patch makes the hook take a pfn instead of a phys address.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
screen_info.h doesn't have anything to do with the tty layer and shouldn't be
included by tty.h. This patches removes the include and modifies all users to
directly include screen_info.h. struct screen_info is mainly used to
communicate with the console drivers in drivers/video/console. Note that this
patch touches every arch and I have no way of testing it. If there is a
mistake the worst thing that will happen is a compile error.
[akpm@osdl.org: fix arm build]
[akpm@osdl.org: fix alpha build]
Signed-off-by: Jon Smirl <jonsmir@gmail.com>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I found a bug in memory hot-add code for ia64.
IA64's code has copies of pgdat's array on each node to reduce memory
access over crossing node. This array is used by NODE_DATA() macro. When
new node is hot-added, this pgdat's array should be updated and copied on
new node too.
However, I used for_each_online_node() in scatter_node_data() to copy
it. This meant its array is not copied on new node.
Because initialization of structures for new node was halfway,
so online_node_map couldn't be set at this time.
To copy arrays on new node, I changed it to check value of pgdat_list[]
which is source array of copies. I tested this patch with my Memory Hotadd
emulation on Tiger4. This patch is for 2.6.17-git20.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.
Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the new IRQF_ constants and remove the SA_INTERRUPT define
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow to tie upper bits of syscall bitmap in audit rules to kernel-defined
sets of syscalls. Infrastructure, a couple of classes (with 32bit counterparts
for biarch targets) and actual tie-in on i386, amd64 and ia64.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Per zone counter infrastructure
The counters that we currently have for the VM are split per processor. The
processor however has not much to do with the zone these pages belong to. We
cannot tell f.e. how many ZONE_DMA pages are dirty.
So we are blind to potentially inbalances in the usage of memory in various
zones. F.e. in a NUMA system we cannot tell how many pages are dirty on a
particular node. If we knew then we could put measures into the VM to balance
the use of memory between different zones and different nodes in a NUMA
system. For example it would be possible to limit the dirty pages per node so
that fast local memory is kept available even if a process is dirtying huge
amounts of pages.
Another example is zone reclaim. We do not know how many unmapped pages exist
per zone. So we just have to try to reclaim. If it is not working then we
pause and try again later. It would be better if we knew when it makes sense
to reclaim unmapped pages from a zone. This patchset allows the determination
of the number of unmapped pages per zone. We can remove the zone reclaim
interval with the counters introduced here.
Futhermore the ability to have various usage statistics available will allow
the development of new NUMA balancing algorithms that may be able to improve
the decision making in the scheduler of when to move a process to another node
and hopefully will also enable automatic page migration through a user space
program that can analyse the memory load distribution and then rebalance
memory use in order to increase performance.
The counter framework here implements differential counters for each processor
in struct zone. The differential counters are consolidated when a threshold
is exceeded (like done in the current implementation for nr_pageache), when
slab reaping occurs or when a consolidation function is called.
Consolidation uses atomic operations and accumulates counters per zone in the
zone structure and also globally in the vm_stat array. VM functions can
access the counts by simply indexing a global or zone specific array.
The arrangement of counters in an array also simplifies processing when output
has to be generated for /proc/*.
Counters can be updated by calling inc/dec_zone_page_state or
_inc/dec_zone_page_state analogous to *_page_state. The second group of
functions can be called if it is known that interrupts are disabled.
Special optimized increment and decrement functions are provided. These can
avoid certain checks and use increment or decrement instructions that an
architecture may provide.
We also add a new CONFIG_DMA_IS_NORMAL that signifies that an architecture can
do DMA to all memory and therefore ZONE_NORMAL will not be populated. This is
only currently set for IA64 SGI SN2 and currently only affects
node_page_state(). In the best case node_page_state can be reduced to
retrieving a single counter for the one zone on the node.
[akpm@osdl.org: cleanups]
[akpm@osdl.org: export vm_stat[] for filesystems]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6:
[PATCH] i386: export memory more than 4G through /proc/iomem
[PATCH] 64bit Resource: finally enable 64bit resource sizes
[PATCH] 64bit Resource: convert a few remaining drivers to use resource_size_t where needed
[PATCH] 64bit resource: change pnp core to use resource_size_t
[PATCH] 64bit resource: change pci core and arch code to use resource_size_t
[PATCH] 64bit resource: change resource core to use resource_size_t
[PATCH] 64bit resource: introduce resource_size_t for the start and end of struct resource
[PATCH] 64bit resource: fix up printks for resources in misc drivers
[PATCH] 64bit resource: fix up printks for resources in arch and core code
[PATCH] 64bit resource: fix up printks for resources in pcmcia drivers
[PATCH] 64bit resource: fix up printks for resources in video drivers
[PATCH] 64bit resource: fix up printks for resources in ide drivers
[PATCH] 64bit resource: fix up printks for resources in mtd drivers
[PATCH] 64bit resource: fix up printks for resources in pci core and hotplug drivers
[PATCH] 64bit resource: fix up printks for resources in networks drivers
[PATCH] 64bit resource: fix up printks for resources in sound drivers
[PATCH] 64bit resource: C99 changes for struct resource declarations
Fixed up trivial conflict in drivers/ide/pci/cmd64x.c (the printk that
was changed by the 64-bit resources had been deleted in the meantime ;)
Add ->retrigger() irq op to consolidate hw_irq_resend() implementations.
(Most architectures had it defined to NOP anyway.)
NOTE: ia64 needs testing. i386 and x86_64 tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cleanup: remove irq_descp() - explicit use of irq_desc[] is shorter and more
readable.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidation: remove the irq_affinity[NR_IRQS] array and move it into the
irq_desc[NR_IRQS].affinity field.
[akpm@osdl.org: sparc64 build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch-queue improves the generic IRQ layer to be truly generic, by adding
various abstractions and features to it, without impacting existing
functionality.
While the queue can be best described as "fix and improve everything in the
generic IRQ layer that we could think of", and thus it consists of many
smaller features and lots of cleanups, the one feature that stands out most is
the new 'irq chip' abstraction.
The irq-chip abstraction is about describing and coding and IRQ controller
driver by mapping its raw hardware capabilities [and quirks, if needed] in a
straightforward way, without having to think about "IRQ flow"
(level/edge/etc.) type of details.
This stands in contrast with the current 'irq-type' model of genirq
architectures, which 'mixes' raw hardware capabilities with 'flow' details.
The patchset supports both types of irq controller designs at once, and
converts i386 and x86_64 to the new irq-chip design.
As a bonus side-effect of the irq-chip approach, chained interrupt controllers
(master/slave PIC constructs, etc.) are now supported by design as well.
The end result of this patchset intends to be simpler architecture-level code
and more consolidation between architectures.
We reused many bits of code and many concepts from Russell King's ARM IRQ
layer, the merging of which was one of the motivations for this patchset.
This patch:
rename desc->handler to desc->chip.
Originally i did not want to do this, because it's a big patch. But having
both "desc->handler", "desc->handle_irq" and "action->handler" caused a
large degree of confusion and made the code appear alot less clean than it
truly is.
I have also attempted a dual approach as well by introducing a
desc->chip alias - but that just wasnt robust enough and broke
frequently.
So lets get over with this quickly. The conversion was done automatically
via scripts and converts all the code in the kernel.
This renaming patch is the first one amongst the patches, so that the
remaining patches can stay flexible and can be merged and split up
without having some big monolithic patch act as a merge barrier.
[akpm@osdl.org: build fix]
[akpm@osdl.org: another build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Memory hotplug code of i386 adds memory to only highmem. So, if
CONFIG_HIGHMEM is not set, CONFIG_MEMORY_HOTPLUG shouldn't be set.
Otherwise, it causes compile error.
In addition, many architecture can't use memory hotplug feature yet. So, I
introduce CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch fixes two spots in the SN kernel
that check a fixed prom revision number to determine prom
feature support. These checks are only valid on shub1 systems.
They are invalid on shub2 systems which have a different prom
with different revision numbers.
Signed-off-by: Aaron Young <ayoung@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Montecito is coming with dual core and threading, so this
four socket box can now have sixteen logical cpus.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Pass the OS logical cpu number to the PROM. This allows PROM
to log the OS logical cpu number in error records viewed thru POD.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Chandra Seetharaman missed one place in commit:
65edc68c34
[but it only shows up when building the ski simulator configuration
of ia64, so thats understandable]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make use the of newly defined hotplug version of cpu_notifier functionality
wherever appropriate.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make notifier_blocks associated with cpu_notifier as __cpuinitdata.
__cpuinitdata makes sure that the data is init time only unless
CONFIG_HOTPLUG_CPU is defined.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
band-aid solution to solve that problem. In the process, i undid all the
changes you both were making to ensure that these notifiers were available
only at init time (unless CONFIG_HOTPLUG_CPU is defined).
We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
XFS problem cleanly and makes the cpu notifiers available only at init time
(unless CONFIG_HOTPLUG_CPU is defined).
If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
time.
This patch reverts the notifier_call changes made in 2.6.17
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
locking init cleanups:
- convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
- convert rwlocks in a similar manner
this patch was generated automatically.
Motivation:
- cleanliness
- lockdep needs control of lock initialization, which the open-coded
variants do not give
- it's also useful for -rt and for lock debugging in general
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With Goto-san's patch, we can add new pgdat/node at runtime. I'm now
considering node-hot-add with cpu + memory on ACPI.
I found acpi container, which describes node, could evaluate cpu before
memory. This means cpu-hot-add occurs before memory hot add.
In most part, cpu-hot-add doesn't depend on node hot add. But register_cpu(),
which creates symbolic link from node to cpu, requires that node should be
onlined before register_cpu(). When a node is onlined, its pgdat should be
there.
This patch-set holds off creating symbolic link from node to cpu
until node is onlined.
This removes node arguments from register_cpu().
Now, register_cpu() requires 'struct node' as its argument. But the array of
struct node is now unified in driver/base/node.c now (By Goto's node hotplug
patch). We can get struct node in generic way. So, this argument is not
necessary now.
This patch also guarantees add cpu under node only when node is onlined. It
is necessary for node-hot-add vs. cpu-hot-add patch following this.
Moreover, register_cpu calculates cpu->node_id by cpu_to_node() without regard
to its 'struct node *root' argument. This patch removes it.
Also modify callers of register_cpu()/unregister_cpu, whose args are changed
by register-cpu-remove-node-struct patch.
[Brice.Goglin@ens-lyon.org: fix it]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a patch to allocate pgdat and per node data area for ia64. The size
for them can be calculated by compute_pernodesize().
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is to refresh node_data[] array for ia64. As I mentioned previous
patches, ia64 has copies of information of pgdat address array on each node as
per node data.
At v2 of node_add, this function used stop_machine_run() to update them. (I
wished that they were copied safety as much as possible.) But, in this patch,
this arrays are just copied simply, and set node_online_map bit after
completion of pgdat initialization.
So, kernel must touch NODE_DATA() macro after checking node_online_map().
(Current code has already done it.) This is more simple way for just
hot-add.....
Note : It will be problem when hot-remove will occur,
because, even if online_map bit is set, kernel may
touch NODE_DATA() due to race condition. :-(
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a preparatory patch to make common code for updating of NODE_DATA() of
ia64 between boottime and hotplug.
Current code remembers pgdat address in mem_data which is used at just boot
time. But its information can be used at hotplug time by moving to global
value. The next patch uses this array.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When new node becomes enable by hot-add, new sysfs file must be created for
new node. So, if new node is enabled by add_memory(), register_one_node() is
called to create it. In addition, I386's arch_register_node() and a part of
register_nodes() of powerpc are consolidated to register_one_node() as a
generic_code().
This is tested by Tiger4(IPF) with node hot-plug emulation.
Signed-off-by: Keiichiro Tokunaga <tokuanga.keiich@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Refresh NODE_DATA() for generic archs. In this case, NODE_DATA(nid) ==
node_data[nid]. node_data[] is array of address of pgdat. So, refresh is
quite simple.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the name of old add_memory() to arch_add_memory. And use node id to
get pgdat for the node at NODE_DATA().
Note: Powerpc's old add_memory() is defined as __devinit. However,
add_memory() is usually called only after bootup.
I suppose it may be redundant. But, I'm not well known about powerpc.
So, I keep it. (But, __meminit is better at least.)
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Based on a patch series originally from Vivek Goyal <vgoyal@in.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
During some profiling I noticed that default_idle causes a lot of
memory traffic. I think that is caused by the atomic operations
to clear/set the polling flag in thread_info. There is actually
no reason to make this atomic - only the idle thread does it
to itself, other CPUs only read it. So I moved it into ti->status.
Converted i386/x86-64/ia64 for now because that was the easiest
way to fix ACPI which also manipulates these flags in its idle
function.
Cc: Nick Piggin <npiggin@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Overloading of page fault notification with the notify_die() has performance
issues(since the only interested components for page fault is kprobes and/or
kdb) and hence this patch introduces the new notifier call chain exclusively
for page fault notifications their by avoiding notifying unnecessary
components in the do_page_fault() code path.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove VM_LOCKED before remap_pfn range from device drivers and get rid of
VM_SHM.
remap_pfn_range() already sets VM_IO. There is no need to set VM_SHM since
it does nothing. VM_LOCKED is of no use since the remap_pfn_range does not
place pages on the LRU. The pages are therefore never subject to swap
anyways. Remove all the vm_flags settings before calling remap_pfn_range.
After removing all the vm_flag settings no use of VM_SHM is left. Drop it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert a few stragglers over to for_each_possible_cpu(), remove
for_each_cpu().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (65 commits)
ACPI: suppress power button event on S3 resume
ACPI: resolve merge conflict between sem2mutex and processor_perflib.c
ACPI: use for_each_possible_cpu() instead of for_each_cpu()
ACPI: delete newly added debugging macros in processor_perflib.c
ACPI: UP build fix for bugzilla-5737
Enable P-state software coordination via _PDC
P-state software coordination for speedstep-centrino
P-state software coordination for acpi-cpufreq
P-state software coordination for ACPI core
ACPI: create acpi_thermal_resume()
ACPI: create acpi_fan_suspend()/acpi_fan_resume()
ACPI: pass pm_message_t from acpi_device_suspend() to root_suspend()
ACPI: create acpi_device_suspend()/acpi_device_resume()
ACPI: replace spin_lock_irq with mutex for ec poll mode
ACPI: Allow a WAN module enable/disable on a Thinkpad X60.
sem2mutex: acpi, acpi_link_lock
ACPI: delete unused acpi_bus_drivers_lock
sem2mutex: drivers/acpi/processor_perflib.c
ACPI add ia64 exports to build acpi_memhotplug as a module
ACPI: asus_acpi_init(): propagate correct return value
...
Manual resolve of conflicts in:
arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
include/acpi/processor.h
Default values for boolean and tristate options can only be 'y', 'm' or 'n'.
This patch removes wrong default for SCHED_SMT.
Signed-off-by: Jean-Luc Leger <jean-luc.leger@dspnet.fr.eu.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the POSIX lock owner ID to the flush operation.
This is useful for filesystems which don't want to store any locking state
in inode->i_flock but want to handle locking/unlocking POSIX locks
internally. FUSE is one such filesystem but I think it possible that some
network filesystems would need this also.
Also add a flag to indicate that a POSIX locking request was generated by
close(), so filesystems using the above feature won't send an extra locking
request in this case.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
move_pages() is used to move individual pages of a process. The function can
be used to determine the location of pages and to move them onto the desired
node. move_pages() returns status information for each page.
long move_pages(pid, number_of_pages_to_move,
addresses_of_pages[],
nodes[] or NULL,
status[],
flags);
The addresses of pages is an array of void * pointing to the
pages to be moved.
The nodes array contains the node numbers that the pages should be moved
to. If a NULL is passed instead of an array then no pages are moved but
the status array is updated. The status request may be used to determine
the page state before issuing another move_pages() to move pages.
The status array will contain the state of all individual page migration
attempts when the function terminates. The status array is only valid if
move_pages() completed successfullly.
Possible page states in status[]:
0..MAX_NUMNODES The page is now on the indicated node.
-ENOENT Page is not present
-EACCES Page is mapped by multiple processes and can only
be moved if MPOL_MF_MOVE_ALL is specified.
-EPERM The page has been mlocked by a process/driver and
cannot be moved.
-EBUSY Page is busy and cannot be moved. Try again later.
-EFAULT Invalid address (no VMA or zero page).
-ENOMEM Unable to allocate memory on target node.
-EIO Unable to write back page. The page must be written
back in order to move it since the page is dirty and the
filesystem does not provide a migration function that
would allow the moving of dirty pages.
-EINVAL A dirty page cannot be moved. The filesystem does not provide
a migration function and has no ability to write back pages.
The flags parameter indicates what types of pages to move:
MPOL_MF_MOVE Move pages that are only mapped by the process.
MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
Requires sufficient capabilities.
Possible return codes from move_pages()
-ENOENT No pages found that would require moving. All pages
are either already on the target node, not present, had an
invalid address or could not be moved because they were
mapped by multiple processes.
-EINVAL Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
to migrate pages in a kernel thread.
-EPERM MPOL_MF_MOVE_ALL specified without sufficient priviledges.
or an attempt to move a process belonging to another user.
-EACCES One of the target nodes is not allowed by the current cpuset.
-ENODEV One of the target nodes is not online.
-ESRCH Process does not exist.
-E2BIG Too many pages to move.
-ENOMEM Not enough memory to allocate control array.
-EFAULT Parameters could not be accessed.
A test program for move_pages() may be found with the patches
on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3
From: Christoph Lameter <clameter@sgi.com>
Detailed results for sys_move_pages()
Pass a pointer to an integer to get_new_page() that may be used to
indicate where the completion status of a migration operation should be
placed. This allows sys_move_pags() to report back exactly what happened to
each page.
Wish there would be a better way to do this. Looks a bit hacky.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify the gen_pool allocator (lib/genalloc.c) to utilize a bitmap scheme
instead of the buddy scheme. The purpose of this change is to eliminate
the touching of the actual memory being allocated.
Since the change modifies the interface, a change to the uncached allocator
(arch/ia64/kernel/uncached.c) is also required.
Both Andrey Volkov and Jes Sorenson have expressed a desire that the
gen_pool allocator not write to the memory being managed. See the
following:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113518602713125&w=2http://marc.theaimsgroup.com/?l=linux-kernel&m=113533568827916&w=2
Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: Andrey Volkov <avolkov@varma-el.com>
Acked-by: Jes Sorensen <jes@trained-monkey.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidate the various arch-specific implementations of pxm_to_node() and
node_to_pxm() into a single generic version.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sorry I didn't notice earlier, but that BUG_ON triggers for me on the
simulator. AFAICS the mask for itv is set in cpu_init(), which comes
after sal_init(). Consequently on the simulator the itv still has its
start value of zero. I've probably missed something, but I wonder why
at this stage of the boot you even need to save and restore the itv?
Signed-Off-By: Ian Wienand <ianw@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The following patch fixes a bug in the SGI Altix tioce_bus_fixup()
code. ce_dre_comp_err_addr needs to be zero'd out not ~0ULL. As
a result completion errors weren't being captured.
Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
struct ia64_sal_os_state has three semi-independent sections. The code
in mca_asm.S assumes that these three sections are contiguous, which
makes it very awkward to add new data to this structure. Remove the
assumption that the sections are contiguous. Define a macro to shorten
references to offsets in ia64_sal_os_state.
This patch does not change the way that the code behaves. It just
makes it easier to update the code in future and to add fields to
ia64_sal_os_state when debugging the MCA/INIT handlers.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When I tried to use PCI Express Hotplug driver on my ia64 box, I
noticed that "PCI Express support" is not even selectable on ia64.
This patch makes PCI Express support selectable.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is an SN bug in sn_hwperf.c that affects systems with 1024n or 1024p.
The bug manifests itself 2 ways: IO interrupts are not always
targeted to the nearest node, and 2) the "cat /proc/sgi_sn/sn_topology"
commands fails with "cannot allocate memory".
The code is using the wrong macros for validating node numbers.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
One more trivial, stand-alone patch from the Xen/ia64 review. Sanity
check usage of the reserved region numbers.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is a trivial stand-alone patch out of the Xen/ia64 patches. Add
a vmlinuz build target to be more compatible with x86-ish targets.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
MSI callouts for altix. Involves a fair amount of code reorg in sn irq.c
code as well as adding some extensions to the altix PCI provider abstaction.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Abstract IA64_FIRST_DEVICE_VECTOR/IA64_LAST_DEVICE_VECTOR since SN platforms
use a subset of the IA64 range. Implement this by making the above macros
global variables which the platform can override in it setup code.
Also add a reserve_irq_vector() routine used by SN to mark a vector's as
in-use when that weren't allocated through assign_irq_vector().
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add support for making ESI calls [1]. ESI stands for "Extensible SAL
specification" and is basically a way for invoking firmware
subroutines which are identified by a GUID. I don't know whether ESI
is used by vendors other than HP (if you do, please let me know) but
as firmware "backdoors" go, this seems one of the cleaner methods, so
it seems reasonable to support it, even though I'm not aware of any
publicly documented ESI calls. I'd have liked to make the ESI module
completely stand-alone, but unfortunately that is not easily (or not
at all) possible because in order to make ESI calls in physical mode,
a small stub similar to the EFI stub is needed in the kernel proper.
I did try to create a stub that would work in user-level, but it
quickly got ugly beyond recognition (e.g., the stub had to make
assumptions about how the module-loader generated call-stubs work) and
I didn't even get it to work (that's probably fixable, but I didn't
bother because I concluded it was too ugly anyhow). While it's not
terribly elegant to have kernel code which isn't actively used in the
kernel proper, I think it might be worth making an exception here for
two reasons: the code is trivially small (all that's really needed is
esi_stub.S) and by including it in the normal kernel distro, it might
encourage other OEMs to also use ESI, which I think would be far
better than each inventing their own firmware "backdoor".
The code was originally written by Alex. I just massaged and packaged
it a bit (and perhaps messed up some things along the way...).
Changes since first version of patch that was posted to mailing list:
* Export ia64_esi_call and ia64_esi_call_phys() as GPL symbols.
* Disallow building esi.c as a module for now. Building as a module
would currently lead to an unresolved reference to "sal_lock" on SMP kernels
because that symbol doesn't get exported.
* Export esi_call_phys() only if ESI is enabled.
* Remove internal stuff from esi.h and add a "proc_type" argument to
ia64_esi_call() such that serialization-requirements can be expressed (ESI
follows SAL here, where procedure calls may have to be serialized, are
MP-safe, or MP-safe andr reentrant).
[1] h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,919,00.html
Signed-off-by: David Mosberger <David.Mosberger@acm.org>
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linux ia64 port tried to decode the processor family number
to something human-readable, but Intel brandnames don't change
synchronously with updates to the family number. Adopt a more
i386-like approach and just print the family number in decimal.
Add a new field "model name" that uses PAL_BRAND_INFO to find
the official name for the cpu, or on older systems, falls back
to using the well-known codenames (Merced, McKinley, Madison).
Signed-off-by: Tony Luck <tony.luck@intel.com>
Calls to set_irq_info in set_irq_affinity_info() is redundant because
irq_affinity mask was set just one line immediately above it. Remove
that duplicate call.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When CONFIG_PCI_MSI is set, move_irq() is an empty function, causing
grief when sys admin tries to bind interrupt to CPU.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
When building sim_defconfig, which does not define CONFIG_ACPI
arch/ia64/kernel/acpi.c:71: warning: 'acpi_madt_rev' defined but not used
really acpi.c should not be built when CONFIG_ACPI=n...
Signed-off-by: Len Brown <len.brown@intel.com>
This closes a couple holes in our attribute aliasing avoidance scheme:
- The current kernel fails mmaps of some /dev/mem MMIO regions because
they don't appear in the EFI memory map. This keeps X from working
on the Intel Tiger box.
- The current kernel allows UC mmap of the 0-1MB region of
/sys/.../legacy_mem even when the chipset doesn't support UC
access. This causes an MCA when starting X on HP rx7620 and rx8620
boxes in the default configuration.
There's more detail in the Documentation/ia64/aliasing.txt file this
adds, but the general idea is that if a region might be covered by
a granule-sized kernel identity mapping, any access via /dev/mem or
mmap must use the same attribute as the identity mapping.
Otherwise, we fall back to using an attribute that is supported
according to the EFI memory map, or to using UC if the EFI memory
map doesn't mention the region.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Bob Picco noted that 6edfba1b33
dropped the -ffreestanding compiler flag from the top level
Makefile, which allows the compiler to substitute memcpy() in
places where strcpy() is used with a known size source string.
But the ia64 memcpy() returns 0 for success, and "bytes copied"
for failure.
Fix to return the address of the destination string (like
stdlibc version, and other architectures). There are no
places where ia64 specific code makes use of the non-standard
return value.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] update sn2 defconfig
[IA64] Add mca recovery failure messages
[IA64-SGI] fix SGI Altix tioce_reserve_m32() bug
[IA64] enable dumps to capture second page of kernel stack
[IA64-SGI] - Reduce overhead of reading sn_topology
[IA64-SGI] - Fix discover of nearest cpu node to IO node
[IA64] IOC4 config option ordering
[IA64] Setup an IA64 specific reclaim distance
[IA64] eliminate compile time warnings
[IA64] eliminate compile time warnings
[IA64-SGI] SN SAL call to inject memory errors
[IA64] - Fix MAX_PXM_DOMAINS for systems with > 256 nodes
[IA64] Remove unused variable in sn_sal.h
[IA64] Remove redundant NULL checks before kfree
[IA64] wire up compat_sys_adjtimex()
Update SN2 defconfig to latest kernel and add QLA FC drivers commonly
found in SN2 boxes.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When the mca recovery code encounters a condition that makes
the MCA non-recoverable, print the reason it could not recover.
This will make it easier to identify why the recovery code did
not recover.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The following patch fixes a bug in the SGI Altix tioce_reserve_m32()
code. The bug was that we could walking past the end of the CE ASIC
32/40bit PMU ATE Buffer, resulting in a PIO Reply Error.
Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
MPI programs using certain debug options have a long
startup time. This was traced to a "vmalloc/vfree" in
the code that reads /proc/sgi_sn/sn_topology. On large
systems, vfree requires an IPI to all cpus to do TLB
purging.
Replace the vmalloc/vfree with kmalloc/kfree. Although
the size of the structure being allocated is unknown, it
will not not exceed 96 bytes.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix a bug that causes discovery of the nearest node/cpu to
a TIO (IO node) to fail.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Few of the notifier_chain_register() callers use __init in the definition
of notifier_call. It is incorrect as the function definition should be
available after the initializations (they do not unregister them during
initializations).
This patch fixes all such usages to _not_ have the notifier_call __init
section.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.
This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.
Signed-off-by: Jens Axboe <axboe@suse.de>
SERIAL_SGI_IOC4 and BLK_DEV_SGIIOC4 depend upon SGI_IOC4, and
SERIAL_SGI_IOC3 depends upon SGI_IOC3. Currently the definitions
are out of order in the config sequence.
Fix by including drivers/sn/Kconfig immediately after SGI_SN,
upon which SGI_IOC4 and SGI_IOC3 depend.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes following compile time warnings:
drivers/pci/pci-sysfs.c: In function `pci_read_legacy_io':
drivers/pci/pci-sysfs.c:257: warning: implicit declaration of function `ia64_pci_legacy_read'
drivers/pci/pci-sysfs.c: In function `pci_write_legacy_io':
drivers/pci/pci-sysfs.c:280: warning: implicit declaration of function `ia64_pci_legacy_write'
It also fixes wrong definition of ia64_pci_legacy_write (type of `bus' is not
`pci_dev', but `pci_bus').
Signed-Off-By: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is a trivial patch to remove following compile time warning:
arch/ia64/ia32/../../../fs/binfmt_elf.c:508: warning: 'randomize_stack_top' defined but not used
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Andrew Morton pointed out that compiler might not inline the functions
marked for inline in kprobes. There-by allowing the insertion of probes
on these kprobes routines, which might cause recursion.
This patch removes all such inline and adds them to kprobes section
there by disallowing probes on all such routines. Some of the routines
can even still be inlined, since these routines gets executed after the
kprobes had done necessay setup for reentrancy.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dmi_scan.c is arch-independent and is used by i386, x86_64, and ia64.
Currently all three arches compile it from arch/i386, which means that ia64
and x86_64 depend on things in arch/i386 that they wouldn't otherwise care
about.
This is simply "mv arch/i386/kernel/dmi_scan.c drivers/firmware/" (removing
trailing whitespace) and the associated Makefile changes. All three
architectures already set CONFIG_DMI in their top-level Kconfig files.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andrey Panin <pazke@orbita1.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] splice: add support for sys_tee()
[PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
This patch modifies ia64's show_mem() to walk the vmem_map page tables and
rapidly skip forward across regions where the page tables are missing.
This prevents the pfn_valid() check from causing numerous unnecessary
page faults.
Without this patch on a 512 node 512 cpu system where every node has four
memory holes, the show_mem() call takes 1 hour 18 minutes. With this
patch, it takes less than 3 seconds.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64_wait_for_slaves() was changed in 2.6.17-rc1 to report the slave
state. It incorrectly assumes that all slaves are for MCA, but
ia64_wait_for_slaves() is also called from the INIT monarch handler.
The existing message is very misleading, so correct it.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.
Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.
Signed-off-by: Jens Axboe <axboe@suse.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Prefetch mmap_sem in ia64_do_page_fault()
[IA64] Failure to resume after INIT in user space
[IA64] Pass more data to the MCA/INIT notify_die hooks
[IA64] always map VGA framebuffer UC, even if it supports WB
[IA64] fix bug in ia64 __mutex_fastpath_trylock
[IA64] for_each_possible_cpu: ia64
[IA64] update HP CSR space discovery via ACPI
[IA64] Wire up new syscalls {set,get}_robust_list
[IA64] 'msg' may be used uninitialized in xpc_initiate_allocate()
[IA64] Wire up new syscall sync_file_range()
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch. Its definition is sometimes configurable. Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree. But it looks a bit messy.
SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has
been changeable by config. Suitable node's number may be changed in the
future even if it is other architecture. So, I wrote configurable node's
number.
This patch set defines just default value for each arch which needs multi
nodes except ia64. But, it is easy to change to configurable if necessary.
On ia64 the number of nodes can be already configured in generic ia64 and SN2
config. But, NODES_SHIFT is defined for DIG64 and HP'S machine too. So, I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT. It
would be simpler.
See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Take a hint from an x86_64 optimization by Arjan van de Ven and use it
for ia64. See a9ba9a3b38
Prefetch the mmap_sem, which is critical for the performance of the page fault
handler.
Note: mm may be NULL but I guess that is safe.
See 458f935527
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The OS INIT handler is loading incorrect values into cr.ifa on exit.
This shows up as a hang when resuming after an INIT that is delivered
while a cpu is in user space. Correct the value loaded into cr.ifa.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The MCA/INIT handlers maintain important state in the SAL to OS (sos)
area and in the monarch_cpu flag. Kernel debuggers (such as KDB) need
this data, and may need to adjust the monarch_cpu field so make the
data available to the notify_die hooks. Define two more events for
calling the functions on the notify_die chain.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu under
arch/ia64/kernel/.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fjitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Get rid of the manual search of _CRS, in favor of
acpi_get_vendor_resource() which is now provided by the ACPI CA. And fall
back to searching for a consumer-only address space descriptor if no
vendor-defined resource is found.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Otherwise, illegal configurations like X86_VOYAGER=y, PCI=y are
possible.
This patch also fixes the options select'ing ACPI to also select PCI.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
gcc3 thinks that a 32-bit field of a u64 type is itself a u64, so
should be printed with "%ld". gcc4 thinks it needs just "%d".
Make both versions happy by avoiding this construct.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] ioremap() should prefer WB over UC
[IA64] Add __mca_table to the DISCARD list in gate.lds
[IA64] Move __mca_table out of the __init section
[IA64] simplify some condition checks in iosapic_check_gsi_range
[IA64] correct some messages and fixes some minor things
[IA64-SGI] fix for-loop in sn_hwperf_geoid_to_cnode()
[IA64-SGI] sn_hwperf use of num_online_cpus()
[IA64] optimize flush_tlb_range on large numa box
[IA64] lazy_mmu_prot_update needs to be aware of huge pages
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).
From the splice.c comments:
"splice": joining two ropes together by interweaving their strands.
This is the "extended pipe" functionality, where a pipe is used as
an arbitrary in-memory buffer. Think of a pipe as a small kernel
buffer that you can use to transfer data from one end to the other.
The traditional unix read/write is extended with a "splice()" operation
that transfers data buffers to or from a pipe buffer.
Named by Larry McVoy, original implementation from Linus, extended by
Jens to support splicing to files and fixing the initial implementation
bugs.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
efi_memmap_init() collects full granules of WB memory, without
regard for whether they also support UC. So in order for ioremap()
to work for main memory, it must prefer WB mappings when possible.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add __mca_table to the DISCARD list for the gate.lds linker script to
avoid broken linker references when linking the final vmlinux file.
Also add comment to include/asm-ia64/asmmacros.h to avoid anyone else
hitting this problem in the future.
Credits to James Bottomley <James.Bottomley@SteelEye.com> for spotting
the DISCARD list in gate.lds.S
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some condition checks on iosapic_check_gsi_range() can be omitted
because always `base <= end' is assured. This patch simplifies those
checks.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch corrects some wrong comments and a printk message.
It also fixes some minor things, and makes all lines fit in
80 columns.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tweak the proc setup code so things work OK with const
proc_dir_entry.proc_fops.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a for-loop in sn_hwperf_geoid_to_cnode(). It needs to loop over
num_cnodes to ensure it can still process TIO nodes in addition to
compute nodes on systems with many nodes. Interim fix until better
support for many (>265) nodes is complete.
Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Eliminate an unnecessary -- and flawed -- use of the expensive
num_online_cpus().
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It was reported from a field customer that global spin lock ptcg_lock
is giving a lot of grief on munmap performance running on a large numa
machine. What appears to be a problem coming from flush_tlb_range(),
which currently unconditionally calls platform_global_tlb_purge().
For some of the numa machines in existence today, this function is
mapped into ia64_global_tlb_purge(), which holds ptcg_lock spin lock
while executing ptc.ga instruction.
Here is a patch that attempt to avoid global tlb purge whenever
possible. It will use local tlb purge as much as possible. Though the
conditions to use local tlb purge is pretty restrictive. One of the
side effect of having flush tlb range instruction on ia64 is that
kernel don't get a chance to clear out cpu_vm_mask. On ia64, this mask
is sticky and it will accumulate if process bounces around. Thus
diminishing the possible use of ptc.l. Thoughts?
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Function lazy_mmu_prot_update is also used on huge pages when it is called
by set_huge_ptep_writable, but it isn't aware of huge pages.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Because pgdat_list was linked to pgdat_list in *reverse* order, (By default)
some of arch has to sort it by themselves.
for_each_pgdat has gone..for_each_online_pgdat() uses node_online_map, which
doesn't need to be sorted.
This patch removes codes for sorting pgdat.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The find_*_bit() routines are defined to work on a pointer to unsigned long.
But partial_page.bitmap is unsigned int and it is passed to find_*_bit() in
arch/ia64/ia32/sys_ia32.c. So the compiler will print warnings.
This patch changes to unsigned long instead.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide proper kprobes fault handling, if a user-specified pre/post handlers
tries to access user address space, through copy_from_user(), get_user() etc.
The user-specified fault handler gets called only if the fault occurs while
executing user-specified handlers. In such a case user-specified handler is
allowed to fix it first, later if the user-specifed fault handler does not fix
it, we try to fix it by calling fix_exception().
The user-specified handler will not be called if the fault happens when single
stepping the original instruction, instead we reset the current probe and
allow the system page fault handler to fix it up.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Anil S Keshavamurthy<anil.s.keshavamurthy@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently kprobe handler traps only happen in kernel space, so function
kprobe_exceptions_notify should skip traps which happen in user space.
This patch modifies this, and it is based on 2.6.16-rc4.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: <hiramatu@sdl.hitachi.co.jp>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When kretprobe probes the schedule() function, if the probed process exits
then schedule() will never return, so some kretprobe instances will never
be recycled.
In this patch the parent process will recycle retprobe instances of the
probed function and there will be no memory leak of kretprobe instances.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create compat_sys_adjtimex and use it an all appropriate places.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a copy of the compatibility version of struct timex in each 64 bit
architecture. This patch just creates a global one and replaces all the
usages of the old ones.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Almost all users of the table addresses from the EFI system table want
physical addresses. So rather than doing the pa->va->pa conversion, just keep
physical addresses in struct efi.
This fixes a DMI bug: the efi structure contained the physical SMBIOS address
on x86 but the virtual address on ia64, so dmi_scan_machine() used ioremap()
on a virtual address on ia64.
This is essentially the same as an earlier patch by Matt Tolentino:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112130292316281&w=2
except that this changes all table addresses, not just ACPI addresses.
Matt's original patch was backed out because it caused MCAs on HP sx1000
systems. That problem is resolved by the ioremap() attribute checking added
for ia64.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Check the EFI memory map so we can use the correct memory attributes for
ioremap(). Previously, we always used uncacheable access, which blows up on
some machines for regular system memory.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the size, not a pointer to the size, to efi_mem_attribute_range().
This function validates memory regions for the /dev/mem read/write/mmap paths.
The pointer allows arches to reduce the size of the range, but I think that's
unnecessary complexity. Simplifying it will let me use
efi_mem_attribute_range() to improve the ia64 ioremap() implementation.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enable DMI table parsing on ia64.
Andi Kleen has a patch in his x86_64 tree which enables the use of i386
dmi_scan.c on x86_64. dmi_scan.c functions are being used by the
drivers/char/ipmi/ipmi_si_intf.c driver for autodetecting the ports or
memory spaces where the IPMI controllers may be found.
This patch adds equivalent changes for ia64 as to what is in the x86_64
tree. In addition, I reworked the DMI detection, such that on EFI-capable
systems, it uses the efi.smbios pointer to find the table, rather than
brute-force searching from 0xF0000. On non-EFI systems, it continues the
brute-force search.
My test system, an Intel S870BN4 'Tiger4', aka Dell PowerEdge 7250, with
latest BIOS, does not list the IPMI controller in the ACPI namespace, nor
does it have an ACPI SPMI table. Also note, currently shipping Dell x8xx
EM64T servers don't have these either, so DMI is the only method for
obtaining the address of the IPMI controller.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
[PATCH] fix audit_init failure path
[PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
[PATCH] sem2mutex: audit_netlink_sem
[PATCH] simplify audit_free() locking
[PATCH] Fix audit operators
[PATCH] promiscuous mode
[PATCH] Add tty to syscall audit records
[PATCH] add/remove rule update
[PATCH] audit string fields interface + consumer
[PATCH] SE Linux audit events
[PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
[PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
[PATCH] Fix IA64 success/failure indication in syscall auditing.
[PATCH] Miscellaneous bug and warning fixes
[PATCH] Capture selinux subject/object context information.
[PATCH] Exclude messages by message type
[PATCH] Collect more inode information during syscall processing.
[PATCH] Pass dentry, not just name, in fsnotify creation hooks.
[PATCH] Define new range of userspace messages.
[PATCH] Filter rule comparators
...
Fixed trivial conflict in security/selinux/hooks.c
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] New IA64 core/thread detection patch
[IA64] Increase max node count on SN platforms
[IA64] Increase max node count on SN platforms
[IA64] Increase max node count on SN platforms
[IA64] Increase max node count on SN platforms
[IA64] Tollhouse HP: IA64 arch changes
[IA64] cleanup dig_irq_init
[IA64] MCA recovery: kernel context recovery table
IA64: Use early_parm to handle mvec_name and nomca
[IA64] move patchlist and machvec into init section
[IA64] add init declaration - nolwsys
[IA64] add init declaration - gate page functions
[IA64] add init declaration to memory initialization functions
[IA64] add init declaration to cpu initialization functions
[IA64] add __init declaration to mca functions
[IA64] Ignore disabled Local SAPIC Affinity Structure in SRAT
[IA64] sn_check_intr: use ia64_get_irr()
[IA64] fix ia64 is_hugepage_only_range
* master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild: (46 commits)
kbuild: remove obsoleted scripts/reference_* files
kbuild: fix make help & make *pkg
kconfig: fix time ordering of writes to .kconfig.d and include/linux/autoconf.h
Kconfig: remove the CONFIG_CC_ALIGN_* options
kbuild: add -fverbose-asm to i386 Makefile
kbuild: clean-up genksyms
kbuild: Lindent genksyms.c
kbuild: fix genksyms build error
kbuild: in makefile.txt note that Makefile is preferred name for kbuild files
kbuild: replace PHONY with FORCE
kbuild: Fix bug in crc symbol generating of kernel and modules
kbuild: change kbuild to not rely on incorrect GNU make behavior
kbuild: when warning symbols exported twice now tell user this is the problem
kbuild: fix make dir/file.xx when asm symlink is missing
kbuild: in the section mismatch check try harder to find symbols
kbuild: fix section mismatch check for unwind on IA64
kbuild: kill false positives from section mismatch warnings for powerpc
kbuild: kill trailing whitespace in modpost & friends
kbuild: small update of allnoconfig description
kbuild: make namespace.pl CROSS_COMPILE happy
...
Trivial conflict in arch/ppc/boot/Makefile manually fixed up
alarm() calls the kernel with an unsigend int timeout in seconds. The
value is stored in the tv_sec field of a struct timeval to setup the
itimer. The tv_sec field of struct timeval is of type long, which causes
the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX.
Before the hrtimer merge (pre 2.6.16) such a negative value was converted
to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's
not clear whether this was intended or just happened to be done by the
timeval_to_jiffies code.
hrtimers expect a timeval in canonical form and treat a negative timeout as
already expired. This breaks the legitimate usage of alarm() with a
timeout value > INT_MAX seconds.
For 32 bit machines it is therefor necessary to limit the internal seconds
value to avoid API breakage. Instead of doing this in all implementations
of sys_alarm the duplicated sys_alarm code is moved into a common function
in itimer.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IPF SDM 2.2 changes definition of PAL_LOGICAL_TO_PHYSICAL to add
proc_number=-1 to get core/thread mapping info on the running processer.
Based on this change, we had better to update existing core/thread
detection in IA64 kernel correspondingly. The attached patch implements
this change. It simplifies detection code and eliminates potential race
condition. It also runs a bit faster and has better scalability especially
when cores and threads number grows up in one package.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Node number are kept in the cpu_to_node_map which is
currently defined as u8. Change to u16 to accomodate
larger node numbers.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add support in IA64 acpi for platforms that support more than
256 nodes. Currently, ACPI is limited to 256 nodes because the
proximity domain number is 8-bits.
Long term, we expect to use ACPI3.0 to support >256 nodes.
This patch is an interim solution that works with platforms
that pass the high order bits of the proximity domain in
"reserved" fields of the ACPI tables. This code is enabled
ONLY on SN platforms.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a configuration option to allow the maximum
number of nodes to be configurable for GENERIC or SN
kernels.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/sn and include/asm-ia64/sn changes required to support Tollhouse
system PCI hotplug, fixes the ia64_sn_sysctl_ioboard_get call, and introduces
the PRF_HOTPLUG_SUPPORT feature bit.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
dig_irq_init is equivalent to machvec_noop, no need to define
another empty function.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Memory errors encountered by user applications may surface
when the CPU is running in kernel context. The current code
will not attempt recovery if the MCA surfaces in kernel
context (privilage mode 0). This patch adds a check for cases
where the user initiated the load that surfaces in kernel
interrupt code.
An example is a user process lauching a load from memory
and the data in memory had bad ECC. Before the bad data
gets to the CPU register, and interrupt comes in. The
code jumps to the IVT interrupt entry point and begins
execution in kernel context. The process of saving the
user registers (SAVE_REST) causes the bad data to be loaded
into a CPU register, triggering the MCA. The MCA surfaces in
kernel context, even though the load was initiated from
user context.
As suggested by David and Tony, this patch uses an exception
table like approach, puting the tagged recovery addresses in
a searchable table. One difference from the exception table
is that MCAs do not surface in precise places (such as with
a TLB miss), so instead of tagging specific instructions,
address ranges are registers. A single macro is used to do
the tagging, with the input parameter being the label
of the starting address and the macro being the ending
address. This limits clutter in the code.
This patch only tags one spot, the interrupt ivt entry.
Testing showed that spot to be a "heavy hitter" with
MCAs surfacing while saving user registers. Other spots
can be added as needed by adding a single macro.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.
This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
- parisc
- sparc64
- make the needlessly global function static:
- arm
- h8300
- m68k
- m68knommu
- s390
- v850
- x86_64
- add a prototype in asm/system.h:
- cris
- i386
- ia64
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm not sure of the worthiness of this idea, so please consider it an RFC.
Its key merits are:
* Reuse existing infrastructure
* Greatly tightens up the parsing of nomca
* Greatly simplifies the parsing of machvec
Addition cleanup (moving setup_mvec() to machvec.c) by Ken Chen.
Signed-Off-By: Horms <horms@verge.net.au>
Signed-Off-By: Tony Luck <tony.luck@intel.com>
This patch removes all occurances of _INLINE_ in the kernel.
With the exception of tty_flip.h, I've simply removed the inline's since
gcc should know best which functions to be inlined.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ia64_mv is initialized based on platform detected or specified.
However, there is one instantiation of each platform type. We
don't expect to switch platform vector during run time. Move
those platform specific type into init section since a copy is
made into global ia64_mv at initialization.
Also move instruction patch list into init section as well.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add init declaration to bunch of patch functions and gate
page setup function.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add init declaration to variables/functions used for memory
initialization. I don't think they would clash with memory
hotplug. If they do, please yell.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add init declaration to cpu initialization functions.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Mark init related variable and functions with appropriate
__init* declaration to mca functions.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
According to the ACPI spec, the OSPM must ignore the contents of the
Processor Local APIC/SAPIC Affinity Structure in System Resource
Affinity Table (SRAT), if its enable flag is cleared. However, ia64
linux refers all of the Processor Local APIC/SAPIC Affinity Structures
in SRAT regardless of the enable flag. This is obviously against the
ACPI spec. This patch fixes this bug.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use the recently-added ia64_get_irr() rather than duplicating the code.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
fix is_hugepage_only_range() definition to be "overlaps"
instead of "within architectural restricted hugetlb address
range". Simplify the ia64 specific code that used to use
is_hugepage_only_range() to just check which region the
address is in.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.
Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range(). On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.
In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).
This patch cleans up by abolishing is_aligned_hugepage_range(). Instead
prepare_hugepage_range() is defined directly. Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage. ia64 and powerpc define custom versions. The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned. The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.
No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().
This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Original 2.6.9 patch and explanation from somewhere within HP via
bugzilla...
ia64 stores a success/failure code in r10, and the return value (normal
return, or *positive* errno) in r8. The patch also sets the exit code to
negative errno if it's a failure result for consistency with other
architectures.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
A pte may be zapped by the swapper, exiting process, unmapping or page
migration while the accessed or dirty bit handers are about to run. In that
case the accessed bit or dirty is set on an zeroed pte which leads the VM to
conclude that this is a swap pte. This may lead to
- Messages from the vm like
swap_free: Bad swap file entry 4000000000000000
- Processes being aborted
swap_dup: Bad swap file entry 4000000000000000
VM: killing process ....
Page migration is particular suitable for the creation of this race since
it needs to remove and restore page table entries.
The fix here is to check for the present bit and simply not update
the pte if the page is not present anymore. If the page is not present
then the fault handler should run next which will take care of the problem
by bringing the page back and then mark the page dirty or move it onto the
active list.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When there is no bus check, the return code should be failure, not success.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
This stuff is all in the generic ia64 kernel, and the new initcall error
reporting complains about them.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Turn on CONFIG_PNPACPI. I recently removed 8250_acpi.c. All devices
previously claimed by 8250_acpi.c should now be claimed by 8250_pnp.c.
This depends on having CONFIG_PNPACPI so ACPI devices show up as PNP
devices.
All other ia64 defconfigs either have CONFIG_PNPACPI already, or
don't have 8250 support turned on at all.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The MCA recovery messages are currently KERN_DEBUG,
so they don't show up in /var/log/messages (by default).
Increase the severity to KERN_ERR, for the initial
message (and also add the physical address to this
message). Leave the successful isolation message as
KERN_DEBUG, but increase the severity when isolation
fails to KERN_CRIT.
[Russ' patch made these all KERN_CRIT]
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
The kbuild system takes advantage of an incorrect behavior in GNU make.
Once this behavior is fixed, all files in the kernel rebuild every time,
even if nothing has changed. This patch ensures kbuild works with both
the incorrect and correct behaviors of GNU make.
For more details on the incorrect behavior, see:
http://lists.gnu.org/archive/html/bug-make/2006-03/msg00003.html
Changes in this patch:
- Keep all targets that are to be marked .PHONY in a variable, PHONY.
- Add .PHONY: $(PHONY) to mark them properly.
- Remove any $(PHONY) files from the $? list when determining whether
targets are up-to-date or not.
Signed-off-by: Paul Smith <psmith@gnu.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Allow sysadmin to disable all warnings about userland apps
making unaligned accesses by using:
# echo 1 > /proc/sys/kernel/ignore-unaligned-usertrap
Rather than having to use prctl on a process by process basis.
Default behaivour leaves the warnings enabled.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
beautify coding style for zeroing end of fsyscall_table entries.
Remove misleading __NR_syscall_last and add more comments.
Drop (now unneeded) "guard against failure to increase NR_syscalls"
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/kernel/unaligned.c erroneously marked die_if_kernel()
with a "noreturn" attribute ... which is silly (it returns whenever
the argument regs say that the fault happened in user mode, as one
might expect given the "if_kernel" part of its name!). Thanks to
Alan and Gareth for pointing this out.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Christoph Hellwig <hch@infradead.org> pointed that there are no
in-tree uses of this. So revert 9c65cb9be6
Signed-off-by: Tony Luck <tony.luck@intel.com>
pcibios_setup() should return NULL if it handled a parameter. Since ia64
handles no parameters, it should return the string that was passed in,
not NULL. This brings ia64 into line with all other architectures that
handle no parameters.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Take advantage of kzalloc() as well as reduce the size of code generated
for the error returns in xpc_setup_infrastructure().
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
unaligned_access does fetch cr.ipsr, then calls
dispatch_unaligned_handler, but dispatch_unaligned_handler fetches
cr.ipsr again, so delete the first one.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Commit 9ec4b1f356 made kprobes not compile
without module support, so just make that clear in the Kconfig file.
Also, since it's marked EXPERIMENTAL, make that dependency explicit too.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor updates to earlier patch.
- Added to documentation to add ia64 as well.
- Minor clarification on how to use disabled cpus
- used plain max instead of max_t per Andew Morton.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It appears that if auditing is enabled, the kernel fails to
check for pending signals before returning to user mode.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Trivial port of this feature from i386
As it stands, panic_on_oops but does nothing on ia64
Signed-Off-By: Horms <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The original ia64 udelay() was simple, but flawed for platforms without
synchronized ITCs: a preemption and migration to another CPU during the
while-loop likely resulted in too-early termination or very, very
lengthy looping.
The first fix (now in 2.6.15) broke the delay loop into smaller,
non-preemptible chunks, reenabling preemption between the chunks. This
fix is flawed in that the total udelay is computed to be the sum of just
the non-premptible while-loop pieces, i.e., not counting the time spent
in the interim preemptible periods. If an interrupt or a migration
occurs during one of these interim periods, then that time is invisible
and only serves to lengthen the effective udelay().
This new fix backs out the current flawed fix and returns to a simple
udelay(), fully preemptible and interruptible. It implements two simple
alternative udelay() routines: one a default generic version that uses
ia64_get_itc(), and the other an sn-specific version that uses that
platform's RTC.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix XPC so that it does not deliver any messages until the connected
callout has returned, as well as, prevent the disconnected callout to
occur before the disconnecting callout has returned.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The __sn_cnodeid_to_nasid array was incorrectly sized at MAX_NUMNODES.
On a large system, this array could overflow. The following patch
corrects this by defining it to MAX_COMPACT_NODES.
Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This one falls into the "present for Andrew Morton" category to address
his wishlist for a compiler warning free build ;-)
Signed-off-by: Tony Luck <tony.luck@intel.com>
General SN2 code cleanup:
- Do not initialize global variables to zero
- Use kzalloc instead of kmalloc+memset
- Check kmalloc return values
- Do not obfuscate spin lock calls
- Remove some unused code
- Various formatting cleanups
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove symbol exports from ia64_ksyms.c that are already exported in
lib/string.c.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Have a facility to account for potentially hot-pluggable CPUs. ACPI doesnt
give a determinstic method to find hot-pluggable CPUs. Hence we use 2 methods
to assist.
- BIOS can mark potentially hot-pluggable CPUs as disabled in the MADT tables.
- User can specify the number of hot-pluggable CPUs via parameter
additional_cpus=X
The option is enabled only if ACPI_CONFIG_HOTPLUG_CPU=y which enables the
physical hotplug option. Without which user can still use logical onlining
and offlining of CPUs by enabling CONFIG_HOTPLUG_CPU=y
Adds more bits to cpu_possible_map for potentially hot-pluggable cpus.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Do not set cpu_possible_map for NR_CPUS when ACPI_CONFIG_HOTPLUG_CPU is set.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
MCA driver can cause panic if kernel gets a state info with no minstate.
This patch adds minstate validation before handling it.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove an erroneous kfree, and unlink the pcidev_info struct from the
pcidev_info list prior to free'ing the pcidev_info struct.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Registers system call for the ia64 architecture.
Reserves space for ppoll and pselect, and adds unshare at system
call number 1296.
Signed-off-by: Janak Desai <janak@us.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
No platform in the community tree uses PLATFORM_MCA_HANDLERS, remove
the references.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Update the comm field on the MCA handler for user tasks as well as for
verified kernel tasks. This helps to identify the task that was
running when the MCA occurred.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Print a message identifying the monarch MCA handler. Print a summary
of the status of the slave MCA cpus.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Prevent SN2 specific code to be executed on non SN2 platforms when
running a generic kernel.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There were two problems with enabling the PRINTK_TIME config
option:
1) The first calls to printk() occur before per-cpu data virtual
address is pinned into the TLB, so sched_clock() can fault.
2) sched_clock() is based on ar.itc, which may not be synchronized
across cpus.
Ken Chen started this patch, Tony Luck tinkered with it, and Jes
Sorensen perfected it.
Signed-off-by: Tony Luck <tony.luck@intel.com>
The check of (end != cp) after memparse in efi.c looks wrong to me.
The result is that we can't use mem= and max_addr= kernel parameter at
the same time.
The following patch removed the check just like other arches do.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Rewrite the SN pio_phys_xxx macros in assembly language. This
avoids issues with the Intel icc compiler. Function call
overhead is not an issue - the functions reference PIOs
and take 100's nsec to complete.
In addition, the functions should likely be in assembly
language anyway - they reference memory using physical
addressing mode. One function executes with psr.ic disabled.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
After converting the cpu physical address to shub2 physical
addressing, the address was run through TO_PHYS() which
clobbered a high node offset bit causing the BTE to fail
on shub2 nodes with large memory. This fix corrects
that problem.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
The patch implements cpu topology exportation by sysfs.
Items (attributes) are similar to /proc/cpuinfo.
1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
represent the physical package id of cpu X;
2) /sys/devices/system/cpu/cpuX/topology/core_id:
represent the cpu core id to cpu X;
3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
represent the thread siblings to cpu X in the same core;
4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
represent the thread siblings to cpu X in the same physical package;
To implement it in an architecture-neutral way, a new source file,
driver/base/topology.c, is to export the 5 attributes.
If one architecture wants to support this feature, it just needs to
implement 4 defines, typically in file include/asm-XXX/topology.h.
The 4 defines are:
#define topology_physical_package_id(cpu)
#define topology_core_id(cpu)
#define topology_thread_siblings(cpu)
#define topology_core_siblings(cpu)
The type of **_id is int.
The type of siblings is cpumask_t.
To be consistent on all architectures, the 4 attributes should have
deafult values if their values are unavailable. Below is the rule.
1) physical_package_id: If cpu has no physical package id, -1 is the
default value.
2) core_id: If cpu doesn't support multi-core, its core id is 0.
3) thread_siblings: Just include itself, if the cpu doesn't support
HT/multi-thread.
4) core_siblings: Just include itself, if the cpu doesn't support
multi-core and HT/Multi-thread.
So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
If an attribute isn't defined on an architecture, it won't be exported.
Thank Nathan, Greg, Andi, Paul and Venki.
The patch provides defines for i386/x86_64/ia64.
Signed-off-by: Zhang, Yanmin <yanmin.zhang@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
During some testing, we got a warning about trying to allocate
memory while holding a lock. This fixes that problem.
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Maintenance patch:
- Add missing __init calls
- Do not zero initialize global variables
- No need to typecast function call returns to void
- Some formatting
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If SAL_CACHE_FLUSH drops interrupts, complain about it and fall back to
using PAL_CACHE_FLUSH instead.
This is to work around a defect in HP rx5670 firmware: when an interrupt
occurs during SAL_CACHE_FLUSH, SAL drops the interrupt but leaves it marked
"in-service", which leaves the interrupt (and others of equal or lower
priority) masked.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Somehow I doubt this comment is meant to be here anymore... It's
been floating after the L1_CACHE_SHIFT entry since before Linux
moved to bitkeeper.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Temporary patch to make pci_enable_msi() fail gracefully on altix. Will be
removed after 2.6.16 releases and the msi abstraction patches start flowing.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Redirecting interrupts using smp_affinity on altix does not work on kernels
built with CONFIG_PCI_MSI. The problem is that move_irq() turns into a noop
if MSI is built in. This patch calls move_native_irq() instead of move_irq()
to get around that.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On SN2, MMIO writes which are issued from separate processors are not
guaranteed to arrive in any particular order at the IO hardware. When
performing such writes from the kernel this is not a problem, as a
kernel thread will not migrate to another CPU during execution, and
mmiowb() calls can guarantee write ordering when control of the IO
resource is allowed to move between threads.
However, when MMIO writes can be performed from user space (e.g. DRM)
there are no such guarantees and mechanisms, as the process may
context-switch at any time, and may migrate to a different CPU as part
of the switch. For such programs/hardware to operate correctly, it is
required that the MMIO writes from the old CPU be accepted by the IO
hardware before subsequent writes from the new CPU can be issued.
The following patch implements this behavior on SN2 by waiting for a
Shub register to indicate that these writes have been accepted. This
is placed in the context switch-in path, and only performs the wait
when the newly scheduled task changes CPUs.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
This patch finishes support for SHUB2 (the new chipset). Most of the
changes are performance related. A few changes are workarounds for
"interesting" chipset features.
Some temporary debugging code has also been deleted.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Various bugfixes and hardware bug workarounds necessary for the rev 1.0 version
of the altix TIO CE asic.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The only user of the MCA/INIT sigdelayed code (SGI's I/O probing) has
moved from the kernel into SAL. Delete the MCA/INIT sigdelayed code.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Actually I think this is more appropriate so we don't end up with 17
cases that add drivers/sn to the build lib.
Include drivers/sn when CONFIG_IA64_SGI_SN2 or CONFIG_IA64_GENERIC
is enabled.
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/sn/Makefile sets CPPFLAGS, expecting that setting to
propogate to all the subdirectories. For a normal build with its
recursive descent it does work, but doing a selective build like
'make arch/ia64/sn/kernel/io_init.i' does not do a recursive descent,
it goes directly to arch/ia64/sn/kernel/Makefile so the flags do not
get set.
To support selective builds, set the flags in all the subordinate Makefiles.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove the GFP_DMA flag from XPC kmalloc() calls.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Eliminate a hot shared cacheline that occurs if multiple cpus are
taking unaligned exceptions.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Takashi helped us track down a bad page state bug we thought was coming
from alsa. It turns out we weren't paying attention to the gfp flags
that were passed in to sn_dma_alloc_coherent().
From: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
sos->os_status is set to a default value of IA64_MCA_COLD_BOOT for an
MCA, but then is incorrectly overwritten with IA64_MCA_SAME_CONTEXT (0).
This makes SAL think that all MCAs have been recovered.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix an unnecessary softlockup watchdog warning in the ia64
uncached_build_memmap() that occurs occasionally at 256p and always at
512p. The problem occurs at boot time.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Migrate perfmon from using an old semaphore to a completion handler.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Migrate arch/ia64/ia32/sys_ia32 to using a mutex for mmap protection.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Adds the ability to disability packet split at compile time and use the legacy receive path on PCI express hardware. Made this a CONFIG option and modified the Kconfig, to reflect the new option.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Migrate sn2 code to use mutex and completion events rather than
semaphores.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Bugfix... the altix SN_SAL_IOIF_SLOT_ENABLE & SN_SAL_IOIF_SLOT_DISABLE
SAL calls need to pass the segment# down
Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Work-around to temporarily support older PROMs with new flush device code.
This code allows systems running older PROMs to continue to run on the new
kernel base until a new official PROM is released.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Acked-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes the bug that pci_claim_resource() is called multiple
times for the same P2P bridge's resource structures if P2P bridges
require their own PCI I/O resources.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
First step to memory hotplug for ia64 (add only,
all new memory is added to node 0, does not use
ZONE_EASY_RECLAIM yet).
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add driver support for a 2 port PCI IOC3-based serial card on Altix boxes:
This is a re-submission. On the original submission I was asked to
organize the code so that the MIPS ioc3 ethernet and serial parts could be
used with this driver. Stanislaw Skowronek was kind enough to provide the
shim layer for this - thanks Stanislaw. This patch includes the shim layer
and the Altix PCI ioc3 serial driver. The MIPS merged ioc3 ethernet and
serial support is forthcoming.
Signed-off-by: Patrick Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
tmp_buf_sem sems to be a common name for something completely unused...
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de> ("usb portion")
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
TTY layer buffering revamp broke ia64 in commit
33f0f88f1c
CC arch/ia64/hp/sim/simserial.o
arch/ia64/hp/sim/simserial.c: In function `receive_chars':
arch/ia64/hp/sim/simserial.c:170: error: structure has no member named `flip'
... and so on ...
make[1]: *** [arch/ia64/hp/sim/simserial.o] Error 1
Patch from Andreas Schwab.
Signed-off-by: Tony Luck <tony.luck@intel.com>
When jprobe is hit, the function parameters of the original function
should be saved before jprobe handler is executed, and restored it after
jprobe handler is executed, because jprobe handler might change the
register values due to tail call optimization by the gcc.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add hotplug cpu support to salinfo.c.
The cpu_event field is a cpumask so use the cpu_* macros consistently,
replacing the existing mixture of cpu_* and *_bit macros.
Instead of counting the number of outstanding events in a semaphore and
trying to track that count over user space context, interrupt context,
non-maskable interrupt context and cpu hotplug, replace the semaphore
with a test for "any bits set" combined with a mutex.
Modify the locking to make the test for "work to do" an atomic
operation.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We need to handle debug traps in fsys mode non-fatally. They can
happen now that we have fsyscalls which contain probe instructions.
Signed-off-by: Jason Uhlenkott <jasonuhl@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch separates the sn_flush_device_list struct into kernel and
common (both kernel and PROM accessible) structures. As it was, if the
size of a spinlock_t changed (due to additional CONFIG options, etc.) the
sal call which populated the sn_flush_device_list structs would erroneously
write data (and cause memory corruption and/or a panic).
This patch does the following:
1. Removes sn_flush_device_list and adds sn_flush_device_common and
sn_flush_device_kernel.
2. Adds a new SAL call to populate a sn_flush_device_common struct per
device, not per widget as previously done.
3. Correctly initializes each device's sn_flush_device_kernel spinlock_t
struct (before it was only doing each widget's first device).
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I originally thought this was an bug only in the SN code, but I think I
also see a hole in the generic IA64 tlb code. (Separate patch was sent
for the SN problem).
It looks like there is a bug in the TLB flushing code. During context switch,
kernel threads (kswapd, for example) inherit the mm of the task that was
previously running on the cpu. Normally, this is ok because the previous context
is still loaded into the RR registers. However, if the owner of the mm
migrates to another cpu, changes it's context number, and references a
page before kswapd issues a tlb_purge for that same page, the purge will be
done with a stale context number (& RR registers).
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix (shub2) pushes the BTE clean-up into SAL.
This patch correctly interfaces with the now implemented SAL call.
It also fixes a bug when delaying clean-up to allow busy BTEs to
complete (or error out).
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On return from INIT handler we must convert the address of the
minstate area from a kernel virtual uncached address (0xC...)
to physical uncached (0x8...). A typo (or thinko?) in the code
converted to physical cached.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cleanup a few items after moving xpc.h from arch/ia64/sn/kernel to
include/asm-ia64/sn.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Move xpc.h from arch/ia64/sn/kernel to include/asm-ia64/sn without change.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Move xpc_system_reboot() to be closer to the file it calls for readability
reasons (which are indeed subjective).
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Allow for the loss of heartbeat while in kdebug to be ignored by remote
partitions.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cleanup the XPC disengage related messages that are printed to the log.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes a problem in XPC disengage processing whereby it was not
seeing the request to disengage from a remote partition, so the disengage
wasn't happening. The disengagement is suppose to transpire during the time
a XPC channel is disconnecting, and should be completed before the channel
is declared to be disconnected.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When this new syscall was added to ia64 in commit
39743889aa
fsys.S was forgotten. Add a ".data8 0" there to keep
it in step. [Reported by Stephane Eranian]
Signed-off-by: Tony Luck <tony.luck@intel.com>
on ia64 thread_info is at the constant offset from task_struct and stack
is embedded into the same beast. Set __HAVE_THREAD_FUNCTIONS, made
task_thread_info() just add a constant.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Ingo Molnar <mingo@elte.hu>
This is the latest version of the scheduler cache-hot-auto-tune patch.
The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:
- I've added a 'domain distance' function, which is used to cache
measurement results. Each distance is only measured once. This means
that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
distances 0 and 1, and on SMP distance 0 is measured. The code walks
the domain tree to determine the distance, so it automatically follows
whatever hierarchy an architecture sets up. This cuts down on the boot
time significantly and removes the O(N^2) limit. The only assumption
is that migration costs can be expressed as a function of domain
distance - this covers the overwhelming majority of existing systems,
and is a good guess even for more assymetric systems.
[ People hacking systems that have assymetries that break this
assumption (e.g. different CPU speeds) should experiment a bit with
the cpu_distance() function. Adding a ->migration_distance factor to
the domain structure would be one possible solution - but lets first
see the problem systems, if they exist at all. Lets not overdesign. ]
Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didnt set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:
- Instead of relying on a single cache-size provided by the platform and
sticking to it, the code now auto-detects the 'effective migration
cost' between two measured CPUs, via iterating through a wide range of
cachesizes. The code searches for the maximum migration cost, which
occurs when the working set of the test-workload falls just below the
'effective cache size'. I.e. real-life optimized search is done for
the maximum migration cost, between two real CPUs.
This, amongst other things, has the positive effect hat if e.g. two
CPUs share a L2/L3 cache, a different (and accurate) migration cost
will be found than between two CPUs on the same system that dont share
any caches.
(The reliable measurement of migration costs is tricky - see the source
for details.)
Furthermore i've added various boot-time options to override/tune
migration behavior.
Firstly, there's a blanket override for autodetection:
migration_cost=1000,2000,3000
will override the depth 0/1/2 values with 1msec/2msec/3msec values.
Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:
migration_factor=120
will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.
I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:
Dual Celeron (128K L2 cache):
---------------------
migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
---------------------
[00] [01]
[00]: - 1.7(1)
[01]: 1.7(1) -
---------------------
cacheflush times [2]: 0.0 (0) 1.7 (1784008)
---------------------
Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.
Dual HT P4 (512K L2 cache):
---------------------
migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
---------------------
[00] [01] [02] [03]
[00]: - 0.4(1) 0.0(0) 0.4(1)
[01]: 0.4(1) - 0.4(1) 0.0(0)
[02]: 0.0(0) 0.4(1) - 0.4(1)
[03]: 0.4(1) 0.0(0) 0.4(1) -
---------------------
cacheflush times [2]: 0.0 (33900) 0.4 (448514)
---------------------
Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.
8-way P3/Xeon [2MB L2 cache]:
---------------------
migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
---------------------
[00] [01] [02] [03] [04] [05] [06] [07]
[00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1)
[05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1)
[06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1)
[07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) -
---------------------
cacheflush times [2]: 0.0 (0) 19.2 (19281756)
---------------------
This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add per-arch sched_cacheflush() which is a write-back cacheflush used by
the migration-cost calibration code at bootup time.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch: Use <linux/capability.h> where capable() is used.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a window where a probe gets removed right after the probe is hit
on some different cpu. In this case probe handlers can't find a matching
probe instance related to break address. In this case we need to read the
original instruction at break address to see if that is not a break/int3
instruction and recover safely.
Previous code had a bug where we were not checking for the above race in
case of reentrant probes and the below patch fixes this race.
Tested on IA64, Powerpc, x86_64.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently arch_remove_kprobes() is only implemented/required for x86_64 and
powerpc. All other architecture like IA64, i386 and sparc64 implementes a
dummy function which is being called from arch independent kprobes.c file.
This patch removes the dummy functions and replaces it with
#define arch_remove_kprobe(p, s) do { } while(0)
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that all these entries in the arch ioctl32.c files are gone [1], we can
build fs/compat_ioctl.c as a normal object and kill tons of cruft. We need a
special do_ioctl32_pointer handler for s390 so the compat_ptr call is done.
This is not needed but harmless on all other architectures. Also remove some
superflous includes in fs/compat_ioctl.c
Tested on ppc64.
[1] parisc still had it's PPP handler left, which is not fully correct
for ppp and besides that ppp uses the generic SIOCPRIV ioctl so it'd
kick in for all netdevice users. We can introduce a proper handler
in one of the next patch series by adding a compat_ioctl method to
struct net_device but for now let's just kill it - parisc doesn't
compile in mainline anyway and I don't want this to block this
patchset.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The comment in compat.c is wrong, every architecture provides a
get_compat_sigevent() for the IPC compat code already.
This basically moves the x86_64 version to common code and removes all the
others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a hook so architectures can validate /dev/mem mmap requests.
This is analogous to validation we already perform in the read/write
paths.
The identity mapping scheme used on ia64 requires that each 16MB or
64MB granule be accessed with exactly one attribute (write-back or
uncacheable). This avoids "attribute aliasing", which can cause a
machine check.
Sample problem scenario:
- Machine supports VGA, so it has uncacheable (UC) MMIO at 640K-768K
- efi_memmap_init() discards any write-back (WB) memory in the first granule
- Application (e.g., "hwinfo") mmaps /dev/mem, offset 0
- hwinfo receives UC mapping (the default, since memmap says "no WB here")
- Machine check abort (on chipsets that don't support UC access to WB
memory, e.g., sx1000)
In the scenario above, the only choices are
- Use WB for hwinfo mmap. Can't do this because it causes attribute
aliasing with the UC mapping for the VGA MMIO space.
- Use UC for hwinfo mmap. Can't do this because the chipset may not
support UC for that region.
- Disallow the hwinfo mmap with -EINVAL. That's what this patch does.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove various things which were checking for gcc-1.x and gcc-2.x compilers.
From: Adrian Bunk <bunk@stusta.de>
Some documentation updates and removes some code paths for gcc < 3.2.
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ptrace_get_task_struct() helper that I added as part of the ptrace
consolidation is useful in variety of places that currently opencode it.
Switch them to the common helpers.
Add a ptrace_traceme() helper that needs to be explicitly called, and simplify
the ptrace_get_task_struct() interface. We don't need the request argument
now, and we return the task_struct directly, using ERR_PTR() for error
returns. It's a bit more code in the callers, but we have two sane routines
that do one thing well now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_migrate_pages implementation using swap based page migration
This is the original API proposed by Ray Bryant in his posts during the first
half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.
The intent of sys_migrate is to migrate memory of a process. A process may
have migrated to another node. Memory was allocated optimally for the prior
context. sys_migrate_pages allows to shift the memory to the new node.
sys_migrate_pages is also useful if the processes available memory nodes have
changed through cpuset operations to manually move the processes memory. Paul
Jackson is working on an automated mechanism that will allow an automatic
migration if the cpuset of a process is changed. However, a user may decide
to manually control the migration.
This implementation is put into the policy layer since it uses concepts and
functions that are also needed for mbind and friends. The patch also provides
a do_migrate_pages function that may be useful for cpusets to automatically
move memory. sys_migrate_pages does not modify policies in contrast to Ray's
implementation.
The current code here is based on the swap based page migration capability and
thus is not able to preserve the physical layout relative to it containing
nodeset (which may be a cpuset). When direct page migration becomes available
then the implementation needs to be changed to do a isomorphic move of pages
between different nodesets. The current implementation simply evicts all
pages in source nodeset that are not in the target nodeset.
Patch supports ia64, i386 and x86_64.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This was causing some ordering problems. Remove the up-front evaluation
and just revaluate the compiler version each time we need it.
(The up-front evaluation was problematic because some architectures modify
the value of $(CC)).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
arch/ia64/kernel/setup.c: In function `show_cpuinfo':
arch/ia64/kernel/setup.c:576: warning: long unsigned int format, different type arg (arg 12)
arch/ia64/kernel/setup.c:576: warning: long unsigned int format, different type arg (arg 13)
Introduced by 95235ca2c2
Signed-off-by: Tony Luck <tony.luck@intel.com>
here is the BSP removal support for IA64. Its pretty much the same thing that
was released a while back, but has your feedback incorporated.
- Removed CONFIG_BSP_REMOVE_WORKAROUND and associated cmdline param
- Fixed compile issue with sn2/zx1 due to a undefined fix_b0_for_bsp
- some formatting nits (whitespace etc)
This has been tested on tiger and long back by alex on hp systems as well.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The first of these changes s/hotplug/uevent/ was needed to
compile sn2_defconfig (ia64/sn). The other three files
changed are blind changes of all remaining bus_type.hotplug
references I could find to bus_type.uevent.
This patch attempts to finish similar changes made in the
gregkh-driver-kill-hotplug-word-from-driver-core Nov 22 patch.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The function ia64_pci_legacy_write() returns 0 for everything
except errors. This return value gets sent back to the user from
pci_write_legacy_io(), making it look like every write fails. The trivial
patch below copies the behavior of the SGI sn machvec and does what
would be expected from something implementing a write() function.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use the #define for ACPI_LEVEL_SENSITIVE instead of assuming
non-zero, because ACPICA 20051021 changes its value to zero.
Also, use uniform variable names:
edge_level -> triggering
active_high_low -> polarity
Signed-off-by: Len Brown <len.brown@intel.com>
sparc64, i386 and x86_64 have support for a special data section dedicated
to rarely updated data that is frequently read. The section was created to
avoid false sharing of those rarely read data with frequently written kernel
data.
This patch creates such a data section for ia64 and will group rarely written
data into this section.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Change the NR_CPUS default for ia64/sn up to 1024.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: John Hesterberg <jh@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I see why the problem exists only on SN. SN uses a different hardware
mechanism to purge TLB entries across nodes.
It looks like there is a bug in the SN TLB flushing code. During context switch,
kernel threads inherit the mm of the task that was previously running on the
cpu. This confuses the code in sn2_global_tlb_purge().
The result is a missed TLB purge for the task that owns the "borrowed" mm.
(I hit the problem running heavy stress where kswapd was purging code pages of
a user task that woke kswapd. The user task took a SIGILL fault trying to
execute code in the page that had been ripped out from underneath it).
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use raw_smp_processor_id() instead of get_cpu() as we don't need the
extra features of get_cpu().
Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The udelay() inline for ia64 uses the ITC. If CONFIG_PREEMPT is enabled
and the platform has unsynchronized ITCs and the calling task migrates
to another CPU while doing the udelay loop, then the effective delay may
be too short or very, very long.
This patch disables preemption around 100 usec chunks of the overall
desired udelay time. This minimizes preemption-holdoffs.
udelay() is now too big to be inline, move it out of line and export it.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I realized ZONE_DMA32 has a trivial bug at Kconfig for ia64. In
include/linux/gfp.h on 2.6.15-rc5-mm1, CONFIG is define like followings.
#ifdef CONFIG_DMA_IS_DMA32
#define __GFP_DMA32 ((__force gfp_t)0x01) /* ZONE_DMA is ZONE_DMA32
*/
:
:
So, CONFIG_"ZONE"_DMA_IS_DMA32 is clearly wrong.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When multiple probes are registered at the same address and if due to some
recursion (probe getting triggered within a probe handler), we skip calling
pre_handlers and just increment nmissed field.
The below patch make sure it walks the list for multiple probes case.
Without the below patch we get incorrect results of nmissed count for
multiple probe case.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implemented support for the EM64T and other x86_64
processors. This essentially entails recognizing
that these processors support non-aligned memory
transfers. Previously, all 64-bit processors were assumed
to lack hardware support for non-aligned transfers.
Completed conversion of the Resource Manager to nearly
full table-driven operation. Specifically, the resource
conversion code (convert AML to internal format and the
reverse) and the debug code to dump internal resource
descriptors are fully table-driven, reducing code and data
size and improving maintainability.
The OSL interfaces for Acquire and Release Lock now use a
64-bit flag word on 64-bit processors instead of a fixed
32-bit word. (Alexey Starikovskiy)
Implemented support within the resource conversion code
for the Type-Specific byte within the various ACPI 3.0
*WordSpace macros.
Fixed some issues within the resource conversion code for
the type-specific flags for both Memory and I/O address
resource descriptors. For Memory, implemented support
for the MTP and TTP flags. For I/O, split the TRS and TTP
flags into two separate fields.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Completed a major overhaul of the Resource Manager code -
specifically, optimizations in the area of the AML/internal
resource conversion code. The code has been optimized to
simplify and eliminate duplicated code, CPU stack use has
been decreased by optimizing function parameters and local
variables, and naming conventions across the manager have
been standardized for clarity and ease of maintenance (this
includes function, parameter, variable, and struct/typedef
names.)
All Resource Manager dispatch and information tables have
been moved to a single location for clarity and ease of
maintenance. One new file was created, named "rsinfo.c".
The ACPI return macros (return_ACPI_STATUS, etc.) have
been modified to guarantee that the argument is
not evaluated twice, making them less prone to macro
side-effects. However, since there exists the possibility
of additional stack use if a particular compiler cannot
optimize them (such as in the debug generation case),
the original macros are optionally available. Note that
some invocations of the return_VALUE macro may now cause
size mismatch warnings; the return_UINT8 and return_UINT32
macros are provided to eliminate these. (From Randy Dunlap)
Implemented a new mechanism to enable debug tracing for
individual control methods. A new external interface,
acpi_debug_trace(), is provided to enable this mechanism. The
intent is to allow the host OS to easily enable and disable
tracing for problematic control methods. This interface
can be easily exposed to a user or debugger interface if
desired. See the file psxface.c for details.
acpi_ut_callocate() will now return a valid pointer if a
length of zero is specified - a length of one is used
and a warning is issued. This matches the behavior of
acpi_ut_allocate().
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
What is the value shown in "cpu MHz" of /proc/cpuinfo when CPUs are capable of
changing frequency?
Today the answer is: It depends.
On i386:
SMP kernel - It is always the boot frequency
UP kernel - Scales with the frequency change and shows that was last set.
On x86_64:
There is one single variable cpu_khz that gets written by all the CPUs. So,
the frequency set by last CPU will be seen on /proc/cpuinfo of all the
CPUs in the system. What you see also depends on whether you have constant_tsc
capable CPU or not.
On ia64:
It is always boot time frequency of a particular CPU that gets displayed.
The patch below changes this to:
Show the last known frequency of the particular CPU, when cpufreq is present. If
cpu doesnot support changing of frequency through cpufreq, then boot frequency
will be shown. The patch affects i386, x86_64 and ia64 architectures.
Signed-off-by: Venkatesh Pallipadi<venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The patch that added support for a new platform chipset (shub2) broke
PTC deadlock recovery on older versions of the chipset. (PTCs are the
SN platform-specific method for doing a global TLB purge). This
patch fixes deadlock recovery so that it works on both the old & new
chipsets.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We have a customer application which trips a bug. The problem arises
when a driver attempts to call do_munmap on an area which is mapped, but
because current->thread.task_size has been set to 0xC0000000, the call
to do_munmap fails thinking it is an unmap beyond the user's address
space.
The comment in fs/binfmt_elf.c in load_elf_library() before the call
to SET_PERSONALITY() indicates that task_size must not be changed for
the running application until flush_thread, but is for ia64 executing
ia32 binaries.
This patch moves the setting of task_size from SET_PERSONALITY() to
flush_thread() as indicated. The customer application no longer is able
to trip the bug.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The per-node data structures are allocated with strided offsets that are a
function of the node number. This prevents excessive cache-aliasing from
occurring.
On systems with a large number of nodes, the strided offset becomes
too large. This patch restricts the maximum offset to 32MB. This is far larger
than the size of any current L3 cache.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix only patch to add fixup code that sets up
pci_controller->window. This code is a temporary
fix until ACPI support on Altix is added.
Also, corrects the usage of pci_dev->sysdata,
which had previously been used to reference
platform specific device info, to now point to
a pci_controller struct.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
http://bugzilla.kernel.org/show_bug.cgi?id=5483
ZX1 config doesn't include cpufreq, so move move acpi-processor.c
up out of ia64/cpufreq directory.
no functional changes
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Return -EINTR instead of -ERESTARTSYS when signals are delivered during
a blocked read of /proc/sal/*/event. This allows salinfo_decode to
detect signals when it is blocked on a read of those files.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This updates the sn2_defconfig file for the Altix 330 hardware, enables
the AGP graphics for the SGI Prism, and removes prompts for the remainder
of the new features. Greg Edwards reviewed the changes.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linux invokes the AML _PDC method (Processor Driver Capabilities)
to tell the BIOS what features it can handle. While the ACPI
spec says nothing about the OS invoking _PDC multiple times,
doing so with changing bits seems to hopelessly confuse the BIOS
on multiple platforms up to and including crashing the system.
Factor out the _PDC invocation so Linux invokes it only once.
http://bugzilla.kernel.org/show_bug.cgi?id=5483
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
arch/ia64/kernel/acpi-ext.c: In function `acpi_vendor_resource_match':
arch/ia64/kernel/acpi-ext.c:38: error: structure has no member named `id'
Signed-off-by: MAEDA Naoaki <maeda.naoaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
break.b always sets cr.iim to 0 and the current code tries to
get the break_num by decoding instruction. However, their
seems to be a race condition while reading the regs->cr_iip,
as on other cpu the break.b at regs->cr_iip might have been
replaced with the original instruction as a result of
unregister_kprobe() and hence decoding instruction to
obtain break_num will result in wrong value in this case.
Also includes changes to kprobes.c which now has to handle
break number zero.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
A single SGI Altix system can be divided into multiple partitions,
each running their own instance of the Linux kernel. pfn_valid()
is currently not optimal for any but the first partition, since it
does not compare the pfn with min_low_pfn before calling the more
costly ia64_pfn_valid().
Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix a bug in kprobes that can cause an Oops or even a crash when a return
probe is installed on one of the following functions: sys_execve,
do_execve, load_*_binary, flush_old_exec, or flush_thread. The fix is to
remove the call to kprobe_flush_task() in flush_thread(). This fix has
been tested on all architectures for which the return-probes feature has
been implemented (i386, x86_64, ppc64, ia64). Please apply.
BACKGROUND
Up to now, we have called kprobe_flush_task() under two situations: when a
task exits, and when it execs. Flushing kretprobe_instances on exit is
correct because (a) do_exit() doesn't return, and (b) one or more
return-probed functions may be active when a task calls do_exit(). Neither
is the case for sys_execve() and its callees.
Initially, the mistaken call to kprobe_flush_task() on exec was harmless
because we put the "real" return address of each active probed function
back in the stack, just to be safe, when we recycled its
kretprobe_instance. When support for ppc64 and ia64 was added, this safety
measure couldn't be employed, and was eventually dropped even for i386 and
x86_64. sys_execve() and its callees were informally blacklisted for
return probes until this fix was developed.
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The nasid_index was not being incremented if the
pointer was null, causing an infinite loop.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
The kernel's use of the for_each_*cpu(i) macros has allowed for sparse CPU
numbering. When I hacked the kernel to test sparse cpu_present_map[] and
cpu_possible_map[] cpumasks, I discovered one remaining spot, in
sn_hwperf_ioctl() during sn initialization, that needs to be fixed.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Patch to prevent sn2_ptc_init code from attempting to load on non-sn2 systems
when sn2_smp.c is built-in to generic kernel.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Polish the comments specifically in vhpt_miss and nested_dtlb_miss
handlers. I think it's better to explicitly name each page table
level with its name instead of numerically name them. i.e., use
pgd, pud, pmd, and pte instead of referring as L1, L2, L3 etc.
Along the line, remove some magic number in the comments like:
"PTA + (((IFA(61,63) << 7) | IFA(33,39))*8)". No code change at
all, pure comment update. Feel free to shoot anything you have,
darts or tomahawk cruise missile. I will duck behind a bunker ;-)
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
From source code inspection, I think there is a bug with 4 level
page table with vhpt_miss handler. In the code path of rechecking
page table entry against previously read value after tlb insertion,
*pte value in register r18 was overwritten with value newly read
from pud pointer, render the check of new *pte against previous
*pte completely wrong. Though the bug is none fatal and the penalty
is to purge the entry and retry. For functional correctness, it
should be fixed. The fix is to use a different register so new
*pud don't trash *pte. (btw, the comments in the cmp statement is
wrong as well, which I will address in the next patch).
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Our performance validation on 2.6.15-rc1 caught a disastrous performance
regression on ia64 with netperf (-98%) and volanomark (-58%) compares to
previous kernel version 2.6.14-git7. See the following chart (result
group 1 & 2).
http://kernel-perf.sourceforge.net/results.machine_id=26.html
We have root caused it to commit 64c7c8f885
This changeset broke the ia64 task resched notification. In
sched.c:resched_task(), a reschedule IPI is conditioned upon
TIF_POLLING_NRFLAG. However, the above changeset unconditionally set
the polling thread flag for idle tasks regardless whether pal_halt_light
is in use or not. As a result, resched IPI is not sent from
resched_task(). And since the default behavior on ia64 is to use
pal_halt_light, we end up delaying the rescheduling task until next
timer tick, and thus cause the performance regression.
This fixes the performance bug. I'm glad our performance suite is
turning up bad performance bug like this in time.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IA64 traditionally had a 4GB DMA32 zone. Set the compatibility flag
to keep old drivers working.
For new drivers it would be better to use ZONE_DMA32 now.
Cc: tony.luck@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix default VGA console on SN platforms. Since SN firmware does not pass
enough ACPI information to identify VGA cards and the associated legacy IO/MEM
addresses, we rely on the EFI PCDP table. Since the linux pcdp driver is
optional (and overridden if console= directives are used) SN duplicates a
portion of the pcdp scan code to identify if there is a usable console VGA
adapter. Additionally, dup necessary pcdp related structs to avoid dragging
drivers/pcdp.h into a more public location.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch introduces 4-level page tables to ia64. I have run
some benchmarks and found nothing interesting. Performance has
consistently fallen within the noise range.
It also introduces a config option (setting the default to 3
levels). The config option prevents having 4 level page
tables with 64k base page size.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
XPC (as in arch/ia64/sn/kernel/xp*) has a need to notify other partitions
(SGI Altix) whenever a partition is going down in order to get them to
disengage from accessing the halting partition's memory. If this is not
done before the reset of the hardware, the other partitions can find
themselves encountering MCAs that bring them down.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
confusion, and make their semantics rigid. Improves efficiency of
resched_task and some cpu_idle routines.
* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
and as we hold it during resched_task, then there is no need for an
atomic test and set there. The only other time this should be set is
when the task's quantum expires, in the timer interrupt - this is
protected against because the rq lock is irq-safe.
- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
won't get unset until the task get's schedule()d off.
- If we are running on the same CPU as the task we resched, then set
TIF_NEED_RESCHED and no further action is required.
- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
after TIF_NEED_RESCHED has been set, then we need to send an IPI.
Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG.
* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
becomes true, explicitly call schedule(). This makes things a bit clearer
(IMO), but haven't updated all architectures yet.
- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
to the resched_task rules, this isn't needed (and actually breaks the
assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
held). So remove that. Generally one less locked memory op when switching
to the idle thread.
- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
most polling idle loops. The above resched_task semantics allow it to be
set until before the last time need_resched() is checked before going into
a halt requiring interrupt wakeup.
Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
can be always left set, completely eliminating resched IPIs when rescheduling
the idle task.
POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Run idle threads with preempt disabled.
Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
How did it ever work before?
Might fix the CPU hotplugging hang which Nigel Cunningham noted.
We think the bug hits if the idle thread is preempted after checking
need_resched() and before going to sleep, then the CPU offlined.
After calling stop_machine_run, the CPU eventually returns from preemption and
into the idle thread and goes to sleep. The CPU will continue executing
previous idle and have no chance to call play_dead.
By disabling preemption until we are ready to explicitly schedule, this bug is
fixed and the idle threads generally become more robust.
From: alexs <ashepard@u.washington.edu>
PPC build fix
From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
MIPS build fix
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some architectures define and use this type in their compat_ioctl code, but
all of them can easily use the identical ioctl_trans_handler_t type that is
defined in common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ia64 translates normal loads and stores to special MMIO regions into I/O port
accesses. Reserve these special MMIO regions in /proc/iomem.
Sample /proc/iomem:
f8100000000-f81003fffff : PCI Bus 0000:80 I/O Ports 00000000-00000fff
f8100400000-f81007fffff : PCI Bus 0000:8e I/O Ports 00001000-00001fff
f8100800000-f8100ffffff : PCI Bus 0000:9c I/O Ports 00002000-00003fff
f8101000000-f81017fffff : PCI Bus 0000:aa I/O Ports 00004000-00005fff
and corresponding /proc/ioports:
00000000-00000fff : PCI Bus 0000:80
00001000-00001fff : PCI Bus 0000:8e
00002000-00003fff : PCI Bus 0000:9c
00004000-00005fff : PCI Bus 0000:aa
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When a page has a memory uncorrectable ECC error, the recovery
code wants to prevent the page from being reused. This change
bumps the reference count to prevent the page from getting back
on the free list.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
paddr needs to be shifted by PAGE_SHIFT to be valid
input for pfn_valid().
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The determination of whether an MCA is recoverable or not must
be based on the bits set in the PSP (Processor State Parameter).
The specific bits are shown in the Intel IA-64 Architecture Software
Developer's Manual, Vol 2, Table 11-6 Software Recovery Bits in
Processor State Parameter. Those bits should be consistent
across the entire IA-64 family of processors.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
At the moment, attempting to invoke a signal-handler on the normal
stack is guaranteed to fail if the stack-pointer happens not to be
16-byte aligned. This is because the signal-trampoline will attempt
to store fp-regs with stf.spill instructions, which will trap for
misaligned addresses. This isn't terribly useful behavior. It's
better to just always align the signal frame to the next lower 16-byte
boundary.
Signed-off-by: David Mosberger-Tang <David.Mosberger@acm.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The original memory less node allocation attempted to use NODEDATA_ALIGN for
alignment. The bootmem allocator only allows a power of two alignments. This
causes a BUG_ON for some nodes. For cpu only nodes just allocate with a
PERCPU_PAGE_SIZE alignment.
Some older firmware reports SLIT distances of 0xff and results in bestnode
not being computed. This is now treated correctly.
The failed allocation check was removed because it's redundant. The
bootmem allocator already makes this check.
This fix has been boot tested on 4 node machine which has 4 cpu only nodes
and 1 memory node. Thanks to Pete Keilty for reporting this and helping me
test it.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
notify_die() added for MCA_{MONARCH,SLAVE,RENDEZVOUS}_{ENTER,PROCESS,LEAVE} and
INIT_{MONARCH,SLAVE}_{ENTER,PROCESS,LEAVE}. We need multiple
notification points for these events because they can take many seconds
to run which has nasty effects on the behaviour of the rest of the
system.
DIE_SS replaced by a generic DIE_FAULT which checks the vector number,
to allow interception of faults other than SS.
DIE_MACHINE_{HALT,RESTART} added to allow last minute close down
processing, especially when the halt/restart routines are called from
error handlers.
DIE_OOPS added.
The check for kprobe's break numbers has been moved from traps.c to
kprobes.c, allowing DIE_BREAK to be used for any additional break
numbers, i.e. it is no longer kprobes specific.
Hooks for kernel debuggers and kernel dumpers added, ENTER and LEAVE.
Both of these disable the system for long periods which impact on
watchdogs and heartbeat systems in general. More patches to come that
use these events to reset watchdogs and heartbeats.
unregister_die_notifier() added and both routines exported. Requested
by Dean Nelson.
Lock removed from {un,}register_die_notifier. notifier_chain_register()
already takes a lock. Also the generic notifier chain locking is being
reworked to distinguish between callbacks that can block and those that
cannot, the lock in {un,}register_die_notifier would interfere with
that change. http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
Leading white space removed from arch/ia64/kernel/kprobes.c.
Typo in mca.c in original version of this patch found & fixed by Dean
Nelson.
Signed-off-by: Keith Owens <kaos@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Acked-by: Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is the arch/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in arch/.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reorganize the preempt_disable/enable calls to eliminate the extra preempt
depth. Changes based on Paul McKenney's review suggestions for the kprobes
RCU changeset.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes to the arch kprobes infrastructure to take advantage of the locking
changes introduced by usage of RCU for synchronization. All handlers are now
run without any locks held, so they have to be re-entrant or provide their own
synchronization.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IA64 changes to track kprobe execution on a per-cpu basis. We now track the
kprobe state machine independently on each cpu using an arch specific kprobe
control block.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following set of patches are aimed at improving kprobes scalability. We
currently serialize kprobe registration, unregistration and handler execution
using a single spinlock - kprobe_lock.
With these changes, kprobe handlers can run without any locks held. It also
allows for simultaneous kprobe handler executions on different processors as
we now track kprobe execution on a per processor basis. It is now necessary
that the handlers be re-entrant since handlers can run concurrently on
multiple processors.
All changes have been tested on i386, ia64, ppc64 and x86_64, while sparc64
has been compile tested only.
The patches can be viewed as 3 logical chunks:
patch 1: Reorder preempt_(dis/en)able calls
patches 2-7: Introduce per_cpu data areas to track kprobe execution
patches 8-9: Use RCU to synchronize kprobe (un)registration and handler
execution.
Thanks to Maneesh Soni, James Keniston and Anil Keshavamurthy for their
review and suggestions. Thanks again to Anil, Hien Nguyen and Kevin Stafford
for testing the patches.
This patch:
Reorder preempt_disable/enable() calls in arch kprobes files in preparation to
introduce locking changes. No functional changes introduced by this patch.
Signed-off-by: Ananth N Mavinakayahanalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton suggested to move kprobes from kernel hacking menu, since
kernel hacking menu is in-appropriate for the Kprobes. This patch moves
Kprobes and Oprofile under instrumentation menu.
(akpm: it's not a natural fit, but things like djprobes and the s390 guys'
statistics library need a home)
Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: John Levon <levon@movementarian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current ia64 implementation of dma_get_cache_alignment does not work
for modules because it relies on a symbol which is not exported. Direct
access to a global is a little ugly anyway, so this patch re-implements
dma_get_cache_alignment in a manner similar to what is currently used for
x86_64.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Restrict CONFIG_SGI_SN_XP to IA64_GENERIC or IA64_SGI_SN2 kernels.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
wrap_mmu_context(), delayed_tlb_flush(), get_mmu_context() all
have an extra { } block which cause one extra indentation.
get_mmu_context() is particularly bad with 5 indentations to
the most inner "if". It finally gets on my nerve that I can't
keep the code within 80 columns. Remove the extra { } block
and while I'm at it, reformat all the comments to 80-column
friendly. No functional change at all with this patch.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Corrects the very inefficent method of finding free context_ids in
get_mmu_context(). Instead of walking the task_list of all processes,
2 bitmaps are used to efficently store and lookup state, inuse and
needs flushing. The entire rid address space is now used before calling
wrap_mmu_context and global tlb flushing.
Special thanks to Ken and Rohit for their review and modifications in
using a bit flushmap.
Signed-off-by: Peter Keilty <peter.keilty@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.
In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define jiffies_64 in kernel/timer.c rather than having 24 duplicated
defines in each architecture.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sure we always return, as all syscalls should. Also move the common
prototype to <linux/syscalls.h>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pgdat->node_size_lock is basically only neeeded in one place in the normal
code: show_mem(), which is the arch-specific sysrq-m printing function.
Strictly speaking, the architectures not doing memory hotplug do no need this
locking in show_mem(). However, they are all included for completeness. This
should also make any future consolidation of all of the implementations a
little more straightforward.
This lock is also held in the sparsemem code during a memory removal, as
sections are invalidated. This is the place there pfn_valid() is made false
for a memory area that's being removed. The lock is only required when doing
pfn_valid() operations on memory which the user does not already have a
reference on the page, such as in show_mem().
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There was one small but very significant change in the previous patch:
mprotect's flush_tlb_range fell outside the page_table_lock: as it is in 2.4,
but that doesn't prove it safe in 2.6.
On some architectures flush_tlb_range comes to the same as flush_tlb_mm, which
has always been called from outside page_table_lock in dup_mmap, and is so
proved safe. Others required a deeper audit: I could find no reliance on
page_table_lock in any; but in ia64 and parisc found some code which looks a
bit as if it might want preemption disabled. That won't do any actual harm,
so pending a decision from the maintainers, disable preemption there.
Remove comments on page_table_lock from flush_tlb_mm, flush_tlb_range and
flush_tlb_page entries in cachetlb.txt: they were rather misleading (what
generic code does is different from what usually happens), the rules are now
changing, and it's not yet clear where we'll end up (will the generic
tlb_flush_mmu happen always under lock? never under lock? or sometimes under
and sometimes not?).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
First step in pushing down the page_table_lock. init_mm.page_table_lock has
been used throughout the architectures (usually for ioremap): not to serialize
kernel address space allocation (that's usually vmlist_lock), but because
pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
and drop it when allocating a new one, to check lest a racing task already
did. Similarly no page_table_lock in vmalloc's map_vm_area.
Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
user mms, which are converted only by a later patch, for now they have to lock
differently according to whether or not it's init_mm.
If sources get muddled, there's a danger that an arch source taking
init_mm.page_table_lock will be mixed with common source also taking it (or
neither take it). So break the rules and make another change, which should
break the build for such a mismatch: remove the redundant mm arg from
pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
took page_table_lock for no good reason.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ia64 has expand_backing_store function for growing its Register Backing Store
vma upwards. But more complete code for this purpose is found in the
CONFIG_STACK_GROWSUP part of mm/mmap.c. Uglify its #ifdefs further to provide
expand_upwards for ia64 as well as expand_stack for parisc.
The Register Backing Store vma should be marked VM_ACCOUNT. Implement the
intention of growing it only a page at a time, instead of passing an address
outside of the vma to handle_mm_fault, with unknown consequences.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The original vm_stat_account has fallen into disuse, with only one user, and
only one user of vm_stat_unaccount. It's easier to keep track if we convert
them all to __vm_stat_account, then free it from its __shackles.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
... and related annotations for amd64 - swiotlb code is shared, but
prototypes are not.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In arch/ia64/kernel/ptrace.c there is a test for a peek or poke of a
register image (in register backing storage).
The test can be unnecessarily long (and occurs while holding the tasklist_lock).
Especially long on a large system with thousands of active tasks.
The ptrace caller (presumably a debugger) specifies the pid of
its target and an address to peek or poke. But the debugger could be
attached to several tasks.
The idea of find_thread_for_addr() is to find whether the target address
is in the RBS for any of those tasks.
Currently it searches the thread-list of the target pid. If that search
does not find a match, and the shared mm-struct's user count indicates
that there are other tasks sharing this address space (a rare occurrence),
a search is made of all the tasks in the system.
Another approach can drastically shorten this procedure.
It depends upon the fact that in order to peek or poke from/to any task,
the debugger must first attach to that task. And when it does, the
attached task is made a child of the debugger (is chained to its children list).
Therefore we can search just the debugger's children list.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
flush_tlb_all() can be a scaling issue on large SGI Altix systems
since it uses the global call_lock and always executes on all cpus.
When a process enters flush_tlb_range() to purge TLBs for another
process, it is possible to avoid flush_tlb_all() and instead allow
sn2_global_tlb_purge() to purge TLBs only where necessary.
This patch modifies flush_tlb_range() so that this case can be handled
by platform TLB purge functions and updates ia64_global_tlb_purge()
accordingly. sn2_global_tlb_purge() now calculates the region register
value from the mm argument introduced with this patch.
Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
bte_copy() calls calls get_nasid(), which will get flagged if
preemption if enabled. raw_smp_processor_id() is used instead.
It is OK if we migrate off node.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Eliminate the passing in of a scratch buffer used for locating the
reserved page setup for XPC.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
XPC needs to be changed to support up to 16k nasids on an SGI Altix system.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch addresses a few issues with the open/close protocol that
were revealed by the newly added disengage functionality combined
with more extensive testing.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In arch/ia64 change the explicit use of a for-loop using NR_CPUS into the
general for_each_online_cpu() construct. This widens the scope of potential
future optimizations of the general constructs, as well as takes advantage
of the existing optimizations of first_cpu() and next_cpu(), which is
advantageous when the true CPU count is much smaller than NR_CPUS.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In arch/ia64 change the explicit use of for-loops and NR_CPUS into the
general for_each_cpu() or for_each_online_cpu() constructs, as
appropriate. This widens the scope of potential future optimizations
of the general constructs, as well as takes advantage of the existing
optimizations of first_cpu() and next_cpu().
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The new ia64 assembler uses slot 1 for the offset of a long (2-slot)
instruction and the old assembler uses slot 2. The 2.6 kernel assumes
slot 2 and won't boot when the new assembler is used:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1433
This patch will work with either slot 1 or 2.
Patch provided by H.J. Lu
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the "siblings" field value in /proc/cpuinfo so that it now shows the
number of siblings as seen by OS, instead of what is available from
hardware perspective.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The simscsi code at present overflows an int if it's given a large
disk image. The attached patch increases the possible size to 128G.
While it's unlikely that anyone will want to use SKI with such a
large drive, the same framework is currently being used for various
virtualisation experiments.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
changes to swiotlb.c made in commit 281dd25cdc
since this file has been moved from arch/ia64/lib/swiotlb.c to
lib/swiotlb.c
Signed-off-by: Tony Luck <tony.luck@intel.com>
This introduces a limit parameter to the core bootmem allocator; The new
parameter indicates that physical memory allocated by the bootmem
allocator should be within the requested limit.
We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit
is the only api used for swiotlb.
The existing alloc_bootmem_low_pages() api could instead have been
changed and made to pass right limit to the core allocator. But that
would make the patch more intrusive for 2.6.14, as other arches use
alloc_bootmem_low_pages(). We may be done that post 2.6.14 as a
cleanup.
With this, swiotlb gets memory within 4G for both x86_64 and ia64
arches.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I've noticed a kernel hang during a storm of CMC interrupts, which was
tracked down to the continual execution of the interrupt handler.
There's code in the CMC handler that's supposed to disable CMC
interrupts and switch to polling mode when it sees a bunch of CMCs.
Because disabling CMCs across all CPUs isn't safe in interrupt context,
the disable is done with a schedule_work(). But with continual CMC
interrupts, the schedule_work() never gets executed.
The following patch immediately disables CMC interrupts for the current
CPU. This then allows (at least) one CPU to ignore CMC interrupts,
execute the schedule_work() code, and disable CMC interrupts on the rest
of the CPUs.
Acked-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Bryan Sutula <Bryan.Sutula@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
gensparse_defconfig is a config generated file for SPARSEMEM and GENERIC
kernel configuration (defconfig).
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is the minimal set of changes required by ia64 to use SPARSEMEM.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
For FLATMEM contig_page_data has been made transparent to the arch code.
This patch conforms to that change.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The patch modifies the Kconfig file to introduce the new memory model
options and other related SPARSEMEM changes. There is also a minor change
in the Makefile.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The swiotlb implementation is shared by both IA-64 and EM64T. However,
the source itself lives under arch/ia64. This patch moves swiotlb.c
from arch/ia64/lib to lib/ and fixes-up the appropriate Makefile and
Kconfig files. No actual changes are made to swiotlb.c.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
/proc/iomem describes a block of memory as "Kernel data",
but the end address is derived from "_edata". The kernel
actually has many other sections beyond _edata. Get the
real end address from _end.
Acked-by: Khalid Aziz <khalid_aziz@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds a #define for SN_SAL_IOIF_PCI_SAFE and makes that the
preferred method of implementing sn_pci_legacy_read() and
sn_pci_legacy_write().
This SAL call has been present in SGI proms since version 4.10. If the
SN_SAL_IOIF_PCI_SAFE call fails, revert to the previous code for compatability
with older proms.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Address space resources for ACPI devices have a producer/consumer
flag. All devices "consume" the indicated address space. If the
resource is marked as a "producer", the range is also passed on
to child devices.
We currently ignore this flag when setting up MMIO and I/O port
windows for PCI root bridges, so we could mistakenly interpret
a "consumed-only" range, like CSR space for the device itself,
as a window that is routed to children.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Verify the pfn is valid before calling pfn_to_page(),
and cut isolation message if nothing was done.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Wire the MCA/INIT handler stacks into DTR[2] and track them in
IA64_KR(CURRENT_STACK). This gives the MCA/INIT handler stacks the
same TLB status as normal kernel stacks. Reload the old CURRENT_STACK
data on return from OS to SAL.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The sd driver now uses scsi_execute_req() for almost everything.
scsi_execute_req() converts requests into scatterlists.
Fix the HP SCSI disk simulator to understand scatterlists for
more commands.
Without this patch the current kernel will not boot on the simulator
(the disks are always detected as having no sectors, and so cannot be
mounted).
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Move acpi_map_iosapics() from pci.c to acpi.c, since it doesn't
have anything to do with PCI.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Update comment about how ar.k0 is used. Make the initialization the
same as in start_secondary() (no functional change, just make it look
more similar).
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
User mode kexec tools expect to find information about physical
memory in /proc/iomem (as they do on x86) to validate the addresses
that the new kernel will use.
Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
With the new fdtable locking rules, you have to protect fdtable with either
->file_lock or rcu_read_lock/unlock(). There are some places where we
aren't doing either. This patch fixes those places.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch increases the maximum number of cpus supported on IA64
to 1024. No changes are made to the default SSI size. The patch
simply allows specifying up to 1024p.
There are certainly scaling (& other) issues that also need to be
addressed!!! Additional patches will follow.....
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There were some trailing white spaces, long lines, brackets in
weird style etc. This patch cleans them up.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes some compilation warnings, mostly
trivially. acpi.c fix also noted by Kenji Kaneshige.
Signed-off-by; Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some of the SN code & #defines related to compact nodes & IO discovery
have gotten stale over the years. This patch attempts to clean them up.
Some of the various SN MAX_xxx #defines were also unclear & misused.
The primary changes are:
- use MAX_NUMNODES. This is the generic linux #define for the number
of nodes that are known to the generic kernel. Arrays & loops
for constructs that are 1:1 with linux-defined nodes should
use the linux #define - not an SN equivalent.
- use MAX_COMPACT_NODES for MAX_NUMNODES + NUM_TIOS. This is the
number of nodes in the SSI system. Compact nodes are a hack to
get around the IA64 architectural limit of 256 nodes. Large SGI
systems have more than 256 nodes. When we upgrade to ACPI3.0,
I _hope_ that all nodes will be real nodes that are known to
the generic kernel. That will allow us to delete the notion
of "compact nodes".
- add MAX_NUMALINK_NODES for the total number of nodes that
are in the numalink domain - all partitions.
- simplified (understandable) scan_for_ionodes()
- small amount of cleanup related to cnodes
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
PNP and PNPACPI turned on
i8042 recently changed from ACPI to PNP detection. Without PNP, it
probes legacy I/O ports for the keyboard controller, which causes an
MCA on HP boxes.
Also, I'm about to remove 8250_acpi.c, so we'll need PNP to detect
non-PCI serial ports. Until 8250_acpi.c is removed, some systems
will see serial ports reported twice (once from 8250_acpi.c and again
from 8250_pnp.c). This is harmless.
PNPACPI is still marked EXPERIMENTAL, but I'm not aware of any
outstanding issues on ia64.
IDE_GENERIC turned off (except for SGI simulator, all ia64 IDE is PCI)
ide-generic probes compiled-in legacy I/O ports for IDE devices, which
again causes an MCA. It would be nicer to just get rid of all the
legacy junk from include/asm-ia64/ide.h, but that is a bit riskier
because it could break ide-cs and the HDIO_REGISTER_HWIF ioctl
(http://www.ussg.iu.edu/hypermail/linux/kernel/0508.2/0049.html).
Here's the essence of the patch:
-# CONFIG_PNP is not set
+CONFIG_PNP=y
+CONFIG_PNPACPI=y
-CONFIG_IDE_GENERIC=y
+# CONFIG_IDE_GENERIC is not set
Tested on tiger, bigsur, and zx1.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Several implementations were essentialy a common piece of C code using
the cmpxchg() macro. Put the implementation in one spot that everyone
can share, and convert sparc64 over to using this.
Alpha is the lone arch-specific implementation, which codes up a
special fast path for the common case in order to avoid GP reloading
which a pure C version would require.
Signed-off-by: David S. Miller <davem@davemloft.net>
Machine vector selection has always been a bit of a hack given how
early in system boot it needs to be done. Services like ACPI namespace
are not available and there are non-trivial problems to moving them to
early boot. However, there's no reason we can't change to a different
machvec later in boot when the services we need are available. By
adding a entry point for later initialization of the swiotlb, we can add
an error path for the hpzx1 machevec initialization and fall back to the
DIG machine vector if IOMMU hardware isn't found in the system. Since
ia64 uses 4GB for zone DMA (no ISA support), it's trivial to allocate a
contiguous range from the slab for bounce buffer usage.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Pavel Emelianov and Kirill Korotaev observe that fs and arch users of
security_vm_enough_memory tend to forget to vm_unacct_memory when a
failure occurs further down (typically in setup_arg_pages variants).
These are all users of insert_vm_struct, and that reservation will only
be unaccounted on exit if the vma is marked VM_ACCOUNT: which in some
cases it is (hidden inside VM_STACK_FLAGS) and in some cases it isn't.
So x86_64 32-bit and ppc64 vDSO ELFs have been leaking memory into
Committed_AS each time they're run. But don't add VM_ACCOUNT to them,
it's inappropriate to reserve against the very unlikely case that gdb
be used to COW a vDSO page - we ought to do something about that in
do_wp_page, but there are yet other inconsistencies to be resolved.
The safe and economical way to fix this is to let insert_vm_struct do
the security_vm_enough_memory check when it finds VM_ACCOUNT is set.
And the MIPS irix_brk has been calling security_vm_enough_memory before
calling do_brk which repeats it, doubly accounting and so also leaking.
Remove that, and all the fs and arch calls to security_vm_enough_memory:
give it a less misleading name later on.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we use 64bit kernel on ia64/x86_64/s390 architecture, and we run
32bit binary on 32bit compatibility mode, sendfile system call seems be
not set offset argument.
This is because sendfile's return value is not zero but the code regards
the result by return value is zero or not.
This problem will be affect to ia64/x86_64/s390 and not affect to other
architecture does not affect other architecture (mips/parisc/ppc64/sparc64).
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Delete the special case unwind code that was only used by the old
MCA/INIT handler.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The bulk of the change. Use per cpu MCA/INIT stacks. Change the SAL
to OS state (sos) to be per process. Do all the assembler work on the
MCA/INIT stacks, leaving the original stack alone. Pass per cpu state
data to the C handlers for MCA and INIT, which also means changing the
mca_drv interfaces slightly. Lots of verification on whether the
original stack is usable before converting it to a sleeping process.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reading the INIT record from SAL during the INIT event has proved to be
unreliable, and a source of hangs during INIT processing. The new
MCA/INIT handlers remove the need to get the INIT record from SAL.
Change salinfo.c so mca.c can just flag that a new record is available,
without having to read the record during INIT processing. This patch
can be applied without the new MCA/INIT handlers.
Also clean up some usage of NR_CPUS which should have been using
cpu_online().
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When introducing the generic asm-offsets.h support the dependency
chain for the prepare targets was changed. All build scripts expecting
include/asm/asm-offsets.h to be made when using the prepare target would broke.
With the limited number of prepare targets left in arch Makefiles
the trivial solution was to introduce a new arch specific target: archprepare
The dependency chain looks like this now:
prepare
|
+--> prepare0
|
+--> archprepare
|
+--> scripts_basic
+--> prepare1
|
+---> prepare2
|
+--> prepare3
So prepare 3 is processed before prepare2 etc.
This guaantees that the asm symlink, version.h, scripts_basic
are all updated before archprepare is processed.
prepare0 which build the asm-offsets.h file will need the
actions performed by archprepare.
The head target is now named prepare, because users scripts will most
likely use that target, but prepare-all has been kept for compatibility.
Updated Documentation/kbuild/makefiles.txt.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code. It does the following
things:
- consolidates and enhances the spinlock/rwlock debugging code
- simplifies the asm/spinlock.h files
- encapsulates the raw spinlock type and moves generic spinlock
features (such as ->break_lock) into the generic code.
- cleans up the spinlock code hierarchy to get rid of the spaghetti.
Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c. (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)
Also, i've enhanced the rwlock debugging facility, it will now track
write-owners. There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.
The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:
include/asm-i386/spinlock_types.h | 16
include/asm-x86_64/spinlock_types.h | 16
I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:
SMP | UP
----------------------------|-----------------------------------
asm/spinlock_types_smp.h | linux/spinlock_types_up.h
linux/spinlock_types.h | linux/spinlock_types.h
asm/spinlock_smp.h | linux/spinlock_up.h
linux/spinlock_api_smp.h | linux/spinlock_api_up.h
linux/spinlock.h | linux/spinlock.h
/*
* here's the role of the various spinlock/rwlock related include files:
*
* on SMP builds:
*
* asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
* initializers
*
* linux/spinlock_types.h:
* defines the generic type and initializers
*
* asm/spinlock.h: contains the __raw_spin_*()/etc. lowlevel
* implementations, mostly inline assembly code
*
* (also included on UP-debug builds:)
*
* linux/spinlock_api_smp.h:
* contains the prototypes for the _spin_*() APIs.
*
* linux/spinlock.h: builds the final spin_*() APIs.
*
* on UP builds:
*
* linux/spinlock_type_up.h:
* contains the generic, simplified UP spinlock type.
* (which is an empty structure on non-debug builds)
*
* linux/spinlock_types.h:
* defines the generic type and initializers
*
* linux/spinlock_up.h:
* contains the __raw_spin_*()/etc. version of UP
* builds. (which are NOPs on non-debug, non-preempt
* builds)
*
* (included on UP-non-debug builds:)
*
* linux/spinlock_api_up.h:
* builds the _spin_*() APIs.
*
* linux/spinlock.h: builds the final spin_*() APIs.
*/
All SMP and UP architectures are converted by this patch.
arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers. m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.
From: Grant Grundler <grundler@parisc-linux.org>
Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
Builds 32-bit SMP kernel (not booted or tested). I did not try to build
non-SMP kernels. That should be trivial to fix up later if necessary.
I converted bit ops atomic_hash lock to raw_spinlock_t. Doing so avoids
some ugly nesting of linux/*.h and asm/*.h files. Those particular locks
are well tested and contained entirely inside arch specific code. I do NOT
expect any new issues to arise with them.
If someone does ever need to use debug/metrics with them, then they will
need to unravel this hairball between spinlocks, atomic ops, and bit ops
that exist only because parisc has exactly one atomic instruction: LDCW
(load and clear word).
From: "Luck, Tony" <tony.luck@intel.com>
ia64 fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjanv@infradead.org>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This converts the final 20 DEFINE_SPINLOCK holdouts. (another 580 places
are already using DEFINE_SPINLOCK). Build tested on x86.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In order for the RCU to work, the file table array, sets and their sizes must
be updated atomically. Instead of ensuring this through too many memory
barriers, we put the arrays and their sizes in a separate structure. This
patch takes the first step of putting the file table elements in a separate
structure fdtable that is embedded withing files_struct. It also changes all
the users to refer to the file table using files_fdtable() macro. Subsequent
applciation of RCU becomes easier after this.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For architecture like ia64, the switch stack structure is fairly large
(currently 528 bytes). For context switch intensive application, we found
that significant amount of cache misses occurs in switch_to() function.
The following patch adds a hook in the schedule() function to prefetch
switch stack structure as soon as 'next' task is determined. This allows
maximum overlap in prefetch cache lines for that structure.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Delete obsolete stuff from arch Makefile
Rename file to asm-offsets.h
The trick used in the arch Makefile to circumvent the circular
dependency is kept.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Increase the value for the maximum physical address on SN systems.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
These are SN2 only drivers. They should have platform checks to prevent
them from doing evil stuff in GENERIC kernels.
Signed-off-by: Martin Hicks <mort@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Turn off the QLOGIC_FC driver. Supposedly qla2xxx should support
these devices. Do any ia64 machines have one of these devices as
the boot device?
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
New version leaves the original memory map unmodified.
Also saves any granule trimmings for use by the uncached
memory allocator.
Inspired by Khalid Aziz (various traces of his patch still
remain). Fixes to uncached_build_memmap() and sn2 testing
by Martin Hicks.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes a race condition where in system used to hang or sometime
crash within minutes when kprobes are inserted on ISR routine and a task
routine.
The fix has been stress tested on i386, ia64, pp64 and on x86_64. To
reproduce the problem insert kprobes on schedule() and do_IRQ() functions
and you should see hang or system crash.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch addresses a potential race condition for a case where Kprobe has
been removed right after another CPU has taken a break hit.
The way this is addressed here is when the CPU that has taken a break hit
does not find its corresponding kprobe, then we check to see if the
original instruction got replaced with other than break. If it got
replaced with other than break instruction, then we continue to execute
from the replaced instruction, else if we find that it is still a break,
then we let the kernel handle this, as this might be the break instruction
inserted by other than kprobe(may be kernel debugger).
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the ia64 architecture specific changes to prevent the
possible race conditions.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch converts kcalloc(1, ...) calls to use the new kzalloc() function.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
64 bit architectures all implement their own compatibility sys_open(),
when in fact the difference is simply not forcing the O_LARGEFILE
flag. So use the a common function instead.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I've already sent this to the maintainers, and this is now being sent to a
larger community audience. I have fixed a problem with the ia64 version of
build_sched_domains(), but a similar fix still needs to be made to the
generic build_sched_domains() in kernel/sched.c.
The "dynamic sched domains" functionality has recently been merged into
2.6.13-rcN that sees the dynamic declaration of a cpu-exclusive (a.k.a.
"isolated") cpuset and rebuilds the CPU Scheduler sched domains and sched
groups to separate away the CPUs in this cpu-exclusive cpuset from the
remainder of the non-isolated CPUs. This allows the non-isolated CPUs to
completely ignore the isolated CPUs when doing load-balancing.
Unfortunately, build_sched_domains() expects that a sched domain will
include all the CPUs of each node in the domain, i.e., that no node will
belong in both an isolated cpuset and a non-isolated cpuset. Declaring a
cpuset that violates this presumption will produce flawed data structures
and will oops the kernel.
To trigger the problem (on a NUMA system with >1 CPUs per node):
cd /dev/cpuset
mkdir newcpuset
cd newcpuset
echo 0 >cpus
echo 0 >mems
echo 1 >cpu_exclusive
I have fixed this shortcoming for ia64 NUMA (with multiple CPUs per node).
A similar shortcoming exists in the generic build_sched_domains() (in
kernel/sched.c) for NUMA, and that needs to be fixed also. The fix
involves dynamically allocating sched_group_nodes[] and
sched_group_allnodes[] for each invocation of build_sched_domains(), rather
than using global arrays for these structures. Care must be taken to
remember kmalloc() addresses so that arch_destroy_sched_domains() can
properly kfree() the new dynamic structures.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When handling writes to /proc/irq, current code is re-programming rte
entries directly. This is not recommended and could potentially cause
chipset's to lockup, or cause missing interrupts.
CONFIG_IRQ_BALANCE does this correctly, where it re-programs only when the
interrupt is pending. The same needs to be done for /proc/irq handling as well.
Otherwise user space irq balancers are really not doing the right thing.
- Changed pending_irq_balance_cpumask to pending_irq_migrate_cpumask for
lack of a generic name.
- added move_irq out of IRQ_BALANCE, and added this same to X86_64
- Added new proc handler for write, so we can do deferred write at irq
handling time.
- Display of /proc/irq/XX/smp_affinity used to display CPU_MASKALL, instead
it now shows only active cpu masks, or exactly what was set.
- Provided a common move_irq implementation, instead of duplicating
when using generic irq framework.
Tested on i386/x86_64 and ia64 with CONFIG_PCI_MSI turned on and off.
Tested UP builds as well.
MSI testing: tbd: I have cards, need to look for a x-over cable, although I
did test an earlier version of this patch. Will test in a couple days.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Zwane Mwaikambo <zwane@holomorphy.com>
Grudgingly-acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Coywolf Qi Hunt <coywolf@lovecn.org>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Resend using accessors instead of volatile qualifiers per hch comments, and
easier to understand convenience macros per rja comments.
Patch to apply volatile semantics when accessing MMR's in various SN files.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The config option 'CONFIG_ACPI_DEALLOCATE_IRQ' is no longer
needed. This patch removes it.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The reenabling of psr.ic should really belong to dtr mapping code block.
It make the fall through code fast since it doesn't need to execute the
predicated-off instruction. Logically make more sense as well since psr.ic
was turned off in .map code block.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The exception handler in copy user always expects fault occurs only on
user space address and the fall back recovery code is written with that
very assumption in mind. Recent source code inspection revealed that
while it worked splendid and to the expectation under normal circumstances,
It broke down under unexpected condition where some address calculation
might go outside the legal address range the original copy_user was
called for. This patch is to make copy_user exception handler more robust
and to prevent potential memory corruption.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When XPC is being shutdown (i.e., rmmod, reboot) it doesn't ensure that
other partitions with whom it was connected have completely disengaged
from any attempt at cross-partition memory references. This can lead to
MCAs in any of these other partitions when the partition is reset.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When copying data from user-space to kernel-space by __copy_user(),
a page_not_present fault sometimes occurs at vmalloced kernel address
because of VHPT pre-fetching.
Ignore the page_not_present fault in ia64_do_page_fault() before
jumping into exception handlers.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
1) workaround a h/w reset issue
2) to improve the determination of FPGA-based h/w in
the arch/ia64/sn/kernel/tiocx code.
Signed-off-by: Bruce Losure <blosure@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Change sn2-specific calls into generic functions. Without this change
the uncached allocator will not work on non-sn2 platforms.
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
- notifying the PROM of specific features that are supported by the OS.
This is used to enable PROM feature if and only if the corresponding
feature is implemented in the OS
- fetch feature sets that are supported by the current PROM. This allows
the OS to selectively enable features when the PROM support is available.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I've solved the problem I was having with the simulator and not
booting Debian.
The problem is that the number of bits for the virtual linear array
short-format VHPT (Virtually mapped linear page table, VMLPT for
short) is being tested incorrectly.
There are two problems:
1. The PAL call that should tell the kernel the size of the
virtual address space isn't implemented for the simulator, so
the kernel uses the default 50. This is addressed separately
in dc90e95f31
2. In arch/ia64/mm/init.c there's code to calcualte the size
of the VMLPT based on the number of implemented virtual address
bits and the page size. It checks to see if the VMLPT base
address overlaps the top of the mapped region, but this check
doesn't allow for the address space hole, and in fact will
never trigger.
Here's an alternative test and panic, that I think is more accurate.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Not all of the PAL VM calls are implemented for the SKI simulator.
Don't just give up if one fails, print information from the calls
that succeed.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch implements PAL_VM_SUMMARY (and PAL_MEM_ATTRIB for good
measure) and pretends that the simulated machine is a McKinley.
Some extra comments and clean-up by Tony Luck.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It has been reported that the way Linux handles NODEFER for signals is
not consistent with the way other Unix boxes handle it. I've written a
program to test the behavior of how this flag affects signals and had
several reports from people who ran this on various Unix boxes,
confirming that Linux seems to be unique on the way this is handled.
The way NODEFER affects signals on other Unix boxes is as follows:
1) If NODEFER is set, other signals in sa_mask are still blocked.
2) If NODEFER is set and the signal is in sa_mask, then the signal is
still blocked. (Note: this is the behavior of all tested but Linux _and_
NetBSD 2.0 *).
The way NODEFER affects signals on Linux:
1) If NODEFER is set, other signals are _not_ blocked regardless of
sa_mask (Even NetBSD doesn't do this).
2) If NODEFER is set and the signal is in sa_mask, then the signal being
handled is not blocked.
The patch converts signal handling in all current Linux architectures to
the way most Unix boxes work.
Unix boxes that were tested: DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU
3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX.
* NetBSD was the only other Unix to behave like Linux on point #2. The
main concern was brought up by point #1 which even NetBSD isn't like
Linux. So with this patch, we leave NetBSD as the lonely one that
behaves differently here with #2.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
copy_page.o appeared twice in arch/ia64/lib/Makefile. The
one in global lib-y is wrong where it should be just in
lib-$(CONFIG_ITANIUM).
Both copy_page.o and copy_page_mck.o are build for Itanium2
processor and the link order will pick up the low performing
copy_page function (originally written for itanium processor).
In this case, we really want the copy_page_mck.o for optimized
version.
Signed-off-by: Kenneth Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Patch to support P-state transitions on ia64. This driver is based on ACPI,
and uses the ACPI processor driver interface to find out the P-state support
information for the processor. This driver plugs into generic cpufreq
infrastructure.
Once this driver is loaded successfully, ondemand/userspace governor can be
used to change the CPU frequency dynamically based on load or on request from
userspace process.
Refer :
ACPI specification -
http://www.acpi.info
P-state related PAL calls -
http://developer.intel.com/design/itanium/downloads/24869909.pdf
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
bte_copy() calls calls smp_processor_id(), which will get flagged if
preemption if enabled. raw_smp_processor_id() is used instead
because we are just using it to pick a BTE interface and are not
tied to a specific cpu.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix patch to abstract irq_affinity down to the pci provider level since
different SGI hardware implements this in different ways.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Delete the ability to build an ACPI kernel that does
not include PCI support. When such a machine is created
and it requires a tuned kernel, send a patch.
http://bugzilla.kernel.org/show_bug.cgi?id=1364
Signed-off-by: Len Brown <len.brown@intel.com>
Build issues were mostly in the ACPI=n case -- don't do that.
Select ACPI from IA64_GENERIC.
Add some missing dependencies on ACPI.
Mark BLACKLIST_YEAR and some laptop-only ACPI drivers
as X86-only. Let me know when you get an IA64 Laptop.
Signed-off-by: Len Brown <len.brown@intel.com>
Clean up of SGI SN partitioning related code.
The SN_SAL_GET_SN_INFO SAL call returns the partition ID, making
the SN_SAL_SYSCTL_PARTITION_GET SAL call redundant. Remove sn_partid
and use sn_partition_id.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Update the SN pci device info to use the nearest node function
to allocate driver memory on the nearest node (rather than
defaulting to node 0).
Signed-off-by: Mark Goodwin <markgw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a new exported function for determining the nearest node
with CPUs for I/O nodes and fix a bug where the hwperf dynamic
misc device was being registered before misc_init().
Signed-off-by: Mark Goodwin <markgw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Bugfix to export PCI topology information in /proc/sgi_sn/sn_topology.
Signed-off-by: Mark Goodwin <markgw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently, region numbers are defined in several files, with several
names. For example, we have REGION_KERNEL in asm/page.h and
RGN_KERNEL in pgtable.h
We also have address definitions that should depend on the
RGN_XXX macros, but are currently just long constants.
The following patch reorganises all the definitions so that they have
the same form (RGN_XXX), are in one place, and that addresses that
depend on RGN_XXX are derived from them.
(This is a necessary but not sufficient patch to allow UML-like
operation on IA64).
Thanks to David Mosberger for catching the change I missed in mmu_context.h.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Kernel 2.6 doesn't support egcs, and I didn't find any user of this
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Removed IA64 architecture specific users of asm/segment.h
The removal of asm-ia64/segment.h itself can wait until all
of the kernel source has been purged of references.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
pcibios_bus_to_resource is exported on all architectures except ia64
and sparc. Add exports for the two missing architectures. Needed when
Yenta socket support is compiled as a module.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Thanks to Stephane, we've now worked out the real cause of the
`Linux will not boot on simulator' problem. Turns out it's a stack
overflow because the stack pointer wasn't being initialised properly
in boot_head.S (it was being initialised to the lowest instead of the
highest address of the stack, so the first push started to overwrite
data in the BSS).
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix swiotlb sizing to match what the comments and the kernel
parameters documentation indicate. Given a default 16k page size kernel
(ia64) and a 2k swiotlb page size, we're off by a multiple of 8 trying
to size the swiotlb. When specified on the boot line, the swiotlb is
made 8x bigger than requested. When left to the default value, it's 8x
smaller than the comments indicate. For x86_64 the multiplier would be
2x. The patch below fixes this. Now, what's a good default swiotlb
size? Apparently we don't really need 64MB.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
After building a fresh tree with gcc 4 I can't boot the simulator as
the bootloader loader dies with
loading /home/ianw/kerntest/kerncomp//build/sim_defconfig/vmlinux...
failed to read phdr
After some investigation I believe this is do with differences between
the alignment of variables on the stack between gcc 3 and 4 and the
ski simulator. If you trace through with the simulator you can see
that the disk_stat structure value returned from the SSC_WAIT_COMPLETION
call seems to be only half loaded. I guess it doesn't like the alignment
of the input.
Signed-off-by: Ian Wienand <ianw@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Shub2 provides a much improved mechanism for issuing internode
TLB purges. Add code to support the newer mechanism. There is also
some debug code (disabled) that is useful for testing.
Collect statistics on the number, type & duration of TLB purges.
This data will be useful for making future improvements in the algorithms.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Change the BTE driver so that it works for both shub1 and
shub2. Most of the changes are related to the number of cores
that use the BTE engine, to the MMR addresses of various
shub registers, and to using the correct processor or network
physical address.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Disable some shub1-specific code when running on systems with shub2.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Update the addresses of the pio_write_status_addr so that
they are correct for newer processors. Shub2 did not number
the threads in the order that I had expected.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use local SHUB alias space when referencing MMRs that are known
to be node local. There is a slight performance benefit & code
simplification.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Just `make oldconfig' doesn't help for the zx1 defconfig ---
because we need the MPT Fusion drivers, which are picked up as not
selected.
Tested on HP ZX2000 and ZX2600.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The main issue is that bus_fixup calls may potentially call
functions that require a valid bus->sysdata pointer. Since
this is the case, we must set the bus->sysdata pointer before
calling the bus_fixup functions. The remaining changes are
simple fixes to make sure memory is cleaned up in the function.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The current one doesn't even make sense anymore on i386 where it
apparently came from.
Follow-up wordsmithing by Matthew Wilcox and Tony Luck.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix patch to add an SN pci provider for TIOCE, which is SGI's
PCI Express implementation.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix patch to add TIO "huge-window" address support to sn_dma_flush().
Update copyright in affected files.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix patch to abstract the force_interrupt() mechanism away from the
pcibr provider.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cosmetic altix patch to rename SGI_PCIBR_ERROR to something more generic and
remove a duplicate #define.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The PFM_LOAD_CONTEXT may fail silently and cause a session
to remain reserved even though it should not. This can happen
when the commands succeeds in reserving the session but fails
when it actually tries to attach to the load_pid. In that case,
the command has failed but will return 0. More importantly,
the session will remain reserved. This patch fixes the problem.
Signed-off-by: <stephane.eranian@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
this changeset broke the "nohalt" kernel boot option.
8df5a500a3
default_idle() is looking at new variable can_do_pal_halt. However,
that variable did not get cleared upon "nohalt" boot option. Result
is that "nohalt" option is ignored until perfmon is exercised.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
error condition is passed along by acpi_register_gsi().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Current acpi_register_gsi() function has no way to indicate errors to its
callers even though acpi_register_gsi() can fail to register gsi because of
some reasons (out of memory, lack of interrupt vectors, incorrect BIOS, and so
on). As a result, caller of acpi_register_gsi() cannot handle the case that
acpi_register_gsi() fails. I think failure of acpi_register_gsi() should be
handled properly.
This series of patches changes acpi_register_gsi() to return negative value on
error, and also changes callers of acpi_register_gsi() to handle failure of
acpi_register_gsi().
This patch changes the type of return value of acpi_register_gsi() from
"unsigned int" to "int" to indicate an error. If acpi_register_gsi() fails to
register gsi, it returns negative value.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
This removes sys_set_zone_reclaim() for now. While i'm sure Martin is
trying to solve a real problem, we must not hard-code an incomplete and
insufficient approach into a syscall, because syscalls are pretty much
for eternity. I am quite strongly convinced that this syscall must not
hit v2.6.13 in its current form.
Firstly, the syscall lacks basic syscall design: e.g. it allows the
global setting of VM policy for unprivileged users. (!) [ Imagine an
Oracle installation and a SAP installation on the same NUMA box fighting
over the 'optimal' setting for this flag. What will they do? Will they
try to set the flag to their own preferred value every second or so? ]
Secondly, it was added based on a single datapoint from Martin:
http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2
where Martin characterizes the numbers the following way:
' Run-to-run variability for "make -j" is huge, so these numbers aren't
terribly useful except to see that with reclaim the benchmark still
finishes in a reasonable amount of time. '
in other words: the fundamental problem has likely not been solved, only
a tendential move into the right direction has been observed, and a
handful of numbers were picked out of a set of hugely variable results,
without showing the variability data. How much variance is there
run-to-run?
I'd really suggest to first walk the walk and see what's needed to get
stable & predictable kernel compilation numbers on that NUMA box, before
adding random syscalls to tune a particular aspect of the VM ... which
approach might not even matter once the whole picture has been analyzed
and understood!
The third, most important point is that the syscall exposes VM tuning
internals in a completely unstructured way. What sense does it make to
have a _GLOBAL_ per-node setting for 'should we go to another node for
reclaim'? If then it might make sense to do this per-app, via numalib or
so.
The change is minimalistic in that it doesnt remove the syscall and the
underlying infrastructure changes, only the user-visible changes. We
could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a'ka
/proc/sys/vm/swappiness, but even that looks quite counterproductive
when the generic approach is that we are trying to reduce the number of
external factors in the VM balance picture.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
unwind.c can read the wrong unat bits from switch_stack.
sw->caller_unat is the value of ar.unat when the task was blocked.
sw->ar_unat is the value of ar.unat after doing st8.spill for r4-7.
IOW, ar_unat is caller_unat with 4 bits changed.
unw_access_gr() uses sw->ar_unat for r4-7 (correct), but it also uses
sw->ar_unat for other scratch registers (incorrect). sw->ar_unat
should only be used for r4-7, everything else should use
sw->caller_unat, unless modified by unwind info. Using sw->ar_unat
risks picking up the 4 bits that were overwritten when r4-7 were saved.
Also this line is wrong
unw.sw_off[unw.preg_index[UNW_REG_PFS]] = SW(AR_UNAT);
and should be
unw.sw_off[unw.preg_index[UNW_REG_PFS]] = SW(AR_PFS);
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Here's the patch again to fix the code to handle if the values between
MAX_USER_RT_PRIO and MAX_RT_PRIO are different.
Without this patch, an SMP system will crash if the values are
different.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
machine_restart, machine_halt and machine_power_off are machine
specific hooks deep into the reboot logic, that modules
have no business messing with. Usually code should be calling
kernel_restart, kernel_halt, kernel_power_off, or
emergency_restart. So don't export machine_restart,
machine_halt, and machine_power_off so we can catch buggy users.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
XPC calls smp_processor_id() twice from xpc_setup_infrastructure() with
preemption enabled, which gets flagged if 'DEBUG_PREEMPT=y'. This patch
replaces the two calls to smp_processor_id() by a single call to
raw_smp_processor_id() since any CPU within the partition will do.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The Altix subarch does not provide node information via ACPI. Instead hooks
are used to fixup pci structures. This patch determines the nodes for Altix
PCI busses.
Remote Bridges:
---------------
Altix supports remote I/O nodes without memory or processors but with bridges.
The TIOCA type of bridge is an AGP bridge and the PROM provides information
about the closest node. That information will be returned by pcibus_to_node.
The TIOCP remote bridge type is a PCI bridge but the PROM does not provide a
closest node id. pcibus_to_node will return -1 for devices on those bridges
meaning that device control structures may be allocated on any node.
Safeguard:
----------
Should the fixups result in invalid node information for a pci controller then
a warning will be printed and pcibus_to_node will return -1.
This patch also fixes the "FIXME" in sn_dma_alloc_coherent. This means that
dma_alloc_coherent will now use alloc_pages_node to allocate memory local to
the node that the PCI device is connected to.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Check with PAL to see what the i-cache line size is for
each level of the cache, and so use the correct stride
when flushing the cache.
Acked-by: David Mosberger
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes the CONFIG_IA64_SGI_SN_SIM option entirely, allowing
any kernel bootable on sn2 to also be booted in the simulator.
Boot tested on Altix and HP rx2600.
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
pcibus_to_node provides a way for the Linux kernel to identify to which
node a certain pcibus connects to. Allocations of control structures
for devices can then be made on the node where the pci bus is located
to allow local access during interrupt and other device manipulation.
This patch provides a new "node" field in the the pci_controller
structure. The node field will be set based on ACPI information (thanks
to Alex Williamson <alex.williamson@hp.com for that piece).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Create a new top-level menu named "Networking" thus moving
net related options and protocol selection way from the drivers
menu and up on the top-level where they belong.
To implement this all architectures has to source "net/Kconfig" before
drivers/*/Kconfig in their Kconfig file. This change has been
implemented for all architectures.
Device drivers for ordinary NIC's are still to be found
in the Device Drivers section, but Bluetooth, IrDA and ax25
are located with their corresponding menu entries under the new
networking menu item.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ACPI 3.0 added a Correctable Platform Error Interrupt (CPEI)
Processor Overide flag to MADT.Platform_Interrupt_Source.
Record the processor that was provided as hint from ACPI.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Current assign_irq_vector() will panic if interrupt vectors is running
out. But I think how to handle the case of lack of interrupt vectors
should be handled by the caller of this function. For example, some
PCI devices can raise the interrupt signal via both MSI and I/O
APIC. So even if the driver for these device fails to allocate a
vector for MSI, the driver still has a chance to use I/O APIC based
interrupt. But currently there is no chance for these driver to use
I/O APIC based interrupt because kernel will panic when
assign_irq_vector() fails to allocate interrupt vector.
The following patch changes assign_irq_vector() for ia64 to return
-ENOSPC on error instead of panic (as i386 and x86_64 versions do).
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Description: Replace schedule_timeout() with msleep_interruptible() to
guarantee the task delays as expected.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
changing CONFIG_LOCALVERSION rebuilds too much, for no appearent reason.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Jesse Barnes provided the original version of this patch months ago, but
other changes kept conflicting with it, so it got deferred. Greg Edwards
dug it out of obscurity just over a week ago, and almost immediately
another conflicting patch appeared (Bob Picco's memory-less nodes).
I've resolved the conflicts and got it running again. CONFIG_SGI_TIOCX
is set to "y" in defconfig, which causes a Tiger to not boot (oops in
tiocx_init). But that can be resolved later ... get this in now before it
gets stale again.
Signed-off-by: Tony Luck <tony.luck@intel.com>
I reworked how nodes with only CPUs are treated. The patch below seems
simpler to me and has eliminated the complicated routine
reassign_cpu_only_nodes. There isn't any longer the requirement
to modify ACPI NUMA information which was in large part the
complexity introduced in reassign_cpu_only_nodes.
This patch will produce a different number of nodes. For example,
reassign_cpu_only_nodes would reduce two CPUonly nodes and one memory node
configuration to one memory+CPUs node configuration. This patch
doesn't change the number of nodes which means the user will see three. Two
nodes without memory and one node with all the memory.
While doing this patch, I noticed that early_nr_phys_cpus_node isn't serving
any useful purpose. It is called once in find_pernode_space but the value
isn't used to computer pernode space.
Signed-off-by: bob.picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
restore_sigcontext calls ia64_set_local_fpu_owner() which requires that
preempt be disabled.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is the SGI hotplug driver and additional changes required for
the driver. These modifications include changes to the SN io_init.c code
for memory management, the inclusion of new SAL calls to enable and disable
PCI slots, and a hotplug-style driver.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is a rewrite of the code to check the PROM version. The current
code has some deficiences in the way PROM comparisons were made. The minimum
value of PROM that will boot has also been changed to 4.04.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch moves header files out of the arch/ia64/sn directories and into
include/asm-ia64/sn. These files were being included by other subsystems
and should be under include/asm-ia64/sn.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes the SN IRQ code such that cpu affinity and
Hotplug can modify IRQ values. The sn_irq_info structures are now locked
using a RCU lock mechanism to avoid lock contention in the lost interrupt
WAR code.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The following renames arch_init, a kprobes function for performing any
architecture specific initialization, to arch_init_kprobes in order to
cleanup the namespace.
Also, this patch adds arch_init_kprobes to sparc64 to fix the sparc64 kprobes
build from the last return probe patch.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's another problem shown up by Ingo's recent patch to make
smp_processor_id() complain if it's called with preemption enabled.
local_finish_flush_tlb_mm() calls activate_context() in a situation
where it could be rescheduled to another processor. This patch
disables preemption around the call.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Patch makes it possible to use the "F4" function key to do
magic sysrq in the HP Ski simulator.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes an ordering issue between the init code for the
tiocx bus driver and tiocx-related device drivers. Also adds
a new brick to the list of known FPGA bricks.
Signed-off-by: Bruce Losure <blosure@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is a sparse compile cleanup of tioca_provider.c, sn_hwperf.h, and
tioca_provider.h. Each of these files had sparse warnings when
compiled.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch changes some macros that are used when running kernel on the
SGI simulator.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is a sparse compile cleanup of shub_mmr.h using both the defconfig
and the sn2_defconfig config files.
The issue with this file was the missing usage of __IA64_UL_CONST wrapper.
This wrapper is defined in include/asm-ia64/types.h and wraps a long
constant definition with UL or with nothing depending on its usage in the
kernel. The missing wrapper caused many sparse compile errors like
warning: constant 0x0x0000000010000380 so big it is long
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch greatly speeds up the handling of lfetch.fault instructions
which result in NaT consumption. Due to the NaT-page mapped at address
0, this is guaranteed to happen when lfetch.fault'ing a NULL pointer.
With this patch in place, we can even define prefetch()/prefetchw() as
lfetch.fault without significant performance degradation. More
importantly, it allows compilers to be more aggressive with using
lfetch.fault on pointers that might be NULL.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
No functional change, just identify the device nicely:
-IOC: Unknown (103c:12ec) 0.1 HPA 0xf8020002000 IOVA space 1024Mb at 0x40000000
+IOC: sx2000 0.1 HPA 0xf8020002000 IOVA space 1024Mb at 0x40000000
We used to create fake PCI devices for these chips, but we no longer do that.
So I don't think there's any reason to touch pci_ids.h now.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix patch to enable use of vgacon driver on that platform. Depends on the
PCDP generalization patch discussed at:
http://marc.theaimsgroup.com/?l=linux-ia64&m=111446235101939&w=2
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Resend 2 with changes per Bjorn Helgaas comments. Changes from original:
+ Change globals to vga_console_iobase/vga_console_membase and make them
unconditional.
+ Address style-related comments.
Patch to extend the PCDP vga setup code to support PCI io/mem translations
for the legacy vga ioport and ram spaces on architectures (e.g. altix) which
need them.
Summary of the changes:
drivers/firmware/pcdp.c
drivers/firmware/pcdp.h
-----------------------
+ add declaration for the spec-defined PCI interface struct (pcdp_if_pci)
as well as support macros.
+ extend setup_vga_console() to know about pcdp_if_pci and add a couple of
globals to hold the io and mem translation offsets if present.
arch/ia64/kernel/setup.c
------------------------
+ tweek early_console_setup() to allow multiple early console setup routines
to be called.
include/asm-ia64/vga.h
----------------------
+ make VGA_MAP_MEM vga_console_membase aware
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is an ia64 implementation of acpi_register_ioapic() and
acpi_unregister_ioapic() interfaces.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the following new interfaces for I/O xAPIC
hotplug. The implementation of these interfaces depends on each
architecture.
o int acpi_register_ioapic(acpi_handle handle, u64 phys_addr,
u32 gsi_base);
This new interface is to add a new I/O xAPIC specified by
phys_addr and gsi_base pair. phys_addr is the physical address
to which the I/O xAPIC is mapped and gsi_base is global system
interrupt base of the I/O xAPIC. acpi_register_ioapic returns
0 on success, or negative value on error.
o int acpi_unregister_ioapic(acpi_handle handle, u32 gsi_base);
This new interface is to remove a I/O xAPIC specified by
gsi_base. acpi_unregister_ioapic returns 0 on success, or
negative value on error.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Read bridge io/mem/pfmem ranges when fixing up the bus so that bus resources
are tracked. This is required to properly support pci end device and bridge
hotplug.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PCI scan code calls the arch specific pcibios_fixup_bus() each time it scans a
new bridge. For root bridge hot-plug, the bridge and it's attached devices
may not have been configured properly yet, so it's not safe to claim those
resources at this time.
This code goes away when we clean up the way pci resources are claimed (in
pci_enable_device()), so this is only a stopgap fix.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When checking if a PCI to PCI bridge should be enabled to decode memory and/or
IO resources, we need to look at all device resources not just the first 6.
This is needed to allow PCI bridges to pass down memory and IO accesses to
child devices even when the bridge itself does not consume resources in its
PCI BARs.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When you hot-plug a (root) bridge hierarchy, it may have p2p bridges and
devices attached to it that have not been configured by firmware. In this
case, we need to configure the devices before starting them. This patch
separates device start from device scan so that we can introduce the
configuration step in the middle.
I kept the existing semantics for pci_scan_bus() since there are a huge number
of callers to that function.
Also, I have no way of testing the changes I made to the parisc files, so this
needs review by those folks. Sorry for the massive cross-post, this touches
files in many different places.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Not safe to insert kprobes on IVT code.
This patch checks to see if the address on which Kprobes is being inserted is
in ivt code and if it is in ivt code then refuse to register kprobe.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Acked-by: David Mosberger <davidm@napali.hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Without the ability to atomically write 16 bytes, we can not update the
middle slot of a bundle, slot 1, unless we stop the machine first. This
patch will ensure the ability to robustly insert and remove a kprobe by
refusing to insert a kprobe on slot 1 until a mechanism is in place to
safely handle this case.
Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch implements function return probes for ia64 using
the revised design. With this new design we no longer need to do some
of the odd hacks previous required on the last ia64 return probe port
that I sent out for comments.
Note that this new implementation still does not resolve the problem noted
by Keith Owens where backtrace data is lost after a return probe is hit.
Changes include:
* Addition of kretprobe_trampoline to act as a dummy function for instrumented
functions to return to, and for the return probe infrastructure to place
a kprobe on on, gaining control so that the return probe handler
can be called, and so that the instruction pointer can be moved back
to the original return address.
* Addition of arch_init(), allowing a kprobe to be registered on
kretprobe_trampoline
* Addition of trampoline_probe_handler() which is used as the pre_handler
for the kprobe inserted on kretprobe_implementation. This is the function
that handles the details for calling the return probe handler function
and returning control back at the original return address
* Addition of arch_prepare_kretprobe() which is setup as the pre_handler
for a kprobe registered at the beginning of the target function by
kernel/kprobes.c so that a return probe instance can be setup when
a caller enters the target function. (A return probe instance contains
all the needed information for trampoline_probe_handler to do it's job.)
* Hooks added to the exit path of a task so that we can cleanup any left-over
return probe instances (i.e. if a task dies while inside a targeted function
then the return probe instance was reserved at the beginning of the function
but the function never returns so we need to mark the instance as unused.)
Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This updates the CFQ io scheduler to the new time sliced design (cfq
v3). It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes. It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls. The latter closely mimic
set/getpriority.
This import is based on my latest from -mm.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ia64 changes similar to kernel/sched.c.
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Acked-by: Paul Jackson <pj@sgi.com>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do some basic initial tuning.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dead CPU notifies online CPU that it's dead using cpu_state variable.
After switching to physical cpu hotplug, we forgot setting the variable.
This patch fixes it. Currently only __cpu_die uses it. We changed other
locations for consistency in case others use it.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
(The i386 CPU hotplug patch provides infrastructure for some work which Pavel
is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua
<shaohua.li@intel.com> is doing)
The following provides i386 architecture support for safely unregistering and
registering processors during runtime, updated for the current -mm tree. In
order to avoid dumping cpu hotplug code into kernel/irq/* i dropped the
cpu_online check in do_IRQ() by modifying fixup_irqs(). The difference being
that on cpu offline, fixup_irqs() is called before we clear the cpu from
cpu_online_map and a long delay in order to ensure that we never have any
queued external interrupts on the APICs. There are additional changes to s390
and ppc64 to account for this change.
1) Add CONFIG_HOTPLUG_CPU
2) disable local APIC timer on dead cpus.
3) Disable preempt around irq balancing to prevent CPUs going down.
4) Print irq stats for all possible cpus.
5) Debugging check for interrupts on offline cpus.
6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
7) play_dead() for offline cpus to spin inside.
8) Handle offline cpus set in flush_tlb_others().
9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
10) Implement __cpu_disable() and __cpu_die().
11) Enable local interrupts in cpu_enable() after fixup_irqs()
12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline.
Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Gcc4 doesn't like volatile casts as lvalues. Make the structure members
volatile instead.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is based on work by Carlos O'Donell and Matthew Wilcox. It
introduces/updates the compat_time_t type and uses it for compat siginfo
structures. I have built this on ppc64 and x86_64.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch includes IA64 architecture specific changes(ported form i386) to
support temporary disarming on reentrancy of probes.
In case of reentrancy we single step without calling user handler.
Signed-of-by: Anil S Keshavamurth <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Once the jprobe instrumented function returns, it executes a jprobe_break
which is a break instruction with __IA64_JPROBE_BREAK value. The current
patch checks for this break value, before assuming that jprobe instrumented
function just completed.
The previous code was not checking for this value and that was a bug.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current kprobes does not yet handle register kprobes on some of the
following kind of instruction which needs to be emulated in a special way.
1) mov r1=ip
2) chk -- Speculation check instruction
This patch attempts to fail register_kprobes() when user tries to insert
kprobes on the above kind of instruction.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current Kprobes when patching the original instruction with the break
instruction tries to retain the original qualifying predicate(qp), however
for cmp.crel.ctype where ctype == unc, which is a special instruction
always needs to be executed irrespective of qp. Hence, if the instruction
we are patching is of this type, then we should not copy the original qp to
the break instruction, this is because we always want the break fault to
happen so that we can emulate the instruction.
This patch is based on the feedback given by David Mosberger
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch_prepare_kprobes() was doing lots of functionality
in just one single function. This patch
attempts to clean up arch_prepare_kprobes() by moving
specific sub task to the following (new)functions
1)valid_kprobe_addr() -->> validate the given kprobe address
2)get_kprobe_inst(slot..)->> Retrives the instruction for a given slot from the bundle
3)prepare_break_inst() -->> Prepares break instruction within the bundle
3a)update_kprobe_inst_flag()-->>Updates the internal flags, required
for proper emulation of the instruction at later
point in time.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug where a kprobe still fires when the instruction is predicated
off. So given the p6=0, and we have an instruction like:
(p6) move loc1=0
we should not be triggering the kprobe. This is handled by carrying over
the qp section of the original instruction into the break instruction.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A cleanup of the ia64 kprobes implementation such that all of the bundle
manipulation logic is concentrated in arch_prepare_kprobe().
With the current design for kprobes, the arch specific code only has a
chance to return failure inside the arch_prepare_kprobe() function.
This patch moves all of the work that was happening in arch_copy_kprobe()
and most of the work that was happening in arch_arm_kprobe() into
arch_prepare_kprobe(). By doing this we can add further robustness checks
in arch_arm_kprobe() and refuse to insert kprobes that will cause problems.
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is required to support kprobe on branch/call instructions.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds IA64 architecture specific JProbes support on top of Kprobes
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is an IA64 arch specific handling of Kprobes
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As many of you know that kprobes exist in the main line kernel for various
architecture including i386, x86_64, ppc64 and sparc64. Attached patches
following this mail are a port of Kprobes and Jprobes for IA64.
I have tesed this patches for kprobes and Jprobes and this seems to work fine.
I have tested this patch by inserting kprobes on various slots and various
templates including various types of branch instructions.
I have also tested this patch using the tool
http://marc.theaimsgroup.com/?l=linux-kernel&m=111657358022586&w=2 and the
kprobes for IA64 works great.
Here is list of TODO things and pathes for the same will appear soon.
1) Support kprobes on "mov r1=ip" type of instruction
2) Support Kprobes and Jprobes to exist on the same address
3) Support Return probes
3) Architecture independent cleanup of kprobes
This patch adds the kdebug die notification mechanism needed by Kprobes.
For break instruction on Branch type slot, imm21 is ignored and value
zero is placed in IIM register, hence we need to handle kprobes
for switch case zero.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
From: Rusty Lynch <rusty.lynch@intel.com>
At the point in traps.c where we recieve a break with a zero value, we can
not say if the break was a result of a kprobe or some other debug facility.
This simple patch changes the informational string to a more correct "break
0" value, and applies to the 2.6.12-rc2-mm2 tree with all the kprobes
patches that were just recently included for the next mm cut.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It allows a selectable timer interrupt frequency of 100, 250 and 1000 HZ.
Reducing the timer frequency may have important performance benefits on
large systems.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This will at least suppress one prompt that users would have received the
first time they compile with the new DISCONTIG arch option. They'll still
get the "Memory Model" prompt, but 99% of them will have the default work
there.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For all architectures, this just means that you'll see a "Memory Model"
choice in your architecture menu. For those that implement DISCONTIGMEM,
you may eventually want to make your ARCH_DISCONTIGMEM_ENABLE a "def_bool
y" and make your users select DISCONTIGMEM right out of the new choice
menu. The only disadvantage might be if you have some specific things that
you need in your help option to explain something about DISCONTIGMEM.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch effectively eliminates direct use of pgdat->node_mem_map outside
of the DISCONTIG code. On a flat memory system, these fields aren't
currently used, neither are they on a sparsemem system.
There was also a node_mem_map(nid) macro on many architectures. Its use
along with the use of ->node_mem_map itself was not consistent. It has
been removed in favor of two new, more explicit, arch-independent macros:
pgdat_page_nr(pgdat, pagenr)
nid_page_nr(nid, pagenr)
I called them "pgdat" and "nid" because we overload the term "node" to mean
"NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways. I
believe the newer names are much clearer.
These macros can be overridden in the sparsemem case with a theoretically
slower operation using node_start_pfn and pfn_to_page(), instead. We could
make this the only behavior if people want, but I don't want to change too
much at once. One thing at a time.
This patch removes more code than it adds.
Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64. Full list
here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/
Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin J. Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The SGI IOC4 I/O controller chip drivers are currently all configured by
CONFIG_BLK_DEV_SGIIOC4. This is undesirable as not all IOC4 hardware features
are needed by all systems.
This patch adds two configuration variables, CONFIG_SGI_IOC4 for core IOC4
driver support (see patch 1/3 in this series for further explanation) and
CONFIG_SERIAL_SGI_IOC4 to independently enable serial port support.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Acked-by: Pat Gefre <pfg@sgi.com>
Acked-by: Jeremy Higdon <jeremy@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the bits to make the XPC code use the uncached
allocator rather than calling into the mspec driver. It also includes the
mspec.h header which is required to build the XPC modules.
Signed-off-by: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the ia64 uncached page allocator and the generic
allocator (genalloc). The uncached allocator was formerly part of the SN2
mspec driver but there are several other users of it so it has been split
off from the driver.
The generic allocator can be used by device driver to manage special memory
etc. The generic allocator is based on the allocator from the sym53c8xx_2
driver.
Various users on ia64 needs uncached memory. The SGI SN architecture requires
it for inter-partition communication between partitions within a large NUMA
cluster. The specific user for this is the XPC code. Another application is
large MPI style applications which use it for synchronization, on SN this can
be done using special 'fetchop' operations but it also benefits non SN
hardware which may use regular uncached memory for this purpose. Performance
of doing this through uncached vs cached memory is pretty substantial. This
is handled by the mspec driver which I will push out in a seperate patch.
Rather than creating a specific allocator for just uncached memory I came up
with genalloc which is a generic purpose allocator that can be used by device
drivers and other subsystems as they please. For instance to handle onboard
device memory. It was derived from the sym53c7xx_2 driver's allocator which
is also an example of a potential user (I am refraining from modifying sym2
right now as it seems to have been under fairly heavy development recently).
On ia64 memory has various properties within a granule, ie. it isn't safe to
access memory as uncached within the same granule as currently has memory
accessed in cached mode. The regular system therefore doesn't utilize memory
in the lower granules which is mixed in with device PAL code etc. The
uncached driver walks the EFI memmap and pulls out the spill uncached pages
and sticks them into the uncached pool. Only after these chunks have been
utilized, will it start converting regular cached memory into uncached memory.
Hence the reason for the EFI related code additions.
Signed-off-by: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This patch
attempts to consolidate a lot of the code across the arch's, putting the
combined version in mm/hugetlb.c. There are a couple of uglyish hacks in
order to covert all the hugepage archs, but the result is a very large
reduction in the total amount of code. It also means things like hugepage
lazy allocation could be implemented in one place, instead of six.
Tested, at least a little, on ppc64, i386 and x86_64.
Notes:
- this patch changes the meaning of set_huge_pte() to be more
analagous to set_pte()
- does SH4 need s special huge_ptep_get_and_clear()??
Acked-by: William Lee Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the core of the (much simplified) early reclaim. The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.
One of the major uses of this is NUMA machines. With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.
This adds a zone tuneable to enable early zone reclaim. It is selected on a
per-zone basis and is turned on/off via syscall.
Adding some extra throttling on the reclaim was also required (patch
4/4). Without the machine would grind to a crawl when doing a "make -j"
kernel build. Even with this patch the System Time is higher on
average, but it seems tolerable. Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:
wall user sys %cpu ctx sw. sleeps
---- ---- --- ---- ------ ------
No patch 1009 1384 847 258 298170 504402
w/patch, no reclaim 880 1376 667 288 254064 396745
w/patch & reclaim 1079 1385 926 252 291625 548873
These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot. Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.
I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.
Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).
The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes handling of accesses to ar.rsc via ptrace & restore_sigcontext
[With Thanks to Chris Wright for noticing the restore_sigcontext path]
Signed-off-by: Matthew Chapman <matthewc@hp.com>
Acked-by: David Mosberger <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove "pci=routeirq" option for ia64. This was a workaround
after ACPI IRQ routing was changed from "all at boot for everything
in _PRT" to "do it when the device is enabled" in case there were
drivers that didn't use pci_enable_device().
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The nested_dtlb_miss handler currently does not handle fault from
hugetlb address correctly. It walks the page table assuming PAGE_SIZE.
Thus when taking a fault triggered from hugetlb address, it would not
calculate the pgd/pmd/pte address correctly and thus result an incorrect
invocation of ia64_do_page_fault(). In there, kernel will signal SIGBUS
and application dies (The faulting address is perfectly legal and we
have a valid pte for the corresponding user hugetlb address as well).
This patch fix the described kernel bug. Since nested_dtlb_miss is a
rare event and a slow path anyway, I'm making the change without #ifdef
CONFIG_HUGETLB_PAGE for code readability. Tony, please apply.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
printk() calls should include appropriate KERN_* constant.
Signed-off-by: Christophe Lucas <clucas@rotomalug.org>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Allow the SGI simulator (medusa) to work on generic kernels. There is
no inherent dependency on an sn2-specific kernel.
Boot tested on Altix, medusa and HP rx2600.
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>