Steven Rostedt found a bug in where in his modified kernel
ftrace was unable to modify the kernel text, due to the PMD
itself having been marked read-only as well in
split_large_page().
The fix, suggested by Linus, is to not try to 'clone' the
reference protection of a huge-page, but to use the standard
(and permissive) page protection bits of KERNPG_TABLE.
The 'cloning' makes sense for the ptes but it's a confused and
incorrect concept at the page table level - because the
pagetable entry is a set of all ptes and hence cannot
'clone' any single protection attribute - the ptes can be any
mixture of protections.
With the permissive KERNPG_TABLE, even if the pte protections
get changed after this point (due to ftrace doing code-patching
or other similar activities like kprobes), the resulting combined
protections will still be correct and the pte's restrictive
(or permissive) protections will control it.
Also update the comment.
This bug was there for a long time but has not caused visible
problems before as it needs a rather large read-only area to
trigger. Steve possibly hacked his kernel with some really
large arrays or so. Anyway, the bug is definitely worth fixing.
[ Huang Ying also experienced problems in this area when writing
the EFI code, but the real bug in split_large_page() was not
realized back then. ]
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Huang Ying <ying.huang@intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix time warps under vmware
Similar to the check for TSC going backwards in the TSC clocksource,
we also need this check for VMI clocksource.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
The A0 revision of the mv78xx0 development board has four ethernet
ports, with PHY IDs 8-11, whereas the Z0 version has two, with PHY
addresses 8-9. This patch configures the third and fourth ethernet
port to use the PHY addresses on the A0 board to enable use of those
ports -- if we are running on a Z0 board, the ge10/11 setup code in
common.c will force these back to PHYless mode.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
On pre-A0 revisions of the mv78xx0 SoC, the third and fourth
ethernet interface are not brought out to pins, but are internally
cross-connected, so if we run on pre-A0 silicon, we'll force eth2
and eth3 to PHYless mode.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
During boot, identify which chip stepping we're running on (determined
by looking at the first PCIe unit's device ID and revision registers),
and print a message with the details about what we found.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
This allows for board support code to set up their MPP config if the
bootloader didn't do it all or did it wrong. This also allows to
register usable GPIOs.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Especially on Kirkwood, a couple GPIOs are actually only output capable.
Let's separate the ability to configure a GPIO as input or output to
accommodate this restriction.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
arch/ia64/xen/xen_pv_ops.c:156: error: xen_init_ops causes a section type conflict
arch/ia64/xen/xen_pv_ops.c:340: error: xen_iosapic_ops causes a section type conflict
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes xen related Kconfigs and add default config
file for ia64 xen domU.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
The second call to cpu_clear() is redundant, as we've already removed
the CPU from cpu_online_map before calling migrate_platform_irqs().
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
This reverts commit e7b140365b.
Commit e7b14036 removes the targetted disabled CPU from the
cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.
Paul McKenney states that the reasoning behind the patch was to
prevent irq handlers from running on CPUs marked offline because:
RCU happily ignores CPUs that don't have their bits set in
cpu_online_map, so if there are RCU read-side critical sections
in the irq handlers being run, RCU will ignore them. If the
other CPUs were running, they might sequence through the RCU
state machine, which could result in data structures being
yanked out from under those irq handlers, which in turn could
result in oopses or worse.
Unfortunately, both ia64 functions above look at cpu_online_map to find
a new CPU to migrate interrupts onto. This means we can potentially
migrate an interrupt off ourself back to... ourself. Uh oh.
This causes an oops when we finally try to process pending interrupts on
the CPU we want to disable. The oops results from calling __do_IRQ with
a NULL pt_regs:
Unable to handle kernel NULL pointer dereference (address 0000000000000040)
Call Trace:
[<a000000100016930>] show_stack+0x50/0xa0
sp=e0000009c922fa00 bsp=e0000009c92214d0
[<a0000001000171a0>] show_regs+0x820/0x860
sp=e0000009c922fbd0 bsp=e0000009c9221478
[<a00000010003c700>] die+0x1a0/0x2e0
sp=e0000009c922fbd0 bsp=e0000009c9221438
[<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
sp=e0000009c922fbd0 bsp=e0000009c92213d8
[<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
sp=e0000009c922fc60 bsp=e0000009c92213d8
[<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
sp=e0000009c922fe30 bsp=e0000009c9221398
[<a00000010003bb90>] timer_interrupt+0x170/0x3e0
sp=e0000009c922fe30 bsp=e0000009c9221330
[<a00000010013a800>] handle_IRQ_event+0x80/0x120
sp=e0000009c922fe30 bsp=e0000009c92212f8
[<a00000010013aa00>] __do_IRQ+0x160/0x4a0
sp=e0000009c922fe30 bsp=e0000009c9221290
[<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
sp=e0000009c922fe30 bsp=e0000009c9221208
[<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
sp=e0000009c922fe30 bsp=e0000009c92211a8
[<a00000010005bd80>] __cpu_disable+0x140/0x240
sp=e0000009c922fe30 bsp=e0000009c9221168
[<a0000001006c5870>] take_cpu_down+0x50/0xa0
sp=e0000009c922fe30 bsp=e0000009c9221148
[<a000000100122610>] stop_cpu+0xd0/0x200
sp=e0000009c922fe30 bsp=e0000009c92210f0
[<a0000001000e0440>] kthread+0xc0/0x140
sp=e0000009c922fe30 bsp=e0000009c92210c8
[<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
sp=e0000009c922fe30 bsp=e0000009c92210a0
[<a00000010000a4c0>] start_kernel_thread+0x20/0x40
sp=e0000009c922fe30 bsp=e0000009c92210a0
I don't like this revert because it is fragile. ia64 is getting lucky
because we seem to only ever process timer interrupts in this path, but
if we ever race with an IPI here, we definitely use RCU and have the
potential of hitting an oops that Paul describes above.
Patching ia64's timer_interrupt() to check for NULL pt_regs is
insufficient though, as we still hit the above oops.
As a short term solution, I do think that this revert is the right
answer. The revert hold up under repeated testing (24+ hour test runs)
with this setup:
- 8-way rx6600
- randomly toggling CPU online/offline state every 2 seconds
- running CPU exercisers, memory hog, disk exercisers, and
network stressors
- average system load around ~160
In the long term, we really need to figure out why we set pt_regs = NULL
in ia64_process_pending_intr(). If it turns out that it is unnecessary
to do so, then we could safely re-introduce e7b14036 (along with some
other logic to be smarter about migrating interrupts).
One final note: x86 also removes the disabled CPU from cpu_online_map
and then re-enables interrupts for 1ms, presumably to handle any pending
interrupts:
arch/x86/kernel/irq_32.c (and irq_64.c):
cpu_disable_common:
[remove cpu from cpu_online_map]
fixup_irqs():
for_each_irq:
[break CPU affinities]
local_irq_enable();
mdelay(1);
local_irq_disable();
So they are doing implicitly what ia64 is doing explicitly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
BTE_MAX_XFER is wrong. It is one greater than the number of cache
lines the BTE is actually able to transfer. If you request a transfer
of exactly BTE_MAX_XFER size, you trip a very cryptic BUG_ON() which
should certainly be made more clear.
This patch fixes that constant and also cleans up the BUG_ON()s in
arch/ia64/sn/kernel/bte.c to test one condition per line.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
ia64 only defines __early_pfn_to_nid() for SPARSEMEM && NUMA configurations,
so the recent:
commit: f2dbcfa738
mm: clean up for early_pfn_to_nid()
ends up with some link problems for certain configuration files.
Fix arch/ia64/Kconfig to only define HAVE_ARCH_EARLY_PFN_TO_NID in the
cases where we do provide this function.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown
x86, mce: use force_sig_info to kill process in machine check
x86, mce: reinitialize per cpu features on resume
x86, rcu: fix strange load average and ksoftirqd behavior
Remove the gesbc9312.h header since it is unused.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
READ_IMPLIES_EXEC must be set when:
o binary _is_ an executable stack (i.e. not EXSTACK_DISABLE_X)
o processor architecture is _under_ ARMv6 (XN bit is supported from ARMv6)
Signed-off-by: Makito SHIOKAWA <lkhmkt@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Standby memory detected with the sclp interface gets always registered
with add_memory calls without considering the limitationt that the
"mem=" kernel paramater implies.
So fix this and only register standby memory that is below the specified
limit.
This fixes zfcpdump since it uses "mem=32M". In case there is appr.
2GB standby memory present all of usable memory would be used for the
struct pages needed for standby memory.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
commit aa5e97ce4b
[PATCH] improve precision of process accounting.
Introduced a timing regression:
-bash-3.2# time ls
real 0m0.006s
user 0m1.754s
sys 0m1.094s
The problem was introduced by an error in cputime_to_timeval.
Cputime is now 1/4096 microsecond, therefore, we have to divide
the remainder with 4096 to get the microseconds.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When changing the parent of a clock, it is necessary to keep the
clock use counts balanced otherwise things the parent state will
get corrupted. Since we already disable and re-enable the clock,
we might as well use the recursive versions instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In the non highmem case, if two memory banks of 1GB each are provided,
the second bank would evade suppression since its virtual base would
be 0. Fix this by disallowing any memory bank which virtual base
address is found to be lower than PAGE_OFFSET.
Reported-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Now, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole.
and memmap initialization was not done. This was a trouble for
sparc boot.
To fix this, the PFN should be initialized and marked as PG_reserved.
This patch changes early_pfn_in_nid() return true if PFN is a hole.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:
BUG_ON(page_zone(start_page) != page_zone(end_page));
Once I knew this is what was happening, I added some annotations:
if (unlikely(page_zone(start_page) != page_zone(end_page))) {
printk(KERN_ERR "move_freepages: Bogus zones: "
"start_page[%p] end_page[%p] zone[%p]\n",
start_page, end_page, zone);
printk(KERN_ERR "move_freepages: "
"start_zone[%p] end_zone[%p]\n",
page_zone(start_page), page_zone(end_page));
printk(KERN_ERR "move_freepages: "
"start_pfn[0x%lx] end_pfn[0x%lx]\n",
page_to_pfn(start_page), page_to_pfn(end_page));
printk(KERN_ERR "move_freepages: "
"start_nid[%d] end_nid[%d]\n",
page_to_nid(start_page), page_to_nid(end_page));
...
And here's what I got:
move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
move_freepages: start_nid[1] end_nid[0]
My memory layout on this box is:
[ 0.000000] Zone PFN ranges:
[ 0.000000] Normal 0x00000000 -> 0x0081ff5d
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[8] active PFN ranges
[ 0.000000] 0: 0x00000000 -> 0x00020000
[ 0.000000] 1: 0x00800000 -> 0x0081f7ff
[ 0.000000] 1: 0x0081f800 -> 0x0081fe50
[ 0.000000] 1: 0x0081fed1 -> 0x0081fed8
[ 0.000000] 1: 0x0081feda -> 0x0081fedb
[ 0.000000] 1: 0x0081fedd -> 0x0081fee5
[ 0.000000] 1: 0x0081fee7 -> 0x0081ff51
[ 0.000000] 1: 0x0081ff59 -> 0x0081ff5d
So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.
This patch:
Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
I think it makes fix-for-memmap-init not easy.
This patch moves all declaration to include/linux/mm.h
After this,
if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
-> Use static definition in include/linux/mm.h
else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
-> Use generic definition in mm/page_alloc.c
else
-> per-arch back end function will be called.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: Bugfix
The ifdef for the apic clear on shutdown for the 64bit intel thermal
vector was incorrect and never triggered. Fix that.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: bug fix (with tolerant == 3)
do_exit cannot be called directly from the exception handler because
it can sleep and the exception handler runs on the exception stack.
Use force_sig() instead.
Based on a earlier patch by Ying Huang who debugged the problem.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Bug fix
This fixes a long standing bug in the machine check code. On resume the
boot CPU wouldn't get its vendor specific state like thermal handling
reinitialized. This means the boot cpu wouldn't ever get any thermal
events reported again.
Call the respective initialization functions on resume
v2: Remove ancient init because they don't have a resume device anyways.
Pointed out by Thomas Gleixner.
v3: Now fix the Subject too to reflect v2 change
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The GPIO interrupts can be configured as either level triggered or edge
triggered, with a default of level triggered. When an edge triggered
interrupt is requested, the gpio_irq_set_type method is called which
currently switches the given IRQ descriptor between two struct irq_chip
instances: orion_gpio_irq_level_chip and orion_gpio_irq_edge_chip. This
happens via __setup_irq() which also calls irq_chip_set_defaults() to
assign default methods to uninitialized ones. The problem is that
irq_chip_set_defaults() is called before the irq_chip reference is
switched, leaving the new irq_chip (orion_gpio_irq_edge_chip in this
case) with uninitialized methods such as chip->startup() causing a kernel
oops.
Many solutions are possible, such as making irq_chip_set_defaults() global
and calling it from gpio_irq_set_type(), or calling __irq_set_trigger()
before irq_chip_set_defaults() in __setup_irq(). But those require
modifications to the generic IRQ code which might have adverse effect on
other architectures, and that would still be a fragile arrangement.
Manually copying the missing methods from within gpio_irq_set_type()
would be really ugly and it would break again the day new methods with
automatic defaults are added.
A better solution is to have a single irq_chip instance which can deal
with both edge and level triggered interrupts. It is also a good idea
to switch the IRQ handler instead, as the edge IRQ handler allows for
one edge IRQ event to be queued as the IRQ is actually masked only when
that second IRQ is received, at which point the hardware can queue an
additional IRQ event, making edge triggered interrupts a bit more
reliable.
Tested-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
doc: mmiotrace.txt, buffer size control change
trace: mmiotrace to the tracer menu in Kconfig
mmiotrace: count events lost due to not recording
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, vm86: fix preemption bug
x86, olpc: fix model detection without OFW
x86, hpet: fix for LS21 + HPET = boot hang
x86: CPA avoid repeated lazy mmu flush
x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context
x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption
x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem
x86/cpa: make sure cpa is safe to call in lazy mmu mode
x86, ptrace, mm: fix double-free on race
Add support for inverted rdy_busy pin for Atmel nand device controller
It will fix building error on NeoCore926 board.
Acked-by: Andrew Victor <linux@maxim.org.za>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Gregory CLEMENT <gclement@adeneo.adetelgroup.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Impact: use new API, fix SMP bug.
Use the new accessors rather than frobbing bits directly.
This also removes the bug introduced in ee0c468b (alpha: compile
fixes) which had Alpha setting bits on an on-stack cpumask, not the
cpu_online_map.
Cc: Richard Henderson <rth@twiddle.net>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Impact: fix powernow-k8 when acpi=off (or other error).
There was a spurious change introduced into powernow-k8 in this patch:
so that we try to "restore" the cpus_allowed we never saved. We revert
that file.
See lkml "[PATCH] x86/powernow: fix cpus_allowed brokage when
acpi=off" from Yinghai for the bug report.
Cc: Mike Travis <travis@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Impact: cosmetic change in Kconfig menu layout
This patch was originally suggested by Peter Zijlstra, but seems it
was forgotten.
CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
directly under the Kernel hacking / debugging menu in the kernel
configuration system. They were present only for x86 and x86_64.
Other tracers that use the ftrace tracing framework are in their own
sub-menu. This patch moves the mmiotrace configuration options there.
Since the Kconfig file, where the tracer menu is, is not architecture
specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
x86/x86_64. CONFIG_MMIOTRACE now depends on it.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 3d2a71a596 ("x86, traps: converge
do_debug handlers") changed the preemption disable logic of do_debug()
so vm86_handle_trap() is called with preemption disabled resulting in:
BUG: sleeping function called from invalid context at include/linux/kernel.h:155
in_atomic(): 1, irqs_disabled(): 0, pid: 3005, name: dosemu.bin
Pid: 3005, comm: dosemu.bin Tainted: G W 2.6.29-rc1 #51
Call Trace:
[<c050d669>] copy_to_user+0x33/0x108
[<c04181f4>] save_v86_state+0x65/0x149
[<c0418531>] handle_vm86_trap+0x20/0x8f
[<c064e345>] do_debug+0x15b/0x1a4
[<c064df1f>] debug_stack_correct+0x27/0x2c
[<c040365b>] sysenter_do_call+0x12/0x2f
BUG: scheduling while atomic: dosemu.bin/3005/0x10000001
Restore the original calling convention and reenable preemption before
calling handle_vm86_trap().
Reported-by: Michal Suchanek <hramrach@centrum.cz>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some msrs (notable MSR_KERNEL_GS_BASE) are held in the processor registers
and need to be flushed to the vcpu struture before they can be read.
This fixes cygwin longjmp() failure on Windows x64.
Signed-off-by: Avi Kivity <avi@redhat.com>
Simplify LAPIC TMCCT calculation by using hrtimer provided
function to query remaining time until expiration.
Fixes host hang with nested ESX.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Software are not allow to access device MMIO using cacheable memory type, the
patch limit MMIO region with UC and WC(guest can select WC using PAT and
PCD/PWT).
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This is better.
Currently, this code path is posing us big troubles,
and we won't have a decent patch in time. So, temporarily
disable it.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
count_load_time assignment is bogus: its supposed to contain what it
means, not the expiration time.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
In the past, kvm_get_kvm() and kvm_put_kvm() was called in assigned device irq
handler and interrupt_work, in order to prevent cancel_work_sync() in
kvm_free_assigned_irq got a illegal state when waiting for interrupt_work done.
But it's tricky and still got two problems:
1. A bug ignored two conditions that cancel_work_sync() would return true result
in a additional kvm_put_kvm().
2. If interrupt type is MSI, we would got a window between cancel_work_sync()
and free_irq(), which interrupt would be injected again...
This patch discard the reference count used for irq handler and interrupt_work,
and ensure the legal state by moving the free function at the very beginning of
kvm_destroy_vm(). And the patch fix the second bug by disable irq before
cancel_work_sync(), which may result in nested disable of irq but OK for we are
going to free it.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
kvm_arch_sync_events is introduced to quiet down all other events may happen
contemporary with VM destroy process, like IRQ handler and work struct for
assigned device.
For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so
the state of KVM here is legal and can provide a environment to quiet down other
events.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Kconfig symbols are not available in userspace, and are not stripped by
headers-install. Avoid their use by adding #defines in <asm/kvm.h> to
suit each architecture.
Signed-off-by: Avi Kivity <avi@redhat.com>
The floating-point registers f6-f11 is used by vmm and
saved in kvm-pt-regs, so should set the correct bit mask
and the pointer in fp_state, otherwise, fpswa may touch
vmm's fp registers instead of guests'.
In addition, for fp trap handling, since the instruction
which leads to fp trap is completely executed, so can't
use retry machanism to re-execute it, because it may
pollute some registers.
Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Impact: fix "garbled display, laptop is unusable" bug
Commit e51a1ac2df ("x86, olpc: fix endian
bug in openfirmware workaround") breaks model comparison on OLPC; the value
0xc2 needs to be scaled up by olpc_board().
The pre-patch version was wrong, but accidentally worked anyway
(big-endian 0xc2 is big enough to satisfy all other board revisions,
but little endian 0xc2 is not).
Signed-off-by: Chris Ball <cjb@laptop.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andres Salomon <dilinger@queued.net>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Enable the GPIO clocks earlier in the initialization sequence. This
allow the board-setup code to read and set GPIO pins.
Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The recently merged AT91SAM9 watchdog driver uses the
AT91SAM9X_WATCHDOG config variable, whereas the original version of
the driver (and the platform support code) used AT91SAM9_WATCHDOG.
This causes the watchdog platform_device to never be registered, and
therefore the driver not to be initialized.
This patch:
- updates the platform support code to use AT91SAM9X_WATCHDOG.
- includes <linux/io.h> to fix compile error (same fix as was applied
to at91rm9200_wdt.c)
- fixes comment regarding watchdog clock-rates in at91rm9200.
Signed-off-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
_omap2_clksel_get_src_field() was returning the first entry which was
either the default _or_ applicable to the SoC. This is wrong - we
should be returning the first default which is applicable to the SoC.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The error checks for omap2_divisor_to_clksel() and comment disagree with
the actual value returned on error. Fix this to return the correct error
value.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Between 2.6.23 and 2.6.24-rc1 a change was made that broke IBM LS21
systems that had the HPET enabled in the BIOS, resulting in boot hangs
for x86_64.
Specifically commit b8ce335906, which
merges the i386 and x86_64 HPET code.
Prior to this commit, when we setup the HPET timers in x86_64, we did
the following:
hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
HPET_TN_32BIT, HPET_T0_CFG);
However after the i386/x86_64 HPET merge, we do the following:
cfg = hpet_readl(HPET_Tn_CFG(timer));
cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
HPET_TN_SETVAL | HPET_TN_32BIT;
hpet_writel(cfg, HPET_Tn_CFG(timer));
However on LS21s with HPET enabled in the BIOS, the HPET_T0_CFG register
boots with Level triggered interrupts (HPET_TN_LEVEL) enabled. This
causes the periodic interrupt to be not so periodic, and that results in
the boot time hang I reported earlier in the delay calibration.
My fix: Always disable HPET_TN_LEVEL when setting up periodic mode.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the VSX alignment handler for VSX registers > 32. 32-63 are stored
in the VMX part of the thread_struct not the FPR part.
Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: stable@kernel.org (2.6.27 & .28 please)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Change the PS3 hotplug memory routine ps3_mm_add_memory() from
a core_initcall to a device_initcall.
core_initcall routines run before the powerpc topology_init()
startup routine, which is a subsys_initcall, resulting in
failure of ps3_mm_add_memory() when CONFIG_NUMA=y. When
ps3_mm_add_memory() fails the system will boot with just the
128 MiB of boot memory
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix the powerpc NUMA reserve bootmem page selection logic.
commit 8f64e1f2d1 (powerpc: Reserve
in bootmem lmb reserved regions that cross NUMA nodes) changed
the logic for how the powerpc LMB reserved regions were converted
to bootmen reserved regions. As the folowing discussion reports,
the new logic was not correct.
mark_reserved_regions_for_nid() goes through each LMB on the
system that specifies a reserved area. It searches for
active regions that intersect with that LMB and are on the
specified node. It attempts to bootmem-reserve only the area
where the active region and the reserved LMB intersect. We
can not reserve things on other nodes as they may not have
bootmem structures allocated, yet.
We base the size of the bootmem reservation on two possible
things. Normally, we just make the reservation start and
stop exactly at the start and end of the LMB.
However, the LMB reservations are not aware of NUMA nodes and
on occasion a single LMB may cross into several adjacent
active regions. Those may even be on different NUMA nodes
and will require separate calls to the bootmem reserve
functions. So, the bootmem reservation must be trimmed to
fit inside the current active region.
That's all fine and dandy, but we trim the reservation
in a page-aligned fashion. That's bad because we start the
reservation at a non-page-aligned address: physbase.
The reservation may only span 2 bytes, but that those bytes
may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
Take the case where you reserve 0x2 bytes at 0x0fff and
where the active region ends at 0x1000. You'll jump into
that if() statment, but node_ar.end_pfn=0x1 and
start_pfn=0x0. You'll end up with a reserve_size=0x1000,
and then call
reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
0x1000 may not be on the same node as 0xfff. Oops.
In almost all the vm code, end_<anything> is not inclusive.
If you have an end_pfn of 0x1234, page 0x1234 is not
included in the range. Using PFN_UP instead of the
(>> >> PAGE_SHIFT) will make this consistent with the other VM
code.
We also need to do math for the reserved size with physbase
instead of start_pfn. node_ar.end_pfn << PAGE_SHIFT is
*precisely* the end of the node. However,
(start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
of the reserved area. That is, of course, physbase.
If we don't use physbase here, the reserve_size can be
made too large.
From: Dave Hansen <dave@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com> Tested on PS3.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix _PAGE_CHG_MASK so that pte_modify() does not affect the _PAGE_SPECIAL bit.
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: Flush the lazy MMU only once
Pending mmu updates only need to be flushed once to bring the
in-memory pagetable state up to date.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: Catch cases where lazy MMU state is active in a preemtible context
arch_flush_lazy_mmu_cpu() has been changed to disable preemption so
the checks in enter/leave will never trigger. Put the preemtible()
check into arch_flush_lazy_mmu_cpu() to catch such cases.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: avoid access to percpu vars in preempible context
They are intended to be used whenever there's the possibility
that there's some stale state which is going to be overwritten
with a queued update, or to force a state change when we may be
in lazy mode. Either way, we could end up calling it with
preemption enabled, so wrap the functions in their own little
preempt-disable section so they can be safely called in any
context (though preemption should never be enabled if we're actually
in a lazy state).
(Move out of line to avoid #include dependencies.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Jeff Mahoney reported:
> With Suse's hwinfo tool, on -tip:
> WARNING: at arch/x86/mm/pat.c:637 reserve_pfn_range+0x5b/0x26d()
reserve_pfn_range() is not tracking the memory range below 1MB
as non-RAM and as such is inconsistent with similar checks in
reserve_memtype() and free_memtype()
Rename the pagerange_is_ram() to pat_pagerange_is_ram() and add the
"track legacy 1MB region as non RAM" condition.
And also, fix reserve_pfn_range() to return -EINVAL, when the pfn
range is RAM. This is to be consistent with this API design.
Reported-and-tested-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix race leading to crash under KVM and Xen
The CPA code may be called while we're in lazy mmu update mode - for
example, when using DEBUG_PAGE_ALLOC and doing a slab allocation
in an interrupt handler which interrupted a lazy mmu update. In this
case, the in-memory pagetable state may be out of date due to pending
queued updates. We need to flush any pending updates before inspecting
the page table. Similarly, we must explicitly flush any modifications
CPA may have made (which comes down to flushing queued operations when
flushing the TLB).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: fix TIMER_ABSTIME for process wide cpu timers
timers: split process wide cpu clocks/timers, fix
x86: clean up hpet timer reinit
timers: split process wide cpu clocks/timers, remove spurious warning
timers: split process wide cpu clocks/timers
signal: re-add dead task accumulation stats.
x86: fix hpet timer reinit for x86_64
sched: fix nohz load balancer on cpu offline
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ptrace, x86: fix the usage of ptrace_fork()
i8327: fix outb() parameter order
x86: fix math_emu register frame access
x86: math_emu info cleanup
x86: include correct %gs in a.out core dump
x86, vmi: put a missing paravirt_release_pmd in pgd_dtor
x86: find nr_irqs_gsi with mp_ioapic_routing
x86: add clflush before monitor for Intel 7400 series
x86: disable intel_iommu support by default
x86: don't apply __supported_pte_mask to non-present ptes
x86: fix grammar in user-visible BIOS warning
x86/Kconfig.cpu: make Kconfig help readable in the console
x86, 64-bit: print DMI info in the oops trace
Ptrace_detach() races with __ptrace_unlink() if the traced task is
reaped while detaching. This might cause a double-free of the BTS
buffer.
Change the ptrace_detach() path to only do the memory accounting in
ptrace_bts_detach() and leave the buffer free to ptrace_bts_untrace()
which will be called from __ptrace_unlink().
The fix follows a proposal from Oleg Nesterov.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The vdso_per_cpu_data entry in the lowcore structure uses __u32
instead of __u64. If the data page is above 4GB the pointer is
truncated and the kernel crashes.
Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The constraint used for retrieving and restoring the parent function
pointer is incorrect. The parent variable is a pointer, and the
address of the pointer is modified by the asm statement and not
the pointer itself. It is incorrect to pass it in as an output
constraint since the asm will never update the pointer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The following commit:
commit 64b3d0e812
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Thu Dec 18 19:13:51 2008 +0000
powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
broke setting of the _PAGE_COHERENT bit in the PPC HW PTE. Since we now
actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it
out before we propogate it to the PPC HW PTE.
Reported-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] AACI: timeout will reach -1
[ARM] Storage class should be before const qualifier
[ARM] pxa: stop and disable IRQ for each DMA channels at startup
[ARM] pxa: make more SSCR0 bit definitions visible on multiple processors
[ARM] pxa: fix missing of __REG() definition for ac97 registers access
[ARM] pxa: fix NAND and MMC clock initialization for pxa3xx
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Add missing sparsemem.h include
powerpc/pci: mmap anonymous memory when legacy_mem doesn't exist
powerpc/cell: Add missing #include for oprofile
powerpc/ftrace: Fix math to calculate offset in TOC
powerpc: Don't emulate mr. instructions
powerpc/fsl-booke: Fix mapping functions to use phys_addr_t
arch/powerpc: Eliminate double sizeof
powerpc/cpm2: Fix set interrupt type
powerpc/83xx: Fix TSEC0 workability on MPC8313E-RDB boards
powerpc/83xx: Fix missing #{address,size}-cells in mpc8313erdb.dts
powerpc/83xx: Build breakage for CONFIG_PM but no CONFIG_SUSPEND
Impact: fix to prevent a kernel crash on fault
If for some reason the pointer to the parent function on the
stack takes a fault, the fix up code will not return back to
the original faulting code. This can lead to unpredictable
results and perhaps even a kernel panic.
A fault should not happen, but if it does, we should simply
disable the tracer, warn, and continue running the kernel.
It should not lead to a kernel crash.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
In i8237A_resume(), when resetting the DMA controller, the parameters to
dma_outb() were mixed up.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
[ cleaned up the file a tiny bit. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the beginning of the
declaration specifiers in a declaration is an obsolescent feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
arch/powerpc/platforms/pseries/hotplug-memory.c uses
remove_section_mapping() but doesn't include sparsemem.h which defines
it. This can cause compilation fails for some configs.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new legacy_mem file in sysfs is causing problems with X on machines
that don't support legacy memory access. The way I initially implemented
it, we would fail with -ENXIO when trying to mmap it, thus exposing to
X that we do support the API but there is no legacy memory.
Unfortunately, X poor error handling is causing it to fail to start when
it gets this error.
This implements a workaround hack that instead maps anonymous memory
instead (using shmem if VM_SHARED is set, just like /dev/zero does).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/oprofile/cell/spu_profiler.c is missing a asm/time.h
include which is required for ppc_proc_freq. This can cause compile
failures for some config combinations.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: fix dynamic ftrace with large modules in PPC64
The math to calculate the offset into the TOC that is taken from reading
the trampoline is incorrect. The bottom half of the offset is a signed
extended short. The current code was using an OR to create the offset
when it should have been using an addition.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently emulate_step() emulates mr. instructions without updating cr0
and this can be disastrous. Don't emulate mr.
This bug has been around for a while, but I am not sure if its a worthy
-stable candidate. I'll leave it to Ben do decide.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t
instead of unsigned long. In 36-bit physical mode we really need these
functions to deal with phys_addr_t when trying to match a physical
address or when returning one.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
do_device_not_available() is the handler for #NM and it declares that
it takes a unsigned long and calls math_emu(), which takes a long
argument and surprisingly expects the stack frame starting at the zero
argument would match struct math_emu_info, which isn't true regardless
of configuration in the current code.
This patch makes do_device_not_available() take struct pt_regs like
other exception handlers and initialize struct math_emu_info with
pointer to it and pass pointer to the math_emu_info to math_emulate()
like normal C functions do. This way, unless gcc makes a copy of
struct pt_regs in do_device_not_available(), the register frame is
correctly accessed regardless of kernel configuration or compiler
used.
This doesn't fix all math_emu problems but it at least gets it
somewhat working.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] powernow-k8: Get transition latency from ACPI _PSS table
[CPUFREQ] Make ignore_nice_load setting of ondemand work as expected.
Architectures other than mips and x86 are not using ticket spinlocks.
Therefore, the contention on the lock is meaningless, since there is
nobody known to be waiting on it (arguably /fairly/ unfair locks).
Dummy it out to return 0 on other architectures.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: cleanup
* Come on, struct info? s/struct info/struct math_emu_info/
* Use struct pt_regs and kernel_vm86_regs instead of defining its own
register frame structure.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: dump the correct %gs into a.out core dump
aout_dump_thread() read %gs but didn't include it in core dump. Fix
it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 6194ba6ff6 ("x86: don't special-case
pmd allocations as much") made changes to the way we handle pmd allocations,
and while doing that it dropped a call to paravirt_release_pd on the
pgd page from the pgd_dtor code path.
As a result of this missing release, the hypervisor is now unaware of the
pgd page being freed, and as a result it ends up tracking this page as a
page table page.
After this the guest may start using the same page for other purposes, and
depending on what use the page is put to, it may result in various performance
and/or functional issues ( hangs, reboots).
Since this release is only required for VMI, I now release the pgd page from
the (vmi)_pgd_free hook.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Impact: find right nr_irqs_gsi on some systems.
One test-system has gap between gsi's:
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x05] address[0xfeafd000] gsi_base[48])
[ 0.000000] IOAPIC[1]: apic_id 5, version 0, address 0xfeafd000, GSI 48-54
[ 0.000000] ACPI: IOAPIC (id[0x06] address[0xfeafc000] gsi_base[56])
[ 0.000000] IOAPIC[2]: apic_id 6, version 0, address 0xfeafc000, GSI 56-62
...
[ 0.000000] nr_irqs_gsi: 38
So nr_irqs_gsi is not right. some irq for MSI will overwrite with io_apic.
need to get that with acpi_probe_gsi when acpi io_apic is used
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For Intel 7400 series CPUs, the recommendation is to use a clflush on the
monitored address just before monitor and mwait pair [1].
This clflush makes sure that there are no false wakeups from mwait when the
monitored address was recently written to.
[1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
section in specification update document of 7400 series
http://download.intel.com/design/xeon/specupdt/32033601.pdf
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is based upon a report from Chris Torek and his initial patch.
From Chris's report:
--------------------
This came up in testing kgdb, using the built-in tests -- turn
on CONFIG_KGDB_TESTS, then
echo V1 > /sys/module/kgdbts/parameters/kgdbts
-- but it would affect using kgdb if you were debugging and looking
at bad pointers.
--------------------
When we get a copy_{from,to}_user() request and the %asi is set to
something other than ASI_AIUS (which is userspace) then we branch off
to a routine called memcpy_user_stub(). It just does a straight
memcpy since we are copying from kernel to kernel in this case.
The logic was that since source and destination are both kernel
pointers we don't need to have exception checks.
But for what probe_kernel_{read,write}() is trying to do, we have to
have the checks, otherwise things like kgdb bad kernel pointer
accesses don't do the right thing.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an implementation of a suggestion made by Chris Torek:
--------------------
Something else I noticed in passing: the EX and EX_LD/EX_ST macros
scattered throughout the various .S files make a fair bit of .fixup
code, all of which does the same thing. At the cost of one symbol
in copy_in_user.S, you could just have one common two-instruction
retl-and-mov-1 fixup that they all share.
--------------------
The following is with a defconfig build:
text data bss dec hex filename
3972767 344024 584449 4901240 4ac978 vmlinux.orig
3968887 344024 584449 4897360 4aba50 vmlinux
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI PM: make the PM core more careful with drivers using the new PM framework
PCI PM: Read power state from device after trying to change it on resume
PCI PM: Do not disable and enable bridges during suspend-resume
PCI: PCIe portdrv: Simplify suspend and resume
PCI PM: Fix saving of device state in pci_legacy_suspend
PCI PM: Check if the state has been saved before trying to restore it
PCI PM: Fix handling of devices without drivers
PCI: return error on failure to read PCI ROMs
PCI: properly clean up ASPM link state on device remove
One of my past fixes to this code introduced a different new bug.
When using 32-bit "int $0x80" entry for a bogus syscall number,
the return value is not correctly set to -ENOSYS. This only happens
when neither syscall-audit nor syscall tracing is enabled (i.e., never
seen if auditd ever started). Test program:
/* gcc -o int80-badsys -m32 -g int80-badsys.c
Run on x86-64 kernel.
Note to reproduce the bug you need auditd never to have started. */
#include <errno.h>
#include <stdio.h>
int
main (void)
{
long res;
asm ("int $0x80" : "=a" (res) : "0" (99999));
printf ("bad syscall returns %ld\n", res);
return res != -ENOSYS;
}
The fix makes the int $0x80 path match the sysenter and syscall paths.
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Roland McGrath <roland@redhat.com>
the FPGA on the TS-7800 provides access to a number of devices
and so we have to be careful when reprogramming it. As we
are effectively turning a bus off/on we have to inform the
kernel that it should stop using anything provided by the
FPGA (currently only the RTC however the NAND, LCD, etc is
to come) before it's reprogrammed.
Once reprogramed, we can tell the kernel to (re)enable things
by checking the FPGA ID against a lookup table for what a
particular FPGA bitstream can provide.
Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
The TS-7800's M25P40 is not available to the kernel, it's used
to load the initial bitstream onto the FPGA and so these hooks
point to nothing and need to be removed.
Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
Taking sizeof the result of sizeof is quite strange and does not seem to be
what is wanted here.
This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E;
@@
- sizeof (
sizeof (E)
- )
@@
type T;
@@
- sizeof (
sizeof (T)
- )
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This is a simple change to correct problems when using set_irq_type
on platforms using CPM2. This code corrects the problem on most platform
but may have issues on 8272 derived platforms for some interrupts.
On 8272 PC2 & 3 are missing and PC 23 & 29 are added, which this patch
does not address.
Signed-off-by: Paul Bilke <paul@conspiracy.net>
Reviewed-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
TSEC0 is connected to Vitesse 7385 5-port switch. The switch
isn't connected to any mdio bus, the link to the switch is fixed
to Full-duplex 1000 Mb/s (no pause).
This patch fixes following failure during bootup:
mdio@24520:01 not found
eth0: Could not attach to PHY
IP-Config: Failed to open eth0
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
commit b31a1d8b41 ("gianfar: Convert
gianfar to an of_platform_driver") introduced a child node for
the ethernet@25000 controller, but no address and size cells
specifiers were added, and that makes dtc unhappy:
DTC: dts->dtb on file "arch/powerpc/boot/dts/mpc8313erdb.dts"
Warning (reg_format): "reg" property in /soc8313@e0000000/ethernet@25000/mdio@25520 has invalid length (8 bytes) (#address-cells == 2, #size-cells == 1)
Warning (avoid_default_addr_size): Relying on default #address-cells value for /soc8313@e0000000/ethernet@25000/mdio@25520
Warning (avoid_default_addr_size): Relying on default #size-cells value for /soc8313@e0000000/ethernet@25000/mdio@25520
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
I noticed this doing some randconfig testing (.config below). I have
CONFIG_PM but no CONFIG_SUSPEND. Bug is against mainline.
arch/powerpc/sysdev/built-in.o: In function `ipic_suspend':
ipic.c:(.text+0x6b34): undefined reference to `fsl_deep_sleep'
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [sub-make] Error 2
Looks like #ifdef CONFIG_PM in arch/powerpc/sysdev/ipic.c should be
CONFIG_SUSPEND. d49747bdfb introduced
this.
Fix build when we have CONFIG_PM but no CONFIG_SUSPEND.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Implement Linus's suggestion: introduce the hpet_cnt_ahead()
helper function to compare hpet time values - like other
wrapping counter comparisons are abstracted away elsewhere.
(jiffies, ktime_t, etc.)
Reported-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They can't be used for profiling and NMI watchdog currently
since they lack the counter overflow interrupt.
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevent kprobes from catching spurious faults which will cause infinite
recursive page-fault and memory corruption by stack overflow.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Fix up T-bit error handling in SH-4A mutex fastpath.
sh: Fix up spurious syscall restarting.
sh: fcnvds fix with denormalized numbers on SH-4 FPU.
sh: Only reserve memory under CONFIG_ZERO_PAGE_OFFSET when it != 0.
sh: Handle calling csum_partial with misaligned data
sh: ap325rxa: Enable ov772x in defconfig.
sh: ap325rxa: Add ov772x support.
sh: ap325rxa: control camera power toggling.
sh: mach-migor: Enable ov772x and tw9910 in defconfig.
Do usual do {} while (0) dance, otherwise
fs/gfs2/util.c:99: error: expected expression before 'else'
drivers/scsi/lpfc/lpfc_sli.c:363: error: expected expression before 'else'
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At this time, the PowerNow! driver for K8 uses an experimentally
derived formula to calculate transition latency. The value it
provides is orders of magnitude too large on modern systems.
This patch replaces the formula with ACPI _PSS latency values
for more accuracy and better performance.
I've tested it on two 2nd generation Opteron systems, a 3rd
generation Operton system, and a Turion X2 without seeing any
stability problems.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Due to recurring issues with DMAR support on certain platforms.
There's a number of filesystem corruption incidents reported:
https://bugzilla.redhat.com/show_bug.cgi?id=479996http://bugzilla.kernel.org/show_bug.cgi?id=12578
Provide a Kconfig option to change whether it is enabled by
default.
If disabled, it can still be reenabled by passing intel_iommu=on to the
kernel. Keep the .config option off by default.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On an x86 system which doesn't support global mappings,
__supported_pte_mask has _PAGE_GLOBAL clear, to make sure it never
appears in the PTE. pfn_pte() and so on will enforce it with:
static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
{
return __pte((((phys_addr_t)page_nr << PAGE_SHIFT) |
pgprot_val(pgprot)) & __supported_pte_mask);
}
However, we overload _PAGE_GLOBAL with _PAGE_PROTNONE on non-present
ptes to distinguish them from swap entries. However, applying
__supported_pte_mask indiscriminately will clear the bit and corrupt the
pte.
I guess the best fix is to only apply __supported_pte_mask to present
ptes. This seems like the right solution to me, as it means we can
completely ignore the issue of overlaps between the present pte bits and
the non-present pte-as-swap entry use of the bits.
__supported_pte_mask contains the set of flags we support on the
current hardware. We also use bits in the pte for things like
logically present ptes with no permissions, and swap entries for
swapped out pages. We should only apply __supported_pte_mask to
present ptes, because otherwise we may destroy other information being
stored in the ptes.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch makes the ROM reading code return an error to user space if
the size of the ROM read is equal to 0.
The patch also emits a warnings if the contents of the ROM are invalid,
and documents the effects of the "enable" file on ROM reading.
Signed-off-by: Timothy S. Nelson <wayland@wayland.id.au>
Acked-by: Alex Villacis-Lasso <a_villacis@palosanto.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There's a small problem with hpet_rtc_reinit function - it checks
for the:
hpet_readl(HPET_COUNTER) - hpet_t1_cmp > 0
to continue increasing both the HPET_T1_CMP (register) and the
hpet_t1_cmp (variable).
But since the HPET_COUNTER is always 32-bit, if the hpet_t1_cmp
is 64-bit this condition will always be FALSE once the latter hits
the 32-bit boundary, and we can have a situation, when we don't
increase the HPET_T1_CMP register high enough.
The result - timer stops ticking, since HPET_T1_CMP becomes less,
than the COUNTER and never increased again.
The solution is (based on Linus's suggestion) to not compare 64-bits
(on 64-bit x86), but to do the comparison on 32-bit signed
integers.
Reported-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: APIC: enable workaround on AMD Fam10h CPUs
xen: disable interrupts before saving in percpu
x86: add x86@kernel.org to MAINTAINERS
x86: push old stack address on irqstack for unwinder
irq, x86: fix lock status with numa_migrate_irq_desc
x86: add cache descriptors for Intel Core i7
x86/Voyager: make it build and boot
Impact: cleanup
Some lines exceed the 80 char width making them unreadable.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch echoes what we already do on 32-bit since
90f7d25c6b, and prints the DMI
product name in show_regs, so that system specific problems can be
easily identified.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (40 commits)
Blackfin arch: Remove outdated code
Blackfin arch: Fix udelay implementation
Blackfin arch: Update Copyright information
Blackfin arch: Add BF561 PPI POLS, POLC Masks
Blackfin arch: Update CM-BF527 kernel config
Blackfin arch: define bfin_memmap as static since it is only used here
Blackfin arch: cplb mananger: use a do...while loop rather than a for loop
Blackfin arch: fix bug - traps test case 19 for exception 0x2d fails
Blackfin arch: add platform device bfin_mii-bus and KSZ8893M switch driver platform resources to board files
Blackfin arch: build jtag tty driver as a module by default
Blackfin arch: fix 2 bugs related to debug
Blackfin arch: Add ANOMALY_05000380 to BF54x to kill the compile warning
Blackfin arch: Fix bug - 561 SMP kernel can't boot from jffs2
Blackfin arch: base SIC_IWR# programming on whether the MMR exists
Blackfin arch: read SYSCR on newer parts that mirror the bits of SWRST in it
Blackfin arch: fixup board init function name
Blackfin arch: drop CONFIG_I2C_BOARDINFO ifdefs
Blackfin arch: bfin_reset->_bfin_reset redirection no longer needed
Blackfin arch: sync reboot handler with version in u-boot
Blackfin arch: Faster Implementation of csum_tcpudp_nofold()
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Kill bogus TPC/address truncation during 32-bit faults.
sparc: fixup for sparseirq changes
sparc64: Validate kernel generated fault addresses on sparc64.
sparc64: On non-Niagara, need to touch NMI watchdog in NOHZ mode.
sparc64: Implement NMI watchdog on capable cpus.
sparc: Probe PMU type and record in sparc_pmu_type.
sparc64: Move generic PCR support code to seperate file.
The removed version with the loop registers saved on the stack was
originally intended to workaround the missing toolchain support for
LoopReg Clobbers.
Since our toolchain now supports these there is no point in keeping this
workaround. And since we don't touch LoopRegs anymore we're no longer
subject for ANOMALY_05000312.
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Avoid possible overflow during 32*32->32 multiplies.
Reported-by: Marco Reppenhagen <marco.reppenhagen@auerswald.de>
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
use a do...while loop rather than a for loop to get slightly better
optimization and to avoid gcc "may be used uninitialized" warnings ...
we know that the [id]cplb_nr_bounds variables will never be 0, so this
is OK
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Enable null pointer checking for ICPLBs. The code was there but for
some reason I had commented it out at some stage during development.
Should restrict this to 1K since atomic ops start there.
Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
- unable to single step over emuexcpt instruction
- gdbproxy goes into infinite loop when doing gdb does "next" over
"emuexcpt"
Don't decrement PC after software breakpoint.
Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
bss_l2 section is garbage when the data in this section is used by
_bfin_relocate_l1_mem, so move the zero out function ahead.
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
base SIC_IWR# programming on whether the MMR exists
rather than having to maintain another list of processors
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Drop CONFIG_I2C_BOARDINFO ifdefs as the common i2c header handles this
already by stubbing things out
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>