Impact: stack protector for x86_32
Implement stack protector for x86_32. GDT entry 28 is used for it.
It's set to point to stack_canary-20 and have the length of 24 bytes.
CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
to the stack canary segment on entry. As %gs is otherwise unused by
the kernel, the canary can be anywhere. It's defined as a percpu
variable.
x86_32 exception handlers take register frame on stack directly as
struct pt_regs. With -fstack-protector turned on, gcc copies the
whole structure after the stack canary and (of course) doesn't copy
back on return thus losing all changed. For now, -fno-stack-protector
is added to all files which contain those functions. We definitely
need something better.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: pt_regs changed, lazy gs handling made optional, add slight
overhead to SAVE_ALL, simplifies error_code path a bit
On x86_32, %gs hasn't been used by kernel and handled lazily. pt_regs
doesn't have place for it and gs is saved/loaded only when necessary.
In preparation for stack protector support, this patch makes lazy %gs
handling optional by doing the followings.
* Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.
* Save and restore %gs along with other registers in entry_32.S unless
LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL
even when LAZY_GS. However, it adds no overhead to common exit path
and simplifies entry path with error code.
* Define different user_gs accessors depending on LAZY_GS and add
lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The
lazy_*_gs() ops are used to save, load and clear %gs lazily.
* Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly.
xen and lguest changes need to be verified.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
On x86_32, %gs is handled lazily. It's not saved and restored on
kernel entry/exit but only when necessary which usually is during task
switch but there are few other places. Currently, it's done by
calling savesegment() and loadsegment() explicitly. Define
get_user_gs(), set_user_gs() and task_user_gs() and use them instead.
While at it, clean up register access macros in signal.c.
This cleans up code a bit and will help future changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Use .macro instead of cpp #define where approriate. This cleans up
code and will ease future changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
do_device_not_available() is the handler for #NM and it declares that
it takes a unsigned long and calls math_emu(), which takes a long
argument and surprisingly expects the stack frame starting at the zero
argument would match struct math_emu_info, which isn't true regardless
of configuration in the current code.
This patch makes do_device_not_available() take struct pt_regs like
other exception handlers and initialize struct math_emu_info with
pointer to it and pass pointer to the math_emu_info to math_emulate()
like normal C functions do. This way, unless gcc makes a copy of
struct pt_regs in do_device_not_available(), the register frame is
correctly accessed regardless of kernel configuration or compiler
used.
This doesn't fix all math_emu problems but it at least gets it
somewhat working.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unstatic ioapic_write_entry and setup_ioapic_entry functions so that
the Xen code can do its own ioapic routing setup.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Add mp_find_ioapic_pin() to find an IO APIC's specific pin from a GSI,
and use this function within acpi/boot. Make it non-static so other
code can use it too.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] powernow-k8: Get transition latency from ACPI _PSS table
[CPUFREQ] Make ignore_nice_load setting of ondemand work as expected.
to prevent wrongly overwriting fixmap that still want to use.
ACPI used to rely on low mappings being all linearly mapped and
grew a habit: it never really unmapped certain kinds of tables
after use.
This can cause problems - for example the hypothetical case
when some spurious access still references it.
v2: remove prev_map and prev_size in __apci_map_table
v3: let acpi_os_unmap_memory() call early_iounmap too, so remove extral calling to
early_acpi_os_unmap_memory
v4: fix typo in one acpi_get_table_with_size calling
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86, __acpi_map_table uses early_ioremap() to create the mapping,
replacing the previous mapping with a new one. Once enough of the
kernel is up an running it switches to using normal ioremap(). At
that point, we need to clean up the final mapping to avoid a warning
from the early_ioremap subsystem.
This can be removed after all the instances in the ACPI code are fixed
that rely on early-ioremap's implicit overmapping of previously
mapped tables.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Always map acpi tables, rather than assuming we can use the normal
linear mapping to access the acpi tables. This is necessary in a
virtual environment where the linear mappings are to pseudo-physical
memory, but the acpi tables exist at a real physical address. It
doesn't hurt to map in the normal non-virtual case, so just do it
unconditionally.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
__acpi_map_table() effectively reimplements early_ioremap(). Rather
than have that duplication, just implement it in terms of
early_ioremap().
However, unlike early_ioremap(), __acpi_map_table() just maintains a
single mapping which gets replaced each call, and has no corresponding
unmap function. Implement this by just removing the previous mapping
each time its called. Unfortunately, this will leave a stray mapping
at the end.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 6194ba6ff6 ("x86: don't special-case
pmd allocations as much") made changes to the way we handle pmd allocations,
and while doing that it dropped a call to paravirt_release_pd on the
pgd page from the pgd_dtor code path.
As a result of this missing release, the hypervisor is now unaware of the
pgd page being freed, and as a result it ends up tracking this page as a
page table page.
After this the guest may start using the same page for other purposes, and
depending on what use the page is put to, it may result in various performance
and/or functional issues ( hangs, reboots).
Since this release is only required for VMI, I now release the pgd page from
the (vmi)_pgd_free hook.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Impact: find right nr_irqs_gsi on some systems.
One test-system has gap between gsi's:
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x05] address[0xfeafd000] gsi_base[48])
[ 0.000000] IOAPIC[1]: apic_id 5, version 0, address 0xfeafd000, GSI 48-54
[ 0.000000] ACPI: IOAPIC (id[0x06] address[0xfeafc000] gsi_base[56])
[ 0.000000] IOAPIC[2]: apic_id 6, version 0, address 0xfeafc000, GSI 56-62
...
[ 0.000000] nr_irqs_gsi: 38
So nr_irqs_gsi is not right. some irq for MSI will overwrite with io_apic.
need to get that with acpi_probe_gsi when acpi io_apic is used
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the differences in interrupt handling hoisted into handle_irq(),
do_IRQ is more or less identical between 32 and 64 bit, so unify it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Xen uses a different interrupt path, so introduce handle_irq() to
allow interrupts to be inserted into the normal interrupt path. This
is handled slightly differently on 32 and 64-bit.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/early_printk.c: In function ‘early_dbgp_init’:
arch/x86/kernel/early_printk.c:827: error: ‘PAGE_KERNEL_NOCACHE’ undeclared (first use in this function)
arch/x86/kernel/early_printk.c:827: error: (Each undeclared identifier is reported only once
arch/x86/kernel/early_printk.c:827: error: for each function it appears in.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For Intel 7400 series CPUs, the recommendation is to use a clflush on the
monitored address just before monitor and mwait pair [1].
This clflush makes sure that there are no false wakeups from mwait when the
monitored address was recently written to.
[1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
section in specification update document of 7400 series
http://download.intel.com/design/xeon/specupdt/32033601.pdf
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When the function graph tracer picks a return address, it ensures this address
is really a kernel text one by calling __kernel_text_address()
Actually this path has never been taken.Its role was more likely to debug the tracer
on the beginning of its development but this function is wasteful since it is called
for every traced function.
The fault check is already sufficient.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Move the power tracer headers to trace/power.h to keep ftrace.h and power bits
more easy to maintain as separated topics.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup and bug fix
Use the linker to create symbols for certain per-cpu variables
that are offset by __per_cpu_load. This allows the removal of
the runtime fixup of the GDT pointer, which fixes a bug with
resume reported by Jiri Slaby.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Without frame pointers enabled, the x86 stack traces should not
pretend to be reliable; instead they should just be what they are:
unreliable.
The effect of this is that they have a '?' printed in the stacktrace,
to warn the reader that these entries are guesses rather than known
based on more reliable information.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: find right nr_irqs_gsi on some systems.
One test-system has gap between gsi's:
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x05] address[0xfeafd000] gsi_base[48])
[ 0.000000] IOAPIC[1]: apic_id 5, version 0, address 0xfeafd000, GSI 48-54
[ 0.000000] ACPI: IOAPIC (id[0x06] address[0xfeafc000] gsi_base[56])
[ 0.000000] IOAPIC[2]: apic_id 6, version 0, address 0xfeafc000, GSI 56-62
...
[ 0.000000] nr_irqs_gsi: 38
So nr_irqs_gsi is not right. some irq for MSI will overwrite with io_apic.
need to get that with acpi_probe_gsi when acpi io_apic is used
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make check-timer more robust potentially solve boot fragility
For edge trigger io-apic routing, we already unmasked the pin via
setup_IO_APIC_irq(), so don't unmask it again.
Also call local_irq_disable() between timer_irq_works(), because it
calls local_irq_enable() inside.
Also remove not needed apic version reading for 64-bit
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make nr_irqs depend more on cards used in a system
depend on nr_irq_gsi more, and have a ratio for MSI.
v2: make nr_irqs less than NR_VECTORS * nr_cpu_ids
aka if only one cpu, we only can support nr_irqs = NR_VECTORS
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up
Now that a generic in_nmi is available, this patch removes the
special code in the ring_buffer and implements the in_nmi generic
version instead.
With this change, I was also able to rename the "arch_ftrace_nmi_enter"
back to "ftrace_nmi_enter" and remove the code from the ring buffer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
The function graph tracer piggy backed onto the dynamic ftracer
to use the in_nmi custom code for dynamic tracing. The problem
was (as Andrew Morton pointed out) it really only wanted to bail
out if the context of the current CPU was in NMI context. But the
dynamic ftrace in_nmi custom code was true if _any_ CPU happened
to be in NMI context.
Now that we have a generic in_nmi interface, this patch changes
the function graph code to use it instead of the dynamic ftarce
custom code.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
The in_nmi variable in x86 arch ftrace.c is a misnomer.
Andrew Morton pointed out that the in_nmi variable is incremented
by all CPUS. It can be set when another CPU is running an NMI.
Since this is actually intentional, the fix is to rename it to
what it really is: "nmi_running"
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: prevent deadlock in NMI
The ring buffers are not yet totally lockless with writing to
the buffer. When a writer crosses a page, it grabs a per cpu spinlock
to protect against a reader. The spinlocks taken by a writer are not
to protect against other writers, since a writer can only write to
its own per cpu buffer. The spinlocks protect against readers that
can touch any cpu buffer. The writers are made to be reentrant
with the spinlocks disabling interrupts.
The problem arises when an NMI writes to the buffer, and that write
crosses a page boundary. If it grabs a spinlock, it can be racing
with another writer (since disabling interrupts does not protect
against NMIs) or with a reader on the same CPU. Luckily, most of the
users are not reentrant and protects against this issue. But if a
user of the ring buffer becomes reentrant (which is what the ring
buffers do allow), if the NMI also writes to the ring buffer then
we risk the chance of a deadlock.
This patch moves the ftrace_nmi_enter called by nmi_enter() to the
ring buffer code. It replaces the current ftrace_nmi_enter that is
used by arch specific code to arch_ftrace_nmi_enter and updates
the Kconfig to handle it.
When an NMI is called, it will set a per cpu variable in the ring buffer
code and will clear it when the NMI exits. If a write to the ring buffer
crosses page boundaries inside an NMI, a trylock is used on the spin
lock instead. If the spinlock fails to be acquired, then the entry
is discarded.
This bug appeared in the ftrace work in the RT tree, where event tracing
is reentrant. This workaround solved the deadlocks that appeared there.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Fix compile problem:
CC arch/x86/kernel/early_printk.o
In file included from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/early_printk.c:17:
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h: In function 'pmd_page':
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:516: error: implicit declaration of function '__pfn_to_section'
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:516: warning: initialization makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:516: error: implicit declaration of function '__section_mem_map_addr'
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:516: warning: return makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h: In function 'pud_page':
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:586: warning: initialization makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:586: warning: return makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h: In function 'pgd_page':
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:625: warning: initialization makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:625: warning: return makes pointer from integer without a cast
This is a cycling dependency between asm/pgtable.h and linux/mmzone.h
when using CONFIG_SPARSEMEM. Rather than hacking up the headers some
more, remove asm/pgtable.h, since early_printk.c doesn't actually need
it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Implement Linus's suggestion: introduce the hpet_cnt_ahead()
helper function to compare hpet time values - like other
wrapping counter comparisons are abstracted away elsewhere.
(jiffies, ktime_t, etc.)
Reported-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
only leave _default_ipi_xx etc in .h
Beyond the cleanup factor, this saves a bit of code size as well:
text data bss dec hex filename
7281931 1630144 1463304 10375379 9e50d3 vmlinux.before
7281753 1630144 1463304 10375201 9e5021 vmlinux.after
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
disable_ioapic_setup() in init/main.c is ugly as the function is
x86-specific. The #ifdef inline prototype there is ugly too.
Replace it with a generic arch_disable_smp_support() function - which
has a weak alias for non-x86 architectures and for non-ioapic x86 builds.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At this time, the PowerNow! driver for K8 uses an experimentally
derived formula to calculate transition latency. The value it
provides is orders of magnitude too large on modern systems.
This patch replaces the formula with ACPI _PSS latency values
for more accuracy and better performance.
I've tested it on two 2nd generation Opteron systems, a 3rd
generation Operton system, and a Turion X2 without seeing any
stability problems.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
There's a small problem with hpet_rtc_reinit function - it checks
for the:
hpet_readl(HPET_COUNTER) - hpet_t1_cmp > 0
to continue increasing both the HPET_T1_CMP (register) and the
hpet_t1_cmp (variable).
But since the HPET_COUNTER is always 32-bit, if the hpet_t1_cmp
is 64-bit this condition will always be FALSE once the latter hits
the 32-bit boundary, and we can have a situation, when we don't
increase the HPET_T1_CMP register high enough.
The result - timer stops ticking, since HPET_T1_CMP becomes less,
than the COUNTER and never increased again.
The solution is (based on Linus's suggestion) to not compare 64-bits
(on 64-bit x86), but to do the comparison on 32-bit signed
integers.
Reported-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch echoes what we already do on 32-bit since
90f7d25c6b, and prints the DMI
product name in show_regs, so that system specific problems can be
easily identified.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They were long enough set deprecated...
Update Documentation/cpu-freq/users-guide.txt:
The deprecated files listed there seen not to exist for some time anymore
already.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Impact: reduce kernel BSS size by 7 pages, improve code readability
Two page tables are used in current x86_64 kexec implementation. One
is used to jump from kernel virtual address to identity map address,
the other is used to map all physical memory. In fact, on x86_64,
there is no conflict between kernel virtual address space and physical
memory space, so just one page table is sufficient. The page table
pages used to map control page are dynamically allocated to save
memory if kexec image is not loaded. ASM code used to map control page
is replaced by C code too.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: fix to enable APIC for AMD Fam10h on chipsets with a missing/b0rked
ACPI MP table (MADT)
Booting a 32bit kernel on an AMD Fam10h CPU running on chipsets with
missing/b0rked MP table leads to a hang pretty early in the boot process
due to the APIC not being initialized. Fix that by falling back to the
default APIC base address in 32bit code, as it is done in the 64bit
codepath.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Fixes dumpstack and KDB on 64 bits
This re-adds the old stack pointer to the top of the irqstack to help
with unwinding. It was removed in commit d99015b1ab
as part of the save_args out-of-line work.
Both dumpstack and KDB require this information.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Zach says:
> Enable/Disable have no clobbers at all.
> Save clobbers only return value, %eax
> Restore also clobbers nothing.
This is precisely compatible with the calling convention, so we can
just call them directly without wrapping.
(Compile tested only.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Eric Paris reported:
> I have an hp dl785g5 which is unable to successfully run
> 2.6.29-0.66.rc3.fc11.x86_64 or 2.6.29-rc2-next-20090126. During bootup
> (early in userspace daemons starting) I get the below BUG, which quickly
> renders the machine dead. I assume it is because sparse_irq_lock never
> gets released when the BUG kills that task.
Adjust lock sequence when migrating a descriptor with
CONFIG_NUMA_MIGRATE_IRQ_DESC enabled.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, ds, bts: cleanup/fix DS configuration
ring-buffer: reset timestamps when ring buffer is reset
trace: set max latency variable to zero on default
trace: stop all recording to ring buffer on ftrace_dump
trace: print ftrace_dump at KERN_EMERG log level
ring_buffer: reset write when reserve buffer fail
tracing/function-graph-tracer: fix a regression while suspend to disk
ring-buffer: fix alignment problem
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86 setup: fix asm constraints in vesa_store_edid
xen: make sysfs files behave as their names suggest
x86: tone down mtrr_trim_uncached_memory() warning
x86: correct the CPUID pattern for MSR_IA32_MISC_ENABLE availability
[
mingo@elte.hu: these fixes are a subset of changes cherry-picked from:
git://git.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6.git
They fix various problems that recent x86 changes caused in the Voyager
subarchitecture: both APIC changes and cpumask changes and certain
cleanups caused subarch assumptions to break.
Most of these changes are obsolete as the subarch code has been removed
from the x86 development tree - but we merge them upstream to make Voyager
build and boot.
]
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: split out a function, no functional change
Xen needs to be able to access percpu data from very early on. For
various reasons, it cannot also load the gdt at that time. It does,
however, have a pefectly functional gdt at that point, so there's no
pressing need to reload the gdt.
Split the function to load the segment registers off, so Xen can call
it directly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup, prepare for xen boot fix.
Xen needs to call this function very early to setup the GDT and
per-cpu segments. Remove the call to smp_processor_id() and just
pass in the cpu number.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: fix possible tlb mis-flushing on UV
uv_flush_send_and_wait() should return a pointer if the broadcast
remote tlb shootdown requests fail. That causes the conventional IPI
method of shootdown to be used.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Optimization
In the native case, pte_val, make_pte, etc are all just identity
functions, so there's no need to clobber a lot of registers over them.
(This changes the 32-bit callee-save calling convention to return both
EAX and EDX so functions can return 64-bit values.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Optimization
One of the problems with inserting a pile of C calls where previously
there were none is that the register pressure is greatly increased.
The C calling convention says that the caller must expect a certain
set of registers may be trashed by the callee, and that the callee can
use those registers without restriction. This includes the function
argument registers, and several others.
This patch seeks to alleviate this pressure by introducing wrapper
thunks that will do the register saving/restoring, so that the
callsite doesn't need to worry about it, but the callee function can
be conventional compiler-generated code. In many cases (particularly
performance-sensitive cases) the callee will be in assembler anyway,
and need not use the compiler's calling convention.
Standard calling convention is:
arguments return scratch
x86-32 eax edx ecx eax ?
x86-64 rdi rsi rdx rcx rax r8 r9 r10 r11
The thunk preserves all argument and scratch registers. The return
register is not preserved, and is available as a scratch register for
unwrapped callee code (and of course the return value).
Wrapped function pointers are themselves wrapped in a struct
paravirt_callee_save structure, in order to get some warning from the
compiler when functions with mismatched calling conventions are used.
The most common paravirt ops, both statically and dynamically, are
interrupt enable/disable/save/restore, so handle them first. This is
particularly easy since their calls are handled specially anyway.
XXX Deal with VMI. What's their calling convention?
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Fix latent bug
The clobber is trying to say that anything except RDI is available for
clobbering, but actually clobbers everything. This hasn't mattered
because the clobbers were basically ignored, but subsequent patches
will rely on them.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Optimization
Several paravirt ops implementations simply return their arguments,
the most obvious being the make_pte/pte_val class of operations on
native.
On 32-bit, the identity function is literally a no-op, as the calling
convention uses the same registers for the first argument and return.
On 64-bit, it can be implemented with a single "mov".
This patch adds special identity functions for 32 and 64 bit argument,
and machinery to recognize them and replace them with either nops or a
mov as appropriate.
At the moment, the only users for the identity functions are the
pagetable entry conversion functions.
The result is a measureable improvement on pagetable-heavy benchmarks
(2-3%, reducing the pvops overhead from 5 to 2%).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: fix rare crash on 32-bit
The 32-bit APIC drivers had their send_IPI_self vectors set to NULL,
but ioapic_retrigger_irq() depends on it being always set. Fix it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
just like 64 bit switch from flat logical APIC messages to
flat physical mode automatically.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: 32-bit should use logical version
there are two version: for default_send_IPI_mask_sequence/allbutself
one in ipi.h and one in ipi.c for 32bit
it seems .h version overwrote ipi.c for a while.
restore it so 32 bit could use its old logical version.
also remove dupicated functions in .c
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move DMA-mapping.txt to Documentation/PCI/.
DMA-mapping.txt was supposed to be moved from Documentation/ to
Documentation/PCI/. The 00-INDEX files in those two directories
were updated, along with a few other text files, but the file
itself somehow escaped being moved, so move it and update more
text files and source files with its new location.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: Cleans up printk formatting
When LOCAL APIC was calibrated, the debug message is displayed as follows.
CPU0: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz stepping 06
Using local APIC timer interrupts.
calibrating APIC timer ...
... lapic delta = 3773131
... PM timer delta = 812434
APIC calibration not consistent with PM Timer: 226ms instead of 100ms
APIC delta adjusted to PM-Timer: 1662420 (3773131)
TSC delta adjusted to PM-Timer: 159592409 (362220564)
..... delta 1662420
..... mult: 71411249
..... calibration result: 265987
..... CPU clock speed is 1595.0924 MHz.
..... host bus clock speed is 265.0987 MHz.
There are three type of PM-Timer (PM-Timer, PM Timer, and PM timer),
in this message. This patch unifies those messages to PM-Timer.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: Fixes incorrect printk
LOCAL APIC is corrected by PM-Timer, when SMI occurred while LOCAL APIC is calibrated.
In this case, LOCAL APIC debug message(Boot with apic=debug) is displayed correctly,
however, CPU clock speed debug message is displayed wrongly .
When SMI occured on my machine, which has 1.6GHz CPU, CPU clock speed is displayed
3622.0205 MHz as follow.
CPU0: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz stepping 06
Using local APIC timer interrupts.
calibrating APIC timer ...
... lapic delta = 3773130
... PM timer delta = 812434
APIC calibration not consistent with PM Timer: 226ms instead of 100ms
APIC delta adjusted to PM-Timer: 1662420 (3773130)
..... delta 1662420
..... mult: 71411249
..... calibration result: 265987
..... CPU clock speed is 3622.0205 MHz. =====> here
..... host bus clock speed is 265.0987 MHz.
This patch fixes to displaying CPU clock speed correctly as follow.
CPU0: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz stepping 06
Using local APIC timer interrupts.
calibrating APIC timer ...
... lapic delta = 3773131
... PM timer delta = 812434
APIC calibration not consistent with PM Timer: 226ms instead of 100ms
APIC delta adjusted to PM-Timer: 1662420 (3773131)
TSC delta adjusted to PM-Timer: 159592409 (362220564)
..... delta 1662420
..... mult: 71411249
..... calibration result: 265987
..... CPU clock speed is 1595.0924 MHz.
..... host bus clock speed is 265.0987 MHz.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
X86_PC is the only remaining 'sub' architecture, so we dont need
it anymore.
This also cleans up a few spurious references to X86_PC in the
driver space - those certainly should be X86.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xapic fix for 32bit platform with less than 8 cpu's.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
We may use macros from processor-flags.h instead
of hardcoding bits. Actually it's not direct mapping
of old instructions with new ones -- BTS does change
CF flag while MOV does not. But i didn't find any
dependency on CF in this code.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
X86_GENERICARCH is a misnomer - it contains non-PC 32-bit architectures
that are not included in the default build.
Rename it to X86_32_NON_STANDARD.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86/Voyager had this Kconfig quirk:
config X86_FIND_SMP_CONFIG
def_bool y
depends on X86_MPPARSE || X86_VOYAGER
Which splits off the find_smp_config() callback into a build-time quirk.
Voyager should use the existing x86_quirks.mach_find_smp_config() callback
to introduce SMP-config quirks. NUMAQ-32 and VISWS already use this.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Voyager has this Kconfig quirk:
config X86_BIOS_REBOOT
bool
depends on !X86_VOYAGER
default y
Voyager should use the existing machine_ops.emergency_restart reboot
quirk mechanism instead of a build-time quirk.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86/Voyager can boot on non-zero processors. While that can probably
be fixed by properly remapping the physical CPU IDs, keep boot_cpu_id
for now for easier transition - and expand it to all of x86.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The x86/Voyager subarch used to have this distinction between
'x86 SMP support' and 'Voyager SMP support':
config X86_SMP
bool
depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64)
This is a pointless distinction - Voyager can (and already does) use
smp_ops to implement various SMP quirks it has - and it can be extended
more to cover all the specialities of Voyager.
So remove this complication in the Kconfig space.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the 32-bit subarchitecture support code.
All subarchitectures but Voyager have been converted. Voyager will be
done later or will be removed.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We are getting rid of subarchitecture support - move the hook files
to asm/. (These are now stale and should be replaced with more explicit
runtime mechanisms - but the transition is simpler this way.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove remaining bits of the subarchitecture code. Now that all the
special platforms are runtime probed and runtime handled, we can remove
these facilities.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move all code to arch/x86/kernel/bigsmp_32.c.
With this it ceases to rely on any build-time subarch features.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move all NUMAQ code into arch/x86/kernel/numaq.c.
With this it ceases to rely on any build-time subarch features.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move all ES7000 code into arch/x86/kernel/es7000_32.c.
With this it ceases to rely on any build-time subarch features.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kerneloops.org is reporting a lot of these warnings that come due to
vmware not setting up any MTRRs for emulated CPUs:
| Reported 709 times (14696 total reports)
| BIOS bug (often in VMWare) where the MTRR's are set up incorrectly
| or not at all
|
| This warning was last seen in version 2.6.29-rc2-git1, and first
| seen in 2.6.24.
|
| More info:
| http://www.kerneloops.org/searchweek.php?search=mtrr_trim_uncached_memory
Keep a one-liner KERN_INFO about it - so that we have so notice if empty
MTRRs are caused by native hardware/BIOS weirdness.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Nothing exciting - a few subarches dont want APIC remote reads to
be performed - the others are content with the default method.
- extend the generic code to handle NULL methods
- clear out dummy methods and replace them with NULL
- clean up: remove wrapper macros, etc.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only NUMAQ does something substantial here, because it initializes
via NMIs (not via INIT as standard SMP startup) - so it needs to
store and restore the NMI vector.
- extend the generic code to handle NULL methods
- clear out dummy methods and replace them with NULL
- clean up: remove wrapper macros, etc.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only NUMAQ does something substantial here, because it initializes
via NMIs (not via INIT as standard SMP startup) - so it needs to
reset the APIC.
- extend the generic code to handle NULL methods
- clear out dummy methods and replace them with NULL
- clean up: remove wrapper macros, etc.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- spread out the namespace on a per APIC driver basis
- handle a NULL ->wait_for_init_deassert() as a 'dont wait' default method
- remove NUMAQ and Summit handlers
Signed-off-by: Ingo Molnar <mingo@elte.hu>
64-bit x86 has zero for ->trampoline_phys_low/high, but the smpboot
code can use these values - so it's better to set them up to their
correct values.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Our send_IPI_*() methods and definitions are a twisted mess: the same
symbol is defined to different things depending on .config details,
in a non-transparent way.
- spread out the quirks into separately named per apic driver methods
- prefix the standard PC methods with default_
- get rid of wrapper macro obfuscation
- clean up various details
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Call all the registered MPS quirk handlers early. These methods scan
low RAM typically for specific signatures so are safe to be called
early.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Refactor the ->phys_pkg_id() methods:
- namespace separation
- macro wrapper removal
- open-coded calls to the methods in the generic code
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- unify the call signature of 64-bit to that of 32-bit
- clean up the types all around
- clean up namespace contamination
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- eliminate the needless es7000_enable_apic_mode() complication which
was not apparent prior the namespace cleanups
- clean up the control flow in es7000_enable_apic_mode()
- other cleanups
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only ES7000 has a real ->enable_apic_mode() method, the other
subarchitectures define it but keep it empty.
So mark the vector as NULL, extend the generic code to handle
NULL -setup_portio_remap() entries and remove all the empty
handlers.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- spread out the namespace to per driver methods
- extend it to 64-bit as well so that we can use
apic->check_phys_apicid_present() unconditionally
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only NUMAQ has a real ->setup_portio_remap() method, the other
subarchitectures define it but keep it empty.
So mark the vector as NULL, extend the generic code to handle
NULL -setup_portio_remap() entries and remove all the empty
handlers.
Also move the NUMAQ method from the header file into the
apic driver .c file.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
only NUMAQ uses this quirk: to prevent the timer IRQ from being added
on secondary nodes.
All other genapic templates can have a NULL ->multi_timer_check()
callback.
Also, extend the generic code to treat a NULL pointer accordingly.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The bigsmp and es7000 subarchitectures un-defined APIC_DEST_LOGICAL in
a rather nasty way by re-defining it to zero. That is infinitely
fragile and makes it very hard to see what to code really does in
a given context. The very same constant has different meanings and
values - depending on which subarch is enabled.
Untangle this mess by never undefining the constant, but instead
propagating the right values into the genapic driver templates.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the ->ESR_DISABLE shouting variant was used to enable the esr_disable
macro wrappers. Those ugly macros are removed now so we can rename
->ESR_DISABLE to ->disable_esr
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Most subarchitectures want to disable the APIC ESR (Error Status Register),
because they generally have hardware hacks that wrap standard CPUs into
a bigger system and hence the APIC bus is quite non-standard and weirdnesses
(lockups) have been seen with ESR reporting.
Remove the esr_disable macros and put the desired flag into each
subarchitecture's genapic template directly.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the wrapper macros IRQ_DEST_MODE and IRQ_DELIVERY_MODE.
The typical 32-bit and the 64-bit build all dereference via the genapic,
so it's pointless to hide that indirection via these ugly macros.
Furthermore, it also obscures subarchitecture details.
So replace it with apic->irq_dest_mode / etc. accesses.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
int_delivery_mode is supposed to mean 'interrupt delivery mode', but
it's quite a misnomer as 'int' we usually think of as an integer type ...
The standard naming for such attributes is 'irq' - so rename the following
fields and macros:
int_delivery_mode => irq_delivery_mode
INT_DELIVERY_MODE => IRQ_DELIVERY_MODE
int_dest_mode => irq_dest_mode
INT_DEST_MODE => IRQ_DEST_MODE
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
x86 subarchitectures each defined a "apic_id_registered()" method,
which could be an inline function depending on which subarch we build
for, and which was also the name of a genapic field.
Untangle this namespace spaghetti by giving each of the instances
a separate name.
Also remove wrapper macro obfuscation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: refactor code
x86 subarchitectures each defined a "acpi_madt_oem_check()" method,
which could be an inline function, or an extern, or a static function,
and which was also the name of a genapic field.
Untangle this namespace spaghetti by setting ->acpi_madt_oem_check()
to NULL on those subarchitectures that have no detection quirks,
and rename the other ones (summit, es7000) that do.
Also change default_acpi_madt_oem_check() to handle NULL entries,
and clean its control flow up as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- reorder fields so that they appear in struct genapic field ordering
- add zero-initialized fields too so that it's apparent which functionality
is default / missing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- reorder fields so that they appear in struct genapic field ordering
- add zero-initialized fields too so that it's apparent which functionality
is default / missing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- reorder fields so that they appear in struct genapic field ordering
- add zero-initialized fields too so that it's apparent which functionality
is default / missing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- reorder fields so that they appear in struct genapic field ordering
- add zero-initialized fields too so that it's apparent which functionality
is default / missing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- reorder fields so that they appear in struct genapic field ordering
- add zero-initialized fields too so that it's apparent which functionality
is default / missing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename genapic-> to apic-> references because in a future chagne we'll
open-code all the indirect calls (instead of obscuring them via macros),
so we want this reference to be as short as possible.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Cleanup
While I was looking through the new and improved bootstrap code - great
work that, thanks! I found the below a slight improvement.
Remove unnecessary ugly #ifdef construct around debug register clear.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
fix the following warning:
CC arch/x86/kernel/cpu/intel_cacheinfo.o
arch/x86/kernel/cpu/intel_cacheinfo.c:314: warning: 'cpuid4_cache_lookup' defined but not used
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix
x86_cpu_to_apicid and x86_bios_cpu_apicid aren't defined for voyage.
Earlier patch forgot to conditionalize early percpu clearing. Fix it.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: sync 32 and 64-bit code
Merge load_gs_base() into switch_to_new_gdt(). Load the GDT and
per-cpu state for the boot cpu when its new area is set up.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Rename init_gdt() to setup_percpu_segment(), and move it to
setup_percpu.c.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: standardize all x86 platforms on same setup code
With the preceding changes, Voyager can use the same per-cpu setup
code as all the other x86 platforms.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Small cleanup
Define BOOT_PERCPU_OFFSET and use it for this_cpu_offset and
__per_cpu_offset initializers.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Code movement
Move the variable definitions to apic.c. Ifdef the copying of
the two early per-cpu variables, since Voyager doesn't use them.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
The way the code is written, align is always PAGE_SIZE. Simplify
the code by removing the align variable.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Code movement, no functional change.
Move setup_cpu_local_masks() to kernel/cpu/common.c, where the
masks are defined.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Code movement, no functional change.
Move the 64-bit NUMA code from setup_percpu.c to numa_64.c
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: minor optimization
Eliminates the need for two loops over possible cpus.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
debugobjects: add and use INIT_WORK_ON_STACK
rcu: remove duplicate CONFIG_RCU_CPU_STALL_DETECTOR
relay: fix lock imbalance in relay_late_setup_files
oprofile: fix uninitialized use of struct op_entry
rcu: move Kconfig menu
softlock: fix false panic which can occur if softlockup_thresh is reduced
rcu: add __cpuinit to rcu_init_percpu_data()
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
hrtimers: fix inconsistent lock state on resume in hres_timers_resume
time-sched.c: tick_nohz_update_jiffies should be static
locking, hpet: annotate false positive warning
kernel/fork.c: unused variable 'ret'
itimers: remove the per-cpu-ish-ness
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits)
xen: unitialised return value in xenbus_write_transaction
x86: fix section mismatch warning
x86: unmask CPUID levels on Intel CPUs, fix
x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn.
x86: use standard PIT frequency
xen: handle highmem pages correctly when shrinking a domain
x86, mm: fix pte_free()
xen: actually release memory when shrinking domain
x86: unmask CPUID levels on Intel CPUs
x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h>
x86: fix PTE corruption issue while mapping RAM using /dev/mem
x86: mtrr fix debug boot parameter
x86: fix page attribute corruption with cpa()
Revert "x86: signal: change type of paramter for sys_rt_sigreturn()"
x86: use early clobbers in usercopy*.c
x86: remove kernel_physical_mapping_init() from init section
fix: crash: IP: __bitmap_intersects+0x48/0x73
cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
work_on_cpu: Use our own workqueue.
work_on_cpu: don't try to get_online_cpus() in work_on_cpu.
...
Impact: re-enable CPUID unmasking on affected processors
As far as I am capable of discerning from the documentation,
MSR_IA32_MISC_ENABLE should be available for all family 0xf CPUs, as
well as family 6 for model >= 0xd (newer Pentium M).
The documentation on this isn't ideal, so we need to be on the lookout
for errors, still.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Here function vmi_activate calls a init function activate_vmi , which
causes the following section mismatch warnings:
LD arch/x86/kernel/built-in.o
WARNING: arch/x86/kernel/built-in.o(.text+0x13ba9): Section mismatch
in reference from the function vmi_activate() to the function
.init.text:vmi_time_init()
The function vmi_activate() references
the function __init vmi_time_init().
This is often because vmi_activate lacks a __init
annotation or the annotation of vmi_time_init is wrong.
WARNING: arch/x86/kernel/built-in.o(.text+0x13bd1): Section mismatch
in reference from the function vmi_activate() to the function
.devinit.text:vmi_time_bsp_init()
The function vmi_activate() references
the function __devinit vmi_time_bsp_init().
This is often because vmi_activate lacks a __devinit
annotation or the annotation of vmi_time_bsp_init is wrong.
WARNING: arch/x86/kernel/built-in.o(.text+0x13bdb): Section mismatch
in reference from the function vmi_activate() to the function
.devinit.text:vmi_time_ap_init()
The function vmi_activate() references
the function __devinit vmi_time_ap_init().
This is often because vmi_activate lacks a __devinit
annotation or the annotation of vmi_time_ap_init is wrong.
Fix it by marking vmi_activate() as __init too.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Remove such constructs:
#ifdef CONFIG_EARLY_PRINTK
call early_printk
#else
call printk
#endif
Not only are they ugly, they are also pointless: a call to printk()
maps to early_printk during early bootup anyway, if CONFIG_EARLY_PRINTK
is enabled.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix boot hang on pre-model-15 Intel CPUs
rdmsrl_safe() does not work in very early bootup code yet, because we
dont have the pagefault handler installed yet so exception section
does not get parsed. rdmsr_safe() will just crash and hang the bootup.
So limit the MSR_IA32_MISC_ENABLE MSR read to those CPU types that
support it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Fixes potential crashes on misconfigured systems.
Some CPU features require specific CPUID levels to be available in
order to function, as they contain information about the operation of
a specific feature. However, some BIOSes and virtualization software
provide the ability to mask CPUID levels in order to support legacy
operating systems. We try to enable such CPUID levels when we know
how to do it, but for the remaining cases, filter out such CPU
features when there is no way for us to support them.
Do this in one place, in the CPUID code, with a table-driven approach.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: Cleanup
When PAT was originally introduced, it was handled specially for a few
reasons:
- PAT bugs are hard to track down, so we wanted to maintain a
whitelist of CPUs.
- The i386 and x86-64 CPUID code was not yet unified.
Both of these are now obsolete, so handle PAT like any other features,
including ordinary feature blacklisting due to known bugs.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: use new framework
Use {get|put}_user_try, catch, and _ex in arch/x86/kernel/signal.c.
Note: this patch contains "WARNING: line over 80 characters", because when
introducing new block I insert an indent to avoid mistakes by edit.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The file is used for 32 and 64 bit since:
commit cfb80c9eae
Author: Jeremy Fitzhardinge <jeremy@goop.org>
Date: Tue Dec 16 12:17:36 2008 -0800
x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
APIC definitions aren't needed here. Remove the include and fix
up the fallout.
tj: added include to mce_intel_64.c.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: bogus irq_cpustat field removed
idle_timestamp is left over from the removed irqbalance code.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
pte_flags() was introduced as a new pvop in order to extract just the
flags portion of a pte, which is a potentially cheaper operation than
extracting the page number as well. It turns out this operation is
not needed, because simply using a mask to extract the flags from a
pte is sufficient for all current users.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cleanup the cpuid check for DS configuration.
This also fixes a Corei7 CPUID enumeration bug.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Fix debugobjects warning
debugobject enabled kernels spit out a warning in hpet code due to a
workqueue which is initialized on stack.
Add INIT_WORK_ON_STACK() which calls init_timer_on_stack() and use it
in hpet.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: Fixes crashes with misconfigured BIOSes on XSAVE hardware
Avuton Olrich reported early boot crashes with v2.6.28 and
bisected it down to dc1e35c6e9
("x86, xsave: enable xsave/xrstor on cpus with xsave support").
If the CPUID limit bit in MSR_IA32_MISC_ENABLE is set, clear it to
make all CPUID information available. This is required for some
features to work, in particular XSAVE.
Reported-and-bisected-by: Avuton Olrich <avuton@gmail.com>
Tested-by: Avuton Olrich <avuton@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Make X86 SGI Ultraviolet support configurable. Saves about 13K of text size
on my modest config.
text data bss dec hex filename
6770537 1158680 694356 8623573 8395d5 vmlinux
6757492 1157664 694228 8609384 835e68 vmlinux.nouv
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
while looking at:
http://bugzilla.kernel.org/show_bug.cgi?id=11541
I realized that the mtrr.show param cannot work, because
the code is processed much too early.
This patch:
- Declares mtrr.show as early_param
- Stays consistent with the previous param (which I doubt
that it ever worked), so mtrr.show=1 would still work
- Declares mtrr_show as initdata
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Now that it's unified, move the (SMP) TLB flushing code from arch/x86/kernel/
to arch/x86/mm/, where it belongs logically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/x86/kernel/tlb_32.c
Merge it here because both the cpumask changes and the ongoing percpu
work is touching the TLB code. The percpu changes take precedence, as
they eliminate tlb_32.c altogether.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 4217458daf.
Justin Madru bisected this commit, it was causing weird Firefox
crashes.
The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of
the calling frame, which corrupts the syscall return pt_regs state and
thus corrupts user-space register state.
So we go back to the slightly less clean but more optimization-safe
method of getting to pt_regs. Also add a comment to explain this.
Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505
Reported-and-bisected-by: Justin Madru <jdm64@gawab.com>
Tested-by: Justin Madru <jdm64@gawab.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: less contention when issuing invalidate IPI, cleanup
Make x86_32 use the same tlb code as 64bit. The 64bit code uses
multiple IPI vectors for tlb shootdown to reduce contention. This
patch makes x86_32 allocate the same 8 IPIs as x86_64 and share the
code paths.
Note that the usage of asmlinkage is inconsistent for x86_32 and 64
and calls for further cleanup. This has been noted with a FIXME
comment in tlb_64.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: clean up, ipi vector number reordering for x86_32
Make the following changes to prepare for tlb merge.
* reorder x86_32 ip vectors
* adjust tlb_32.c and tlb_64.c such that their logics coincide exactly
- on spurious invalidate ipi, tlb_32 acks the irq
- tlb_64 now has proper memory barriers around clearing
flush_cpumask (no change in generated code)
* unexport flush_tlb_page from tlb_32.c, there's no user
* use unsigned int for cpu id
* drop unnecessary includes from tlb_64.c
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Make the following uv related cleanups.
* collect visible uv related definitions and interfaces into uv/uv.h
and use it. this cleans up the messy situation where on 64bit, uv
is defined properly, on 32bit generic it's dummy and on the rest
undefined. after this clean up, uv is defined on 64 and dummy on
32.
* update uv_flush_tlb_others() such that it takes cpumask of
to-be-flushed cpus as argument, instead of that minus self, and
returns yet-to-be-flushed cpumask, instead of modifying the passed
in parameter. this interface change will ease dummy implementation
of uv_flush_tlb_others() and makes uv tlb flush related stuff
defined in tlb_uv proper.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup, better irq_regs code generation for x86_64
Make 64-bit use the same optimizations as 32-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
%fs is currently set to __KERNEL_DS at boot, and conditionally
switched to __KERNEL_PERCPU for secondary cpus. Instead, initialize
GDT_ENTRY_PERCPU to the same attributes as GDT_ENTRY_KERNEL_DS and
set %fs to __KERNEL_PERCPU unconditionally.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup && more compact percpu area layout with future changes
Move 64-bit GDT to page-aligned section and clean up comment
formatting.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Remove byte locks implementation, which was introduced by Jeremy in
8efcbab6 ("paravirt: introduce a "lock-byte" spinlock implementation"),
but turned out to be dead code that is not used by any in-kernel
virtualization guest (Xen uses its own variant of spinlocks implementation
and KVM is not planning to move to byte locks).
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cleanup the cpuid check for DS configuration.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dump the branch trace on an oops (based on ftrace_dump_on_oops).
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: x86_64 percpu area layout change, irq_stack now at the beginning
Now that the PDA is empty except for the stack canary, it can be removed.
The irqstack is moved to the start of the per-cpu section. If the stack
protector is enabled, the canary overlaps the bottom 48 bytes of the irqstack.
tj: * updated subject
* dropped asm relocation of irq_stack_ptr
* updated comments a bit
* rebased on top of stack canary changes
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Use cpu_number to determine if the adjustment is necessary.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Copy the code to cpu_init() to satisfy the requirement that the cpu
be reinitialized. Remove all other calls, since the segments are
already initialized in head_64.S.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Make the following cleanups.
* remove duplicate comment from boot_init_stack_canary() which fits
better in the other place - cpu_idle().
* move stack_canary offset check from __switch_to() to
boot_init_stack_canary().
Signed-off-by: Tejun Heo <tj@kernel.org>
-tip testing found this crash:
> [ 35.258515] calling acpi_cpufreq_init+0x0/0x127 @ 1
> [ 35.264127] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 35.267554] IP: [<ffffffff80478092>] __bitmap_intersects+0x48/0x73
> [ 35.267554] PGD 0
> [ 35.267554] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c is still broken: there's no
allocation of the variable mask, so we pass in an uninitialized cmd.mask
field to drv_read(), which then passes it to the scheduler which then
crashes ...
Switch it over to the much simpler constant-cpumask-pointers approach.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use new work_on_cpu function to reduce stack usage
Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().
Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.
Note: This patch basically reverts 50c668d6 which reverted 7503bfba, now
that the work_on_cpu() function is more stable.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Dieter Ries <clip2@gmx.de>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <cpufreq@vger.kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since commit 75c46fa, "x64, x2apic/intr-remap: MSI and MSI-X
support for interrupt remapping infrastructure", x86 has had an
implementation of arch_setup_msi_irqs().
That implementation does not call arch_setup_msi_irq(), instead it calls
setup_irq(). No other x86 code calls arch_setup_msi_irq().
That leaves only arch_setup_msi_irqs() in drivers/pci/msi.c, but that
routine is overridden by the x86 version of arch_setup_msi_irqs().
So arch_setup_msi_irq() is dead code, remove it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The function setup_cpu_local_masks() has been marked __init, in
order to remove the following section mismatch messages:
WARNING: vmlinux.o(.text+0x3c2c7): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
WARNING: vmlinux.o(.text+0x3c2d3): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
WARNING: vmlinux.o(.text+0x3c2df): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
WARNING: vmlinux.o(.text+0x3c2eb): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add debug warning
Fire off one message if two apic's discovered with different
apic versions. (this code is only called during CPU init)
The goal of this is to pave the way of the removal of the apic_version[]
array. We dont expect any apic version incompatibilities in the x86
landscape of systems [if so we dont handle them very well and probably
never will handle deep apic version assymetries well], but it's prudent
to have a debug check for one kernel cycle nevertheless.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
tj: * in asm-offsets_64.c, pda.h inclusion shouldn't be removed as pda
is still referenced in the file
* s/oldrsp/old_rsp/
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Also clean up PER_CPU_VAR usage in xen-asm_64.S
tj: * remove now unused stack_thread_info()
* s/kernelstack/kernel_stack/
* added FIXME comment in xen-asm_64.S
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
tj: moved cpu_number definition out of CONFIG_HAVE_SETUP_PER_CPU_AREA
for voyager.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Move the exception stacks to per-cpu, removing specific allocation code.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Move the irqstackptr variable from the PDA to per-cpu. Make the
stacks themselves per-cpu, removing some specific allocation code.
Add a seperate flag (is_boot_cpu) to simplify the per-cpu boot
adjustments.
tj: * sprinkle some underbars around.
* irq_stack_ptr is not used till traps_init(), no reason to
initialize it early. On SMP, just leaving it NULL till proper
initialization in setup_per_cpu_areas() works. Dropped
is_boot_cpu and early irq_stack_ptr initialization.
* do DECLARE/DEFINE_PER_CPU(char[IRQ_STACK_SIZE], irq_stack)
instead of (char, irq_stack[IRQ_STACK_SIZE]).
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: use new work_on_cpu function to reduce stack usage
Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().
Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.
Note: This patch basically reverts 50c668d6 which reverted 7503bfba, now
that the work_on_cpu() function is more stable.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Dieter Ries <clip2@gmx.de>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <cpufreq@vger.kernel.org>
Fix two compilation warnings in drivers/acpi/sleep.c, one triggered
by unsetting CONFIG_SUSPEND and the other triggered by unsetting
CONFIG_HIBERNATION, by moving some code under the appropriate
#ifdefs .
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Check CONFIG_FREEZER instead of CONFIG_PM because kprobe booster
depends on freeze_processes() and thaw_processes() when CONFIG_PREEMPT=y.
This fixes a linkage error which occurs when CONFIG_PREEMPT=y, CONFIG_PM=y
and CONFIG_FREEZER=n.
Reported-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Len Brown <len.brown@intel.com>
On x86_64, if get_per_cpu_var() is used before per cpu area is setup
(if lockdep is turned on, it happens), it needs this_cpu_off to point
to __per_cpu_load. Initialize accordingly.
Signed-off-by: Tejun Heo <tj@kernel.org>
It is an optimization and a cleanup, and adds the following new
generic percpu methods:
percpu_read()
percpu_write()
percpu_add()
percpu_sub()
percpu_and()
percpu_or()
percpu_xor()
and implements support for them on x86. (other architectures will fall
back to a default implementation)
The advantage is that for example to read a local percpu variable,
instead of this sequence:
return __get_cpu_var(var);
ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx
ffffffff8102ca32: 81
ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax
ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax
We can get a single instruction by using the optimized variants:
return percpu_read(var);
ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax
I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.
tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
* added percpu_and() for completeness's sake
* made generic percpu ops atomic against preemption
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
Do the following cleanups:
* kill x86_64_init_pda() which now is equivalent to pda_init()
* use per_cpu_offset() instead of cpu_pda() when initializing
initial_gs
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pda is now a percpu variable and there's no reason it can't use plain
x86 percpu accessors. Add x86_test_and_clear_bit_percpu() and replace
pda op implementations with wrappers around x86 percpu accessors.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
As pda is now allocated in percpu area, it can easily be made a proper
percpu variable. Make it so by defining per cpu symbol from linker
script and declaring it in C code for SMP and simply defining it for
UP. This change cleans up code and brings SMP and UP closer a bit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that pda is allocated as part of percpu, percpu doesn't need to be
accessed through pda. Unify x86_64 SMP percpu access with x86_32 SMP
one. Other than the segment register, operand size and the base of
percpu symbols, they behave identical now.
This patch replaces now unnecessary pda->data_offset with a dummy
field which is necessary to keep stack_canary at its place. This
patch also moves per_cpu_offset initialization out of init_gdt() into
setup_per_cpu_areas(). Note that this change also necessitates
explicit per_cpu_offset initializations in voyager_smp.c.
With this change, x86_OP_percpu()'s are as efficient on x86_64 as on
x86_32 and also x86_64 can use assembly PER_CPU macros.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
Currently pdas and percpu areas are allocated separately. %gs points
to local pda and percpu area can be reached using pda->data_offset.
This patch folds pda into percpu area.
Due to strange gcc requirement, pda needs to be at the beginning of
the percpu area so that pda->stack_canary is at %gs:40. To achieve
this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
added and used to reserve pda sized chunk at the start of the percpu
area.
After this change, for boot cpu, %gs first points to pda in the
data.init area and later during setup_per_cpu_areas() gets updated to
point to the actual pda. This means that setup_per_cpu_areas() need
to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
already has modified it when control reaches setup_per_cpu_areas().
This patch also removes now unnecessary get_local_pda() and its call
sites.
A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
_cpu_pda array first uses statically allocated storage in data.init
and then switches to allocated bootmem to conserve space. However,
after folding pda area into percpu area, _cpu_pda array will be
removed completely. Drop the reallocation part to simplify the code
for soon-to-follow changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
CPU startup code in head_64.S loaded address of a zero page into %gs
for temporary use till pda is loaded but address to the actual pda is
available at the point. Load the real address directly instead.
This will help unifying percpu and pda handling later on.
This patch is mostly taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
This patch makes percpu symbols zerobased on x86_64 SMP by adding
PERCPU_VADDR() to vmlinux.lds.h which helps setting explicit vaddr on
the percpu output section and using it in vmlinux_64.lds.S. A new
PHDR is added as existing ones cannot contain sections near address
zero. PERCPU_VADDR() also adds a new symbol __per_cpu_load which
always points to the vaddr of the loaded percpu data.init region.
The following adjustments have been made to accomodate the address
change.
* code to locate percpu gdt_page in head_64.S is updated to add the
load address to the gdt_page offset.
* __per_cpu_load is used in places where access to the init data area
is necessary.
* pda->data_offset is initialized soon after C code is entered as zero
value doesn't work anymore.
This patch is mostly taken from Mike Travis' "x86_64: Base percpu
variables at zero" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make vmlinux_32.lds.S use the generic PERCPU() macro instead of open
coding it. This will ease future changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
* Ruggedize some calls in setup_percpu.c to prevent mishaps
in early calls, particularly for non-critical functions.
* Cleanup DEBUG_PER_CPU_MAPS usages and some comments.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make early_per_cpu() a lvalue as per_cpu() is and use it where
applicable.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The function uv_wait_completion() spins on reads of a memory-mapped
register, waiting for completion of BAU hardware replies.
It should call "cpu_relax()" between those reads to improve performance
on hyperthreaded configurations.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-tip testing found this crash:
> [ 35.258515] calling acpi_cpufreq_init+0x0/0x127 @ 1
> [ 35.264127] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 35.267554] IP: [<ffffffff80478092>] __bitmap_intersects+0x48/0x73
> [ 35.267554] PGD 0
> [ 35.267554] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c is still broken: there's no
allocation of the variable mask, so we pass in an uninitialized cmd.mask
field to drv_read(), which then passes it to the scheduler which then
crashes ...
Switch it over to the much simpler constant-cpumask-pointers approach.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: widen the effect of the 'nolapic' boot parameter
"nolapic" should not only suppress SMP and use of the LAPIC, but it
also ought to have the effect of disabling all IO-APIC related activity
as well as PCI MSI and HT-IRQs.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: micro-optimization, memory reduction
On x86_64 flush tlb data is stored in per_cpu variables. This is
unnecessary because only the first NUM_INVALIDATE_TLB_VECTORS entries
are accessed.
This patch aims at making the code less confusing (there's nothing
really "per_cpu") by using a plain array. It also would save some memory
on most distros out there (Ubuntu x86_64 has NR_CPUS=64 by default).
[ Ravikiran G Thirumalai also pointed out that the correct alignment
is ____cacheline_internodealigned_in_smp, so that there's no
bouncing on vsmp. ]
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@xprog.eu>
Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This commit broke flush_tlb_others_ipi() causing boot hangs on a
16 logical cpu system:
> commit 4595f9620c
> Author: Rusty Russell <rusty@rustcorp.com.au>
> Date: Sat Jan 10 21:58:09 2009 -0800
>
> x86: change flush_tlb_others to take a const struct cpumask
This change resulted in sending the invalidate tlb vector to the
sender itself causing the hang. flush_tlb_others_ipi() should exclude
the sender itself from the destination list.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix potential boot crash on MAXSMP
Remove code left over by:
50c668d: Revert "cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read
That cmd.cpumask is not allocated anymore. No impact on default !MAXSMP
kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel "Smackover" x58 BIOS don't have HPET enabled in the BIOS, so allow
to force enable it at least. The register layout is the same as in other
recent ICHs, so all the code can be reused.
Using numerical PCI-ID because it's unlikely the PCI-ID will be used
anywhere else.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit e0c7317557.
This patch was wrong, as lockdep (and thus the irq state tracer)
aren't nmi safe. People are already seeing lockdep warnings due
to this.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 7503bfbae8.
Dieter Ries reported bootup soft-hangs and bisected it back to
this commit, and reverting this commit gave him a working system.
The commit introduces work_on_cpu() use into the cpufreq code,
but that is subtly problematic from a lock hierarchy POV: the
hotplug-cpu lock is an highlevel lock that is taken before
lowlevel locks, and in this codepath we are called with the
policy lock taken.
Dieter did not have lockdep enabled so we dont have a nice stack
trace proof for this, but using work_on_cpu() in such a lowlevel
place certainly looks wrong, so we revert the patch.
work_on_cpu() needs to be reworked to be more generally usable.
Reported-by: Dieter Ries <clip2@gmx.de>
Tested-by: Dieter Ries <clip2@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix this by reintroducing asm/smp.h include in apic.c - later on
I will fix this by removing non-smp data from smp.h
Also fix the __inquire_remote_apic() prototype/inline.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix this by reintroducing asm/smp.h include in mpparse.c - later on
I will fix this by removing non-smp data from smp.h.
Reported-by: Petr Titera <P.Titera@century.cz>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Alexander Beregalov reported that this warning is caused by the HPET code:
> hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> hpet0: 3 comparators, 64-bit 14.318180 MHz counter
> ODEBUG: object is on stack, but not annotated
> ------------[ cut here ]------------
> WARNING: at lib/debugobjects.c:251 __debug_object_init+0x2a4/0x352()
> Bisected down to 26afe5f2fb
> (x86: HPET_MSI Initialise per-cpu HPET timers)
The commit is fine - but the on-stack workqueue entry needs annotation.
Reported-and-bisected-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Fix:
ERROR: trailing whitespace
ERROR: code indent should use tabs where possible
WARNING: %Ld/%Lu are not-standard C, use %lld/%llu
WARNING: printk() should include KERN_ facility level
ERROR: spaces required around that '=' (ctx:VxW)
total: 13 errors, 2 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: preparation, cleanup, add KERN_INFO printk
Modify references from NR_IRQS to nr_irqs as the later will become
variable-sized based on nr_cpu_ids when CONFIG_SPARSE_IRQS=y.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: reduce stack usage.
init_intel_cacheinfo() does not use the cpumask so define a subset
of struct _cpuid4_info (_cpuid4_info_regs) that can be used instead.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: Reduce memory usage, use new cpumask API.
Use cpumask_var_t for 'cpus' cpumask in struct threshold_bank and update
remaining old cpumask_t functions to new cpumask API.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: Improve tlb flush performance for UV
Calling alloc_cpumask_var a zillion times a second does affect
performance. Replace with static cpumask.
Note: when CONFIG_X86_UV is defined, this extra PER_CPU memory
will be optimized out for non-UV configs as is_uv_system() will
then return a constant 0.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: reduce stack usage, use new cpumask API.
This is made a little more tricky by uv_flush_tlb_others which
actually alters its argument, for an IPI to be sent to the remaining
cpus in the mask.
I solve this by allocating a cpumask_var_t for this case and falling back
to IPI should this fail.
To eliminate temporaries in the caller, all flush_tlb_others implementations
now do the this-cpu-elimination step themselves.
Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)"
which has been there since pre-git and yet f->flush_cpumask is always zero
at this point.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: reduce memory usage, use new cpumask API.
Replace the affinity and pending_masks with cpumask_var_t's. This adds
to the significant size reduction done with the SPARSE_IRQS changes.
The added functions (init_alloc_desc_masks & init_copy_desc_masks) are
in the include file so they can be inlined (and optimized out for the
!CONFIG_CPUMASKS_OFFSTACK case.) [Naming chosen to be consistent with
the other init*irq functions, as well as the backwards arg declaration
of "from, to" instead of the more common "to, from" standard.]
Includes a slight change to the declaration of struct irq_desc to embed
the pending_mask within ifdef(CONFIG_SMP) to be consistent with other
references, and some small changes to Xen.
Tested: sparse/non-sparse/cpumask_offstack/non-cpumask_offstack/nonuma/nosmp on x86_64
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: virtualization@lists.osdl.org
Cc: xen-devel@lists.xensource.com
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Impact: micro-optimization
The patch below removes an unnecessary locked instruction from
switch_to(). TIF_FORK is only ever set in copy_thread() on initial
process creation, and gets cleared during the first scheduling of the
process. As such, it is safe to use an unlocked test for the flag
within switch_to().
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The swiotlb_arch_range_needs_mapping() hook should take a physical
address rather than a virtual address in order to support highmem pages.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
x86: fix section mismatch warnings in mcheck/mce_amd_64.c
x86: offer frame pointers in all build modes
x86: remove duplicated #include's
x86: k8 numa register active regions later
x86: update Alan Cox's email addresses
x86: rename all fields of mpc_table mpc_X to X
x86: rename all fields of mpc_oemtable oem_X to X
x86: rename all fields of mpc_bus mpc_X to X
x86: rename all fields of mpc_cpu mpc_X to X
x86: rename all fields of mpc_intsrc mpc_X to X
x86: rename all fields of mpc_lintsrc mpc_X to X
x86: rename all fields of mpc_iopic mpc_X to X
x86: irqinit_64.c init_ISA_irqs should be static
Documentation/x86/boot.txt: payload length was changed to payload_length
x86: setup_percpu.c fix style problems
x86: irqinit_64.c fix style problems
x86: irqinit_32.c fix style problems
x86: i8259.c fix style problems
x86: irq_32.c fix style problems
x86: ioport.c fix style problems
...
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
[IA64] fix typo in cpumask_of_pcibus()
x86: fix x86_32 builds for summit and es7000 arch's
cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
cpumask: use cpumask_var_t in acpi-cpufreq.c
cpumask: use work_on_cpu in acpi/cstate.c
cpumask: convert struct cpufreq_policy to cpumask_var_t
cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
x86: cleanup remaining cpumask_t ops in smpboot code
cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
cpumask: update local_cpus_show to use new cpumask API
ia64: cpumask fix for is_affinity_mask_valid()
We found a situation on Linus' machine that the Nvidia timer quirk hit on
a Intel chipset system. The problem is that the system has a fancy Nvidia
card with an own PCI bridge, and the early-quirks code looking for any
NVidia bridge triggered on it incorrectly. This didn't lead a boot
failure by luck, but the timer routing code selecting the wrong timer
first and some ugly messages. It might lead to real problems on other
systems.
I checked all the devices which are currently checked for by early_quirks
and it turns out they are all located in the root bus zero.
So change the early-quirks loop to only scan bus 0. This incidently also
saves quite some unnecessary scanning work, because early_quirks doesn't
go through all the non root busses.
The graphics card is not on bus 0, so it is not matched anymore.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some boxes there exist both RSDT and XSDT table. But unfortunately
sometimes there exists the following error when XSDT table is used:
a. 32/64X address mismatch
b. The 32/64X FACS address mismatch
In such case the boot option of "acpi=rsdt" is provided so that
RSDT is tried instead of XSDT table when the system can't work well.
http://bugzilla.kernel.org/show_bug.cgi?id=8246
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
cc:Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
The Cx Register address obtained from the _CST object is used as the MWAIT
hints if the register type is FFixedHW. And it is used to check whether
the Cx type is supported or not.
On some boxes the following Cx state package is obtained from _CST object:
>{
ResourceTemplate ()
{
Register (FFixedHW,
0x01, // Bit Width
0x02, // Bit Offset
0x0000000000889759, // Address
0x03, // Access Size
)
},
0x03,
0xF5,
0x015E }
In such case we should use the bit[7:4] of Cx address to check whether
the Cx type is supported or not.
mask the MWAIT hint to avoid array address overflow
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by:Venki Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpf->mpf_X fields to
mpf->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpf' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
intel_mp_floating should be renamed to mpf_intel.
The reason: the 'f' in MPF already means 'floating'
which means MP Floating pointer structure -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (98 commits)
PCI PM: Put PM callbacks in the order of execution
PCI PM: Run default PM callbacks for all devices using new framework
PCI PM: Register power state of devices during initialization
PCI PM: Call pci_fixup_device from legacy routines
PCI PM: Rearrange code in pci-driver.c
PCI PM: Avoid touching devices behind bridges in unknown state
PCI PM: Move pci_has_legacy_pm_support
PCI PM: Power-manage devices without drivers during suspend-resume
PCI PM: Add suspend counterpart of pci_reenable_device
PCI PM: Fix poweroff and restore callbacks
PCI: Use msleep instead of cpu_relax during ASPM link retraining
PCI: PCIe portdrv: Add kerneldoc comments to remining core funtions
PCI: PCIe portdrv: Rearrange code so that related things are together
PCI: PCIe portdrv: Fix suspend and resume of PCI Express port services
PCI: PCIe portdrv: Add kerneldoc comments to some core functions
x86/PCI: Do not use interrupt links for devices using MSI-X
net: sfc: Use pci_clear_master() to disable bus mastering
PCI: Add pci_clear_master() as opposite of pci_set_master()
PCI hotplug: remove redundant test in cpq hotplug
PCI: pciehp: cleanup register and field definitions
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...
This patch is part of a larger patch series which will remove
the "char bus_id[20]" name string from struct device. The device
name is managed in the kobject anyway, and without any size
limitation, and just needlessly copied into "struct device".
To set and read the device name dev_name(dev) and dev_set_name(dev)
must be used. If your code uses static kobjects, which it shouldn't
do, "const char *init_name" can be used to statically provide the
name the registered device should have. At registration time, the
init_name field is cleared, to enforce the use of dev_name(dev) to
access the device name at a later time.
We need to get rid of all occurrences of bus_id in the entire tree
to be able to enable the new interface. Please apply this patch,
and possibly convert any remaining remaining occurrences of bus_id.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Mark the function local_allocate_threshold_blocks() with __cpuinit,
in order to remove the following section mismatch messages:
WARNING: arch/x86/kernel/cpu/mcheck/built-in.o(.text+0x1363): Section mismatch in reference from the function local_allocate_threshold_blocks() to the function .cpuinit.text:allocate_threshold_blocks()
The function local_allocate_threshold_blocks() references
the function __cpuinit allocate_threshold_blocks().
This is often because local_allocate_threshold_blocks lacks a __cpuinit
annotation or the annotation of allocate_threshold_blocks is wrong.
WARNING: arch/x86/kernel/cpu/built-in.o(.text+0x1def): Section mismatch in reference from the function local_allocate_threshold_blocks() to the function .cpuinit.text:allocate_threshold_blocks()
The function local_allocate_threshold_blocks() references
the function __cpuinit allocate_threshold_blocks().
This is often because local_allocate_threshold_blocks lacks a __cpuinit
annotation or the annotation of allocate_threshold_blocks is wrong.
WARNING: arch/x86/kernel/built-in.o(.text+0xef2b): Section mismatch in reference from the function local_allocate_threshold_blocks() to the function .cpuinit.text:allocate_threshold_blocks()
The function local_allocate_threshold_blocks() references
the function __cpuinit allocate_threshold_blocks().
This is often because local_allocate_threshold_blocks lacks a __cpuinit
annotation or the annotation of allocate_threshold_blocks is wrong.
All the callsites of this function are __cpuinit already, and all the
functions it calls are __cpuinit as well.
Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: fix rcutorture bug
rcu: eliminate synchronize_rcu_xxx macro
rcu: make treercu safe for suspend and resume
rcu: fix rcutree grace-period-latency bug on small systems
futex: catch certain assymetric (get|put)_futex_key calls
futex: make futex_(get|put)_key() calls symmetric
locking, percpu counters: introduce separate lock classes
swiotlb: clean up EXPORT_SYMBOL usage
swiotlb: remove unnecessary declaration
swiotlb: replace architecture-specific swiotlb.h with linux/swiotlb.h
swiotlb: add support for systems with highmem
swiotlb: store phys address in io_tlb_orig_addr array
swiotlb: add hwdev to swiotlb_phys_to_bus() / swiotlb_sg_to_bus()
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
kprobe_mutex from architecture dependent code.
This allows us to call arch_remove_kprobe() (and free_insn_slot) while
holding kprobe_mutex.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds swiotlb_map_page and swiotlb_unmap_page to lib/swiotlb.c and
remove IA64 and X86's swiotlb_map_page and swiotlb_unmap_page.
This also removes unnecessary swiotlb_map_single, swiotlb_map_single_attrs,
swiotlb_unmap_single and swiotlb_unmap_single_attrs.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This converts X86 and IA64 to use include/linux/dma-mapping.h.
It's a bit large but pretty boring. The major change for X86 is
converting 'int dir' to 'enum dma_data_direction dir' in DMA mapping
operations. The major changes for IA64 is using map_page and
unmap_page instead of map_single and unmap_single.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch converts dma_map_single and dma_unmap_single to use
map_page and unmap_page respectively and removes unnecessary
map_single and unmap_single in struct dma_mapping_ops.
This leaves intel-iommu's dma_map_single and dma_unmap_single since
IA64 uses them. They will be removed after the unification.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a preparation of struct dma_mapping_ops unification. We use
map_page and unmap_page instead of map_single and unmap_single.
We will remove map_single hook in the last patch in this patchset.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a preparation of struct dma_mapping_ops unification. We use
map_page and unmap_page instead of map_single and unmap_single.
We will remove map_single and unmap_single hooks in the last patch in
this patchset.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a preparation of struct dma_mapping_ops unification. We use
map_page and unmap_page instead of map_single and unmap_single.
We will remove map_single and unmap_single hooks in the last patch in
this patchset.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a preparation of struct dma_mapping_ops unification. We use
map_page and unmap_page instead of map_single and unmap_single.
We will remove map_single and unmap_single hooks in the last patch in
this patchset.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a preparation of struct dma_mapping_ops unification. We use
map_page and unmap_page instead of map_single and unmap_single.
This is sorta temporary workaround. We will move them to lib/swiotlb.c
to enable x86's swiotlb code to directly use them.
We will remove map_single and unmap_single hooks in the last patch in
this patchset.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use new cpumask API to reduce stack usage
Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for read_measured_perf_ctrs().
Basically splits off the work function from get_measured_perf which is
run on the designated cpu. Moves definition of struct perf_cur out of
function local namespace, and is used as the work function argument.
References in get_measured_perf use values in the perf_cur struct.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use new cpumask API to reduce stack usage
Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().
Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, reduce stack usage, use new cpumask API.
Replace the cpumask_t in struct drv_cmd with a cpumask_var_t. Remove unneeded
online_policy_cpus cpumask_t in acpi_cpufreq_target. Update refs to use
new cpumask API.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use new cpumask API to reduce stack usage
Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for the acpi_processor_ffh_cstate_probe() function.
Basically splits acpi_processor_ffh_cstate_probe() into two functions, the
other being acpi_processor_ffh_cstate_probe_cpu which is the work function
run on the designated cpu.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use new cpumask API to reduce memory usage
This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS. cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
There's only one user, and it's a fairly easy conversion.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
swiotlb: Don't include linux/swiotlb.h twice in lib/swiotlb.c
intel-iommu: fix build error with INTR_REMAP=y and DMAR=n
swiotlb: add missing __init annotations
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Fix on resume, now preserves user policy min/max.
[CPUFREQ] Add Celeron Core support to p4-clockmod.
[CPUFREQ] add to speedstep-lib additional fsb values for core processors
[CPUFREQ] Disable sysfs ui for p4-clockmod.
[CPUFREQ] p4-clockmod: reduce noise
[CPUFREQ] clean up speedstep-centrino and reduce cpumask_t usage
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->oem_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'oem' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Fix:
WARNING: Use #include <linux/smp.h> instead of <asm/smp.h>
WARNING: Use #include <linux/percpu.h> instead of <asm/percpu.h>
WARNING: Use #include <linux/topology.h> instead of <asm/topology.h>
WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
ERROR: spaces required around that '?' (ctx:VxW)
ERROR: spaces required around that ':' (ctx:VxV)
total: 2 errors, 4 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Fix:
WARNING: Use #include <linux/io.h> instead of <asm/io.h>
WARNING: Use #include <linux/delay.h> instead of <asm/delay.h>
ERROR: trailing whitespace
WARNING: externs should be avoided in .c files
ERROR: space required after that ',' (ctx:VxV)
WARNING: space prohibited between function name and open parenthesis '('
total: 2 errors, 4 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Fix:
WARNING: Use #include <linux/acpi.h> instead of <asm/acpi.h>
WARNING: Use #include <linux/io.h> instead of <asm/io.h>
WARNING: Use #include <linux/delay.h> instead of <asm/delay.h>
ERROR: code indent should use tabs where possible
total: 1 errors, 3 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use new cpumask API to reduce memory and stack usage
Allocate the following local cpumasks based on the number of cpus that
are present. References will use new cpumask API. (Currently only
modified for x86_64, x86_32 continues to use the *_map variants.)
cpu_callin_mask
cpu_callout_mask
cpu_initialized_mask
cpu_sibling_setup_mask
Provide the following accessor functions:
struct cpumask *cpu_sibling_mask(int cpu)
struct cpumask *cpu_core_mask(int cpu)
Other changes are when setting or clearing the cpu online, possible
or present maps, use the accessor functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Fix:
ERROR: space prohibited after that open parenthesis '('
ERROR: space prohibited before that close parenthesis ')'
total: 4 errors, 0 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, fix style problems, more readable
Fix:
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
WARNING: Use #include <linux/smp.h> instead of <asm/smp.h>
ERROR: code indent should use tabs where possible
total: 9 errors, 2 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Fix:
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
WARNING: Use #include <linux/io.h> instead of <asm/io.h>
WARNING: Use #include <linux/kdebug.h> instead of <asm/kdebug.h>
WARNING: Use #include <linux/smp.h> instead of <asm/smp.h>
ERROR: "foo * bar" should be "foo *bar"
ERROR: trailing whitespace
ERROR: spaces required around that ':' (ctx:WxO)
ERROR: spaces required around that ':' (ctx:OxW)
total: 7 errors, 4 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Fix:
WARNING: Use #include <linux/smp.h> instead of <asm/smp.h>
WARNING: Use #include <linux/nmi.h> instead of <asm/nmi.h>
WARNING: Use #include <linux/timex.h> instead of <asm/timex.h>
WARNING: line over 80 characters
ERROR: else should follow close brace '}'
total: 2 errors, 4 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_oemtable should be renamed to mpc_oemtable.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_lintsrc should be renamed to mpc_lintsrc.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_intsrc should be renamed to mpc_intsrc.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_ioapic should be renamed to mpc_ioapic.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_processor should be renamed to mpc_cpu.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Plus 'processor' is a lot longer than 'cpu' - so we try to use 'cpu' in all
type names, as much as possible.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_bus should be renamed to mpc_bus.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mp_config_table should be renamed to mpc_table.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix debug/crash printout
Since errorcode is popped out, RIP is on the top of the stack.
Use real RIP value instead of wrong CS.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
x86: setup_per_cpu_areas() cleanup
cpumask: fix compile error when CONFIG_NR_CPUS is not defined
cpumask: use alloc_cpumask_var_node where appropriate
cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
x86: use cpumask_var_t in acpi/boot.c
x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
sched: put back some stack hog changes that were undone in kernel/sched.c
x86: enable cpus display of kernel_max and offlined cpus
ia64: cpumask fix for is_affinity_mask_valid()
cpumask: convert RCU implementations, fix
xtensa: define __fls
mn10300: define __fls
m32r: define __fls
h8300: define __fls
frv: define __fls
cris: define __fls
cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
cpumask: zero extra bits in alloc_cpumask_var_node
cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
cpumask: convert mm/
...
Impact: cleanup
__alloc_bootmem and __alloc_bootmem_node do panic
for us in case of fail so no need for additional
checks here.
Also lets use pr_*() macros for printing.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Reduce inter-node memory traffic.
Reduces inter-node memory traffic (offloading the global system bus)
by allocating referenced struct cpumasks on the same node as the
referring struct.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Reduce memory usage, use new API.
This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS. cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).
(Changes to powernow-k* by <travis>.)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reduce stack size, use new API.
Replace cpumask_t with cpumask_var_t.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Reduce future system panics due to cpumask operations using NR_CPUS
Insure that code does not look at bits >= nr_cpu_ids as when cpumasks are
allocated based on nr_cpu_ids, these extra bits will not be defined.
Also some other minor updates:
* change in to use cpu accessor function set_cpu_present() instead of
directly accessing cpu_present_map w/cpu_clear() [arch/x86/kernel/reboot.c]
* use cpumask_of() instead of &cpumask_of_cpu() [arch/x86/kernel/reboot.c]
* optimize some cpu_mask_to_apicid_and functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enables /sys/devices/system/cpu/{kernel_max,offline} user interface
By setting total_cpus, the drivers/base/cpu.c will display the
values of kernel_max (NR_CPUS-1) and the offlined cpu map.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The #ifdef's are no longer necessary when the iommu-api and the amd
iommu updates are merged together.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Impact: cleanup, reduce kernel size a bit, avoid sparse warning
Fixes sparse warning:
arch/x86/kernel/amd_iommu.c:1299:6: warning: symbol 'prealloc_protection_domains' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Imapct: add a new struct member to 'struct protection_domain'
When using protection domains for dma_ops and KVM its better to know for
which subsystem it was allocated. Add a flags member to struct
protection domain for that purpose.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Impact: save unneeded logic to add and remove domains to the list
The removal of a protection domain from the iommu_pd_list is not
necessary. Another benefit is that we save complexity because we don't
have to readd it later when the device no longer uses the domain.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Impact: split one function into three
The separate functions are required synchronize commands across all
hardware IOMMUs in the system.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Impact: change code to free pagetables from protection domains
The dma_ops_free_pagetable function can only free pagetables from
dma_ops domains. Change that to free pagetables of pure protection
domains.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Impact: function rename
The iommu_map function maps only one page. Make this clear in the
function name.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Impact: reduce printk noise
The message "alloc irq_2_pin on cpu 0 node 0" is printed way too often.
% dmesg|grep irq_2_pin|wc -l
20
Get rid of the debug printks.
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, reduce kernel size a bit
The current kernel build warns:
WARNING: vmlinux.o(.text+0x11458): Section mismatch in reference from the function swiotlb_alloc_boot() to the function .init.text:__alloc_bootmem_low()
The function swiotlb_alloc_boot() references
the function __init __alloc_bootmem_low().
This is often because swiotlb_alloc_boot lacks a __init
annotation or the annotation of __alloc_bootmem_low is wrong.
WARNING: vmlinux.o(.text+0x1011f2): Section mismatch in reference from the function swiotlb_late_init_with_default_size() to the function .init.text:__alloc_bootmem_low()
The function swiotlb_late_init_with_default_size() references
the function __init __alloc_bootmem_low().
This is often because swiotlb_late_init_with_default_size lacks a __init
annotation or the annotation of __alloc_bootmem_low is wrong.
and indeed the functions calling __alloc_bootmem_low() can be marked
__init as well.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits)
KVM: MMU: handle large host sptes on invlpg/resync
KVM: Add locking to virtual i8259 interrupt controller
KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared
MAINTAINERS: Maintainership changes for kvm/ia64
KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs()
KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
KVM: fix handling of ACK from shared guest IRQ
KVM: MMU: check for present pdptr shadow page in walk_shadow
KVM: Consolidate userspace memory capability reporting into common code
KVM: Advertise the bug in memory region destruction as fixed
KVM: use cpumask_var_t for cpus_hardware_enabled
KVM: use modern cpumask primitives, no cpumask_t on stack
KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus
KVM: set owner of cpu and vm file operations
anon_inodes: use fops->owner for module refcount
x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj
KVM: MMU: prepopulate the shadow on invlpg
KVM: MMU: skip global pgtables on sync due to cr3 switch
KVM: MMU: collapse remote TLB flushes on root sync
...
Impact: cleanup, fix style problems, more readable
Fixes style problems:
WARNING: Use #include <linux/smp.h> instead of <asm/smp.h>
WARNING: Use #include <linux/acpi.h> instead of <asm/acpi.h>
WARNING: suspect code indent for conditional statements (8, 17)
WARNING: space prohibited between function name and open parenthesis '('
total: 0 errors, 5 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix crash on x86/UV
UV is the SGI "UltraViolet" machine, which is x86_64 based.
BAU is the "Broadcast Assist Unit", used for TLB shootdown in UV.
This patch removes the allocation and initialization of an unused table.
This table is left over from a development test mode. It is unused in
the present code.
And it was incorrectly initialized: 8 entries allocated but 17 initialized,
causing slab corruption.
This patch should go into 2.6.27 and 2.6.28 as well as the current tree.
Diffed against 2.6.28 (linux-next, 12/30/08)
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The pda rework (commit 3461b0af02)
to remove static boot cpu pdas introduced a performance bug.
_boot_cpu_pda is the actual pda used by the boot cpu and is definitely
not "__read_mostly" and ended up polluting the read mostly section with
writes. This bug caused regression of about 8-10% on certain syscall
intensive workloads.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Acked-by: Mike Travis <travis@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (34 commits)
nfsd race fixes: jfs
nfsd race fixes: reiserfs
nfsd race fixes: ext4
nfsd race fixes: ext3
nfsd race fixes: ext2
nfsd/create race fixes, infrastructure
filesystem notification: create fs/notify to contain all fs notification
fs/block_dev.c: __read_mostly improvement and sb_is_blkdev_sb utilization
kill ->dir_notify()
filp_cachep can be static in fs/file_table.c
fix f_count description in Documentation/filesystems/files.txt
make INIT_FS use the __RW_LOCK_UNLOCKED initialization
take init_fs to saner place
kill vfs_permission
pass a struct path * to may_open
kill walk_init_root
remove incorrect comment in inode_permission
expand some comments (d_path / seq_path)
correct wrong function name of d_put in kernel document and source comment
fix switch_names() breakage in short-to-short case
...
* 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sparseirq: move __weak symbols into separate compilation unit
sparseirq: work around __weak alias bug
sparseirq: fix hang with !SPARSE_IRQ
sparseirq: set lock_class for legacy irq when sparse_irq is selected
sparseirq: work around compiler optimizing away __weak functions
sparseirq: fix desc->lock init
sparseirq: do not printk when migrating IRQ descriptors
sparseirq: remove duplicated arch_early_irq_init()
irq: simplify for_each_irq_desc() usage
proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c
irq: for_each_irq_desc() move to irqnr.h
hrtimer: remove #include <linux/irq.h>
kvm_get_tsc_khz() currently returns the previously-calculated preset_lpj
value, but it is in loops-per-jiffy, not kHz. The current code works
correctly only when HZ=1000.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently, we only set the KVM paravirt signature in case
of CONFIG_KVM_GUEST. However, it is possible to have it turned
off, while CONFIG_KVM_CLOCK is turned on. This is also a paravirt
case, and should be shown accordingly.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
On emergency_restart, we may need to use an NMI to disable virtualization
on all CPUs. We do that using nmi_shootdown_cpus() if VMX is enabled.
Note: With this patch, we will run the NMI stuff only when the CPU where
emergency_restart() was called has VMX enabled. This should work on most
cases because KVM enables VMX on all CPUs, but we may miss the small
window where KVM is doing that. Also, I don't know if all code using
VMX out there always enable VMX on all CPUs like KVM does. We have two
other alternatives for that:
a) Have an API that all code that enables VMX on any CPU should use
to tell the kernel core that it is going to enable VMX on the CPUs.
b) Always call nmi_shootdown_cpus() if the CPU supports VMX. This is
a bit intrusive and more risky, as it would run nmi_shootdown_cpus()
on emergency_reboot() even on systems where virtualization is never
enabled.
Finding a proper point to hook the nmi_shootdown_cpus() call isn't
trivial, as the non-emergency machine_restart() (that doesn't need the
NMI tricks) uses machine_emergency_restart() directly.
The solution to make this work without adding a new function or argument
to machine_ops was setting a 'reboot_emergency' flag that tells if
native_machine_emergency_restart() needs to do the virt cleanup or not.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need to disable virtualization extensions on all CPUs before booting
the kdump kernel, otherwise the kdump kernel booting will fail, and
rebooting after the kdump kernel did its task may also fail.
We do it using cpu_emergency_vmxoff() and cpu_emergency_svm_disable(),
that should always work, because those functions check if the CPUs
support SVM or VMX before doing their tasks.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
For KVM can reuse the type define, and need them to support shadow MTRR.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, sparseirq: clean up Kconfig entry
x86: turn CONFIG_SPARSE_IRQ off by default
sparseirq: fix numa_migrate_irq_desc dependency and comments
sparseirq: add kernel-doc notation for new member in irq_desc, -v2
locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP
sparseirq, xen: make sure irq_desc is allocated for interrupts
sparseirq: fix !SMP building, #2
x86, sparseirq: move irq_desc according to smp_affinity, v7
proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ
sparse irqs: add irqnr.h to the user headers list
sparse irqs: handle !GENIRQ platforms
sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build
sparseirq: fix Alpha build failure
sparseirq: fix typo in !CONFIG_IO_APIC case
x86, MSI: pass irq_cfg and irq_desc
x86: MSI start irq numbering from nr_irqs_gsi
x86: use NR_IRQS_LEGACY
sparse irq_desc[] array: core kernel and x86 changes
genirq: record IRQ_LEVEL in irq_desc[]
irq.h: remove padding from irq_desc on 64bits
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
hrtimers: fix warning in kernel/hrtimer.c
x86: make sure we really have an hpet mapping before using it
x86: enable HPET on Fujitsu u9200
linux/timex.h: cleanup for userspace
posix-timers: simplify de_thread()->exit_itimers() path
posix-timers: check ->it_signal instead of ->it_pid to validate the timer
posix-timers: use "struct pid*" instead of "struct task_struct*"
nohz: suppress needless timer reprogramming
clocksource, acpi_pm.c: put acpi_pm_read_slow() under CONFIG_PCI
nohz: no softirq pending warnings for offline cpus
hrtimer: removing all ur callback modes, fix
hrtimer: removing all ur callback modes, fix hotplug
hrtimer: removing all ur callback modes
x86: correct link to HPET timer specification
rtc-cmos: export second NVRAM bank
Fixed up conflicts in sound/drivers/pcsp/pcsp.c and sound/core/hrtimer.c
manually.
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
stacktrace: provide save_stack_trace_tsk() weak alias
rcu: provide RCU options on non-preempt architectures too
printk: fix discarding message when recursion_bug
futex: clean up futex_(un)lock_pi fault handling
"Tree RCU": scalable classic RCU implementation
futex: rename field in futex_q to clarify single waiter semantics
x86/swiotlb: add default swiotlb_arch_range_needs_mapping
x86/swiotlb: add default phys<->bus conversion
x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
x86: add swiotlb allocation functions
swiotlb: consolidate swiotlb info message printing
swiotlb: support bouncing of HighMem pages
swiotlb: factor out copy to/from device
swiotlb: add arch hook to force mapping
swiotlb: allow architectures to override phys<->bus<->phys conversions
swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
rcu: fix rcutorture behavior during reboot
resources: skip sanity check of busy resources
swiotlb: move some definitions to header
swiotlb: allow architectures to override swiotlb pool allocation
...
Fix up trivial conflicts in
arch/x86/kernel/Makefile
arch/x86/mm/init_32.c
include/linux/hardirq.h
as per Ingo's suggestions.
Impact: cleanup, reduce kernel size a bit, avoid sparse warning
Fixes sparse warning:
arch/x86/kernel/xsave.c:162:5: warning: symbol 'restore_user_xstate' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup, reduce kernel size a bit, avoid sparse warning
Fixes sparse warning:
arch/x86/kernel/apic.c:270:5: warning: symbol 'x2apic_icr_read' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup, reduce kernel size a bit, avoid sparse warning
Fixes sparse warning:
arch/x86/kernel/bios_uv.c:28:18: warning: symbol 'uv_systab' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup, reduce kernel size a bit, avoid sparse warnings
Fixes sparse warnings:
arch/x86/kernel/genx2apic_phys.c:164:6: warning: symbol 'x2apic_send_IPI_self' was not declared. Should it be static?
arch/x86/kernel/genx2apic_phys.c:169:6: warning: symbol 'init_x2apic_ldr' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup, reduce kernel size a bit, avoid sparse warning
Fixes sparse warning:
arch/x86/kernel/amd_iommu.c:1299:6: warning: symbol 'prealloc_protection_domains' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup, reduce kernel size a bit, avoid sparse warning
Fixes sparse warning:
arch/x86/kernel/amd_iommu_init.c:246:13: warning: symbol 'iommu_enable' was not declared. Should it be static?
arch/x86/kernel/amd_iommu_init.c:259:13: warning: symbol 'iommu_enable_event_logging' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup
Now that arch/x86/pci/pci.h is used in a number of other places as well,
move the lowlevel x86 pci definitions into the architecture include files.
(not to be confused with the existing arch/x86/include/asm/pci.h file,
which provides public details about x86 PCI)
Tested on: X86_32_UP, X86_32_SMP and X86_64_SMP
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, reduce kernel size a bit, avoid sparse warning
Fixes sparse warning:
arch/x86/kernel/pci-gart_64.c:55:5: warning: symbol 'iommu_fullflush' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, reduce kernel size a bit, avoid sparse warning
Fixes sparse warning:
arch/x86/kernel/io_apic.c:709:6: warning: symbol 'io_apic_sync' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix section mismatch warning
Commit b2bb855491 ("x86: Remove cpumask games
in x86/kernel/cpu/intel_cacheinfo.c") introduced get_cpu_leaves(), which
references __cpuinit cpuid4_cache_lookup().
Mark get_cpu_leaves() with a __cpuinit annotation.
Signed-off-by: Sergio Luis <sergio@larces.uece.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
sched, trace: update trace_sched_wakeup()
tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
Revert "x86: disable X86_PTRACE_BTS"
ring-buffer: prevent false positive warning
ring-buffer: fix dangling commit race
ftrace: enable format arguments checking
x86, bts: memory accounting
x86, bts: add fork and exit handling
ftrace: introduce tracing_reset_online_cpus() helper
tracing: fix warnings in kernel/trace/trace_sched_switch.c
tracing: fix warning in kernel/trace/trace.c
tracing/ring-buffer: remove unused ring_buffer size
trace: fix task state printout
ftrace: add not to regex on filtering functions
trace: better use of stack_trace_enabled for boot up code
trace: add a way to enable or disable the stack tracer
x86: entry_64 - introduce FTRACE_ frame macro v2
tracing/ftrace: add the printk-msg-only option
tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
x86, bts: correctly report invalid bts records
...
Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
being already partly merged by the SH merge.
Impact: extend functions with a (yet unused) parameter, update callsites
Some architectures need it - in preparation for highmem swiotlb.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix panic on null pointer with sparseirq
Some GCC versions seem to inline the weak global function,
when that function is empty.
Work it around, by making the functions return a (dummy) integer.
Signed-off-by: Yinghai <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, reduce kernel size a bit, avoid sparse warning
Fixes sparse warning:
arch/x86/kernel/apic.c:103:5: warning: symbol 'disable_x2apic' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, avoid sparse warning
Include "../pci/pci.h" for port_cf9_safe
Fixes this sparse warning:
arch/x86/kernel/reboot.c:43:6: warning: symbol 'port_cf9_safe' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: New API
Like cpu_coregroup_map, but returns a (const) pointer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Impact: New APIs
The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask. Part of removing cpumasks from
the stack.
Also makes __pcibus_to_node take a const pointer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
all for_each_irq_desc() usage point have !desc check.
then its check can move into for_each_irq_desc() macro.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
enable_mtrr_cleanup is static, and is never set to anything but 0 or 1.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
On 32 bits, we may suffer IRQ 13, or supposedly we might have a buggy
implementation which gives spurious trap 16. We did not check for
this on 64 bits, but there is no reason we can't make the code the
same in both cases. Furthermore, this is presumably rare, so do the
spurious check last, instead of first.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup, avoid warning on X86_64
Fixes this warning on X86_64:
CC arch/x86/kernel/traps.o
arch/x86/kernel/traps.c:695:5: warning: "CONFIG_X86_32" is not defined
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix a crash/hard-reboot on certain configs while enabling cpu runtime
On some archs, the boot of a secondary cpu can have an early fragile state.
On x86-64, the pda is not initialized on the first stage of a cpu boot but
it is needed to get the cpu number and the current task pointer. This data
is needed during tracing. As they were dereferenced at this stage, we got a
crash while tracing a cpu being enabled at runtime.
Some other archs like ia64 can have such kind of issue too.
Changes on v2:
We dropped the previous solution of a per-arch called function to guess the
current state of a cpu. That could slow down the tracing.
This patch removes the -pg flag on arch/x86/kernel/cpu/common.c where
the low level cpu boot functions exist, on start_secondary() and a helper
function used at this stage.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix
lguest can be built as a module and makes use of this new symbol:
ERROR: "vector_used_by_percpu_irq" [drivers/lguest/lg.ko] undefined!
export it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
These commits:
commit 95d313cf1c
Author: Mike Travis <travis@sgi.com>
Date: Tue Dec 16 17:33:54 2008 -0800
x86: Add cpu_mask_to_apicid_and
and
commit 6eeb7c5a99
Author: Mike Travis <travis@sgi.com>
Date: Tue Dec 16 17:33:55 2008 -0800
x86: update add-cpu_mask_to_apicid_and to use struct cpumask*
broke interrupt delivery on x2apic platforms. As x2apic cluster mode uses
logical delivery mode, we need to use logical apicid instead of physical apicid
in x2apic_cpu_mask_to_apicid_and()
Impact: fixes the broken interrupt delivery issue on generic x2apic platforms.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Mike Travis <travis@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix lguest, clean up
32-bit lguest used used_vectors to record vectors, but that model of
allocating vectors changed and got broken, after we changed vector
allocation to a per_cpu array.
Try enable that for 64bit, and the array is used for all vectors that
are not managed by vector_irq per_cpu array.
Also kill system_vectors[], that is now a duplication of the
used_vectors bitmap.
[ merged in cpus4096 due to io_apic.c cpumask changes. ]
[ -v2, fix build failure ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When ACPI is asked to find an MADT (APIC table)
and fails, then ACPI expects to run in PIC mode.
However, if an MP Table is was found, IRQs will be
registered as if an IOAPIC is being used, even
though ACPI is configuring interrupt links links for PIC mode.
In this scenario, disable MPS so that IRQs
are registered in PIC mode, consistent with ACPI.
http://bugzilla.kernel.org/show_bug.cgi?id=12257
Signed-off-by: Len Brown <len.brown@intel.com>
In the case of multiple FPU errors, prioritize the error codes,
instead of returning __SI_FAULT, which ends up pushing a 0 as the
error code to userspace, a POSIX violation.
For i386, we will simply return if there are no errors at all; for
x86-64 this is probably a "can't happen" (and the code should be
unified), but for this patch, return __SI_FAULT|SI_KERNEL if this ever
happens.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: fix deadlock
This is in response to the following bug report:
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100
Subject : resume (S2R) broken by Intel microcode module, on A110L
Submitter : Andreas Mohr <andi@lisas.de>
Date : 2008-11-25 08:48 (19 days old)
Handled-By : Dmitry Adamushko <dmitry.adamushko@gmail.com>
[ The deadlock scenario has been discovered by Andreas Mohr ]
I think I might have a logical explanation why the system:
(http://bugzilla.kernel.org/show_bug.cgi?id=12100)
might hang upon resuming, OTOH it should have likely hanged each and every time.
(1) possible deadlock in microcode_resume_cpu() if either 'if' section is
taken;
(2) now, I don't see it in spec. and can't experimentally verify it (newer
ucodes don't seem to be available for my Core2duo)... but logically-wise, I'd
think that when read upon resuming, the 'microcode revision' (MSR 0x8B) should
be back to its original one (we need to reload ucode anyway so it doesn't seem
logical if a cpu doesn't drop the version)... if so, the comparison with
memcmp() for the full 'struct cpu_signature' is wrong... and that's how one of
the aforementioned 'if' sections might have been triggered - leading to a
deadlock.
Obviously, in my tests I simulated loading/resuming with the ucode of the same
version (just to see that the file is loaded/re-loaded upon resuming) so this
issue has never popped up.
I'd appreciate if someone with an appropriate system might give a try to the
2nd patch (titled "fix a comparison && deadlock...").
In any case, the deadlock situation is a must-have fix.
Reported-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Tested-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: move the BTS buffer accounting to the mlock bucket
Add alloc_locked_buffer() and free_locked_buffer() functions to mm/mlock.c
to kalloc a buffer and account the locked memory to current.
Account the memory for the BTS buffer to the tracer.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: introduce new ptrace facility
Add arch_ptrace_untrace() function that is called when the tracer
detaches (either voluntarily or when the tracing task dies);
ptrace_disable() is only called on a voluntary detach.
Add ptrace_fork() and arch_ptrace_fork(). They are called when a
traced task is forked.
Clear DS and BTS related fields on fork.
Release DS resources and reclaim memory in ptrace_untrace(). This
releases resources already when the tracing task dies. We used to do
that when the traced task dies.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, avoid sparse warnings, reduce kernel size a bit
Fixes these sparse warnings:
arch/x86/kernel/cpu/common.c:869:6: warning: symbol 'boot_cpu_stack' was not declared. Should it be static?
arch/x86/kernel/cpu/common.c:910:6: warning: symbol 'boot_exception_stacks' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reduce kconfig variable scope and clean up
Bartlomiej pointed out that the config dependencies and comments are not right.
update it depend to NUMA, and fix some comments
Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On some machines it may be necessary to disable the saving/restoring
of the ACPI NVS memory region during hibernation/resume. For this
purpose, introduce new ACPI kernel command line option
acpi_sleep=s4_nonvs.
Based on a patch by Zhang Rui.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
Introduce new initcall for marking the ACPI NVS memory at startup, so
that it can be saved/restored during hibernation/resume.
Based on a patch by Zhang Rui.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>