i386 arch cleanup. Introduce the serialize macro to serialize processor
state. Why the microcode update needs it I am not quite sure, since wrmsr()
is already a serializing instruction, but it is a microcode update, so I will
keep the semantic the same, since this could be a timing workaround. As far
as I can tell, this has always been there since the original microcode update
source.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 Inline asm cleanup. Use cr/dr accessor functions.
Also, a potential bugfix. Also, some CR accessors really should be volatile.
Reads from CR0 (numeric state may change in an exception handler), writes to
CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction
re-ordering. I did not add memory clobber to CR3 / CR4 / CR0 updates, as it
was not there to begin with, and in no case should kernel memory be clobbered,
except when doing a TLB flush, which already has memory clobber.
I noticed that page invalidation does not have a memory clobber. I can't find
a bug as a result, but there is definitely a potential for a bug here:
#define __flush_tlb_single(addr) \
__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is subarch update for ES7000. I've modified platform check code and
removed unnecessary OEM table parsing for newer systems that don't use OEM
information during boot. Parsing the table in fact is causing problems,
and the platform doesn't get recognized. The patch only affects the ES7000
subach.
Signed-off-by: <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 generic subarchitecture requires explicit dmi strings or command line
to enable bigsmp mode. The patch below removes that restriction, and uses
bigsmp as soon as it finds more than 8 logical CPUs, Intel processors and
xAPIC support.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The memory descriptors that comprise the EFI memory map are not fixed in
stone such that the size could change in the future. This uses the memory
descriptor size obtained from EFI to iterate over the memory map entries
during boot. This enables the removal of an x86 specific pad (and ifdef)
in the EFI header. I also couldn't stomach the broken up nature of the
function to put EFI runtime calls into virtual mode any longer so I fixed
that up a bit as well.
For reference, this patch only impacts x86.
Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new accessor for PTEs, which passes the full hint from the mmu_gather
struct; this allows architectures with hardware pagetables to optimize away
atomic PTE operations when destroying an address space. Removing the
locked operation should allow better pipelining of memory access in this
loop. I measured an average savings of 30-35 cycles per zap_pte_range on
the first 500 destructions on Pentium-M, but I believe the optimization
would win more on older processors which still assert the bus lock on xchg
for an exclusive cacheline.
Update: I made some new measurements, and this saves exactly 26 cycles over
ptep_get_and_clear on Pentium M. On P4, with a PAE kernel, this saves 180
cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
full address space destruction.
pte_clear_full is not yet used, but is provided for future optimizations
(in particular, when running inside of a hypervisor that queues page table
updates, the full hint allows us to avoid queueing unnecessary page table
update for an address space in the process of being destroyed.
This is not a huge win, but it does help a bit, and sets the stage for
further hypervisor optimization of the mm layer on all architectures.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Christoph Lameter <christoph@lameter.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is used only in slab.c and each architecture gets to define whcih
underlying type is to be used.
Seems a bit silly - move it to slab.c and use the same type for all
architectures: unsigned int.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I don't think we need to call hugetlb_clean_stale_pgtable() anymore
in 2.6.13 because of the rework with free_pgtables(). It now collect
all the pte page at the time of munmap. It used to only collect page
table pages when entire one pgd can be freed and left with staled pte
pages. Not anymore with 2.6.13. This function will never be called
and We should turn it into a BUG_ON.
I also spotted two problems here, not Adam's fault :-)
(1) in huge_pte_alloc(), it looks like a bug to me that pud is not
checked before calling pmd_alloc()
(2) in hugetlb_clean_stale_pgtable(), it also missed a call to
pmd_free_tlb. I think a tlb flush is required to flush the mapping
for the page table itself when we clear out the pmd pointing to a
pte page. However, since hugetlb_clean_stale_pgtable() is never
called, so it won't trigger the bug.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a macro pte_huge(pte) for i386/x86_64 which is needed by a
patch later in the series. Instead of repeating (_PAGE_PRESENT |
_PAGE_PSE), I've added __LARGE_PTE to i386 to match x86_64.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
_PAGE_FILE does not indicate whether a file is in page / swap cache, it is
set just for non-linear PTE's. Correct the comment for i386, x86_64, UML.
Also clearify _PAGE_NONE.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Someone mentioned that almost all the architectures used basically the same
implementation of get_order. This patch consolidates them into
asm-generic/page.h and includes that in the appropriate places. The
exceptions are ia64 and ppc which have their own (presumably optimised)
versions.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chuck Ebbert noticed that the desc_empty macro is incorrect. Fix it.
Thankfully, this is not used as a security check, but it can falsely
overwrite TLS segments with carefully chosen base / limits. I do not
believe this is an issue in practice, but it is a kernel bug.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
[ x86-64 had the same problem, and the same fix. Linus ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commits
71db63acff
[PATCH] increase PCIBIOS_MIN_IO on x86
and
0b2bfb4e7f
ACPI: increase PCIBIOS_MIN_IO on x86
since Lukas Sandströ<lukass@etek.chalmers.se> reports that this breaks
his on-board nvidia audio.
We should re-visit this later. For now we revert the change
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In file included from linux-2.6.13-rc5/arch/i386/kernel/timers/timer_pit.c:20:
linux-2.6.13-rc5/include/asm-i386/mach-visws/do_timer.h: In function `do_timer_overflow':
linux-2.6.13-rc5/include/asm-i386/mach-visws/do_timer.h:32: error: `i8259A_lock' undeclared (first use in this function)
linux-2.6.13-rc5/include/asm-i386/mach-visws/do_timer.h:32: error: (Each undeclared identifier is reported only once
linux-2.6.13-rc5/include/asm-i386/mach-visws/do_timer.h:32: error: for each function it appears in.)
make[3]: *** [arch/i386/kernel/timers/timer_pit.o] Error 1
make[2]: *** [arch/i386/kernel/timers] Error 2
make[1]: *** [arch/i386/kernel] Error 2
make: *** [_all] Error 2
Signed-off-by: Tom Duffy <thomas.duffy.99@alumni.brown.edu>
Cc: Andrey Panin <pazke@orbita1.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a number of x86 laptops that have some non-PCI IO ports
in the 0x1000-0x1fff range, and it's quite hard to control the correct
order of resource allocation between PCI and other subsystems controlling
these ports. Especially with modular kernel.
So just increase PCIBIOS_MIN_IO to 0x4000 to prevent any new PCI
resource allocations in the problematic range (this limitation must
apply _only_ to the root bus resources - see Linus' change in
pci_bus_alloc_resource). As PCIBIOS_MIN_IO and PCIBIOS_MIN_CARDBUS_IO
are the same now on i386 and x86-64, we can remove the latter.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some edge problems with the original C rewrite.
Thanks go to Cal Peake, who pinpointed the breakage to the rewrite, and
tested this fixed version.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avoid using "rep scas", just let the compiler select a sequence of
regular instructions.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/asm/ptrace.h: In function `user_mode_vm':
include/asm/ptrace.h:67: `VM_MASK' undeclared (first use in this function)
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- make the new user_mode() return 0 or 1 (same as x86_64)
- remove conditional jump from user_mode_vm() it's called every timer
tick on each CPU on SMP)
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_cpus_allowed is not safe in interrupt context
and disabling apics is complicated code so don't
call machine_shutdown on i386 from emergency_restart().
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When the kernel is working well and we want to restart cleanly
kernel_restart is the function to use. But in many instances
the kernel wants to reboot when thing are expected to be working
very badly such as from panic or a software watchdog handler.
This patch adds the function emergency_restart() so that
callers can be clear what semantics they expect when calling
restart. emergency_restart() is expected to be callable
from interrupt context and possibly reliable in even more
trying circumstances.
This is an initial generic implementation for all architectures.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This one ends up using an inline asm format that claims to read memory
and then clobber it (rather than just write it directly), which made it
easier to use the existing "alternative_input()" infrastructure support.
Now the fxsave code matches the fxrstor.
It's really just a single instruction, conditional on whether the CPU
supports FXSR or not, so implement it as such instead of making it a
function that queries FXSR dynamically.
This means that the instruction just gets automatically rewritten to the
correct one at boot-time.
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
http://bugme.osdl.org/show_bug.cgi?id=4016
Written-by: David Shaohua Li <shaohua.li@intel.com>
Acked-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
If CONFIG_NUMA isn't set, we use the define in <linux/mmzone.h> for
early_pfn_to_nid (which defines it to 0).
Because of this, the prototype needs to move inside the CONFIG_NUMA too, or
anal gcc's get really confused.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The recent cleanups to asm-i386/mmzone.h were suboptimal nesting an ifdef of
the same symbol. This patch removes some of the ifdef'ery to make things more
readable again.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There has been some discuss about solving the SMP MTRR suspend/resume
breakage, but I didn't find a patch for it. This is an intent for it. The
basic idea is moving mtrr initializing into cpu_identify for all APs (so it
works for cpu hotplug). For BP, restore_processor_state is responsible for
restoring MTRR.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove legacy ISA serial ports for Accent, Boca, Fourport, Hub6 and MCA
from the architecture specific serial.h include.
The only ports which remain in asm-*/serial.h are the platform specific
entries. These should really be converted by platform maintainers to
use a platform device, such as can be found in
arch/arm/mach-footbridge/isa.c
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
With CONFIG_PCI=n:
In file included from include/linux/pci.h:917,
from lib/iomap.c:6:
include/asm/pci.h:104: warning: `enum pci_dma_burst_strategy' declared inside parameter list
include/asm/pci.h:104: warning: its scope is only this definition or declaration, which is probably not what you want.
include/asm/pci.h: In function `pci_dma_burst_advice':
include/asm/pci.h:106: dereferencing pointer to incomplete type
include/asm/pci.h:106: `PCI_DMA_BURST_INFINITY' undeclared (first use in this function)
include/asm/pci.h:106: (Each undeclared identifier is reported only once
include/asm/pci.h:106: for each function it appears in.)
make[1]: *** [lib/iomap.o] Error 1
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After seeing, at best, "guesses" as to the following kind
of information in several drivers, I decided that we really
need a way for platforms to specifically give advice in this
area for what works best with their PCI controller implementation.
Basically, this new interface gives DMA bursting advice on
PCI. There are three forms of the advice:
1) Burst as much as possible, it is not necessary to end bursts
on some particular boundary for best performance.
2) Burst on some byte count multiple. A DMA burst to some multiple of
number of bytes may be done, but it is important to end the burst
on an exact multiple for best performance.
The best example of this I am aware of are the PPC64 PCI
controllers, where if you end a burst mid-cacheline then
chip has to refetch the data and the IOMMU translations
which hurts performance a lot.
3) Burst on a single byte count multiple. Bursts shall end
exactly on the next multiple boundary for best performance.
Sparc64 and Alpha's PCI controllers operate this way. They
disconnect any device which tries to burst across a cacheline
boundary.
Actually, newer sparc64 PCI controllers do not have this behavior.
That is why the "pdev" is passed into the interface, so I can
add code later to check which PCI controller the system is using
and give advice accordingly.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Old ISA/VESA systems sometimes put tertiary IDE controllers at addresses
0x1e8, 0x168, 0x1e0 or 0x160. Linux thus probes these addresses on x86
systems. Unfortunately some PCI systems now use these addresses for other
purposes which leads to users seeing minute plus hangs during boot or even
crashes.
The following patch (again has been in Fedora for a while) only probes the
obscure legacy ISA ports on machinea that are pre-PCI. This seems to keep
everyone happy and if there is someone with that utterly weird corner case
the ide= command line still provides a get out of jail card.
Unsurprisingly we've not found anyone so affected.
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I believe at least for seccomp it's worth to turn off the tsc, not just for
HT but for the L2 cache too. So it's up to you, either you turn it off
completely (which isn't very nice IMHO) or I recommend to apply this below
patch.
This has been tested successfully on x86-64 against current cogito
repository (i686 compiles so I didn't bother testing ;). People selling
the cpu through cpushare may appreciate this bit for a peace of mind.
There's no way to get any timing info anymore with this applied
(gettimeofday is forbidden of course). The seccomp environment is
completely deterministic so it can't be allowed to get timing info, it has
to be deterministic so in the future I can enable a computing mode that
does a parallel computing for each task with server side transparent
checkpointing and verification that the output is the same from all the 2/3
seller computers for each task, without the buyer even noticing (for now
the verification is left to the buyer client side and there's no
checkpointing, since that would require more kernel changes to track the
dirty bits but it'll be easy to extend once the basic mode is finished).
Eliminating a cold-cache read of the cr4 global variable will save one
cacheline during the tlb flush while making the code per-cpu-safe at the
same time. Thanks to Mikael Pettersson for noticing the tlb flush wasn't
per-cpu-safe.
The global tlb flush can run from irq (IPI calling do_flush_tlb_all) but
it'll be transparent to the switch_to code since the IPI won't make any
change to the cr4 contents from the point of view of the interrupted code
and since it's now all per-cpu stuff, it will not race. So no need to
disable irqs in switch_to slow path.
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This updates the CFQ io scheduler to the new time sliced design (cfq
v3). It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes. It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls. The latter closely mimic
set/getpriority.
This import is based on my latest from -mm.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides the interfaces necessary to read the dump contents,
treating it as a high memory device.
Signed off by Hariprasad Nellitheertha <hari@in.ibm.com>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o Following patch exports kexec global variable "crash_notes" to user space
through sysfs as kernel attribute in /sys/kernel.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the i386 implementation of kexec.
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For one kernel to report a crash another kernel has created we need
to have 2 kernels loaded simultaneously in memory. To accomplish this
the two kernels need to built to run at different physical addresses.
This patch adds the CONFIG_PHYSICAL_START option to the x86 kernel
so we can do just that. You need to know what you are doing and
the ramifications are before changing this value, and most users
won't care so I have made it depend on CONFIG_EMBEDDED
bzImage kernels will work and run at a different address when compiled
with this option but they will still load at 1MB. If you need a kernel
loaded at a different address as well you need to boot a vmlinux.
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When coming out of apic mode attempt to set the appropriate
apic back into virtual wire mode. This improves on previous versions
of this patch by by never setting bot the local apic and the ioapic
into veritual wire mode.
This code looks at data from the mptable to see if an ioapic has
an ExtInt input to make this decision. A future improvement
is to figure out which apic or ioapic was in virtual wire mode
at boot time and to remember it. That is potentially a more accurate
method, of selecting which apic to place in virutal wire mode.
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: "Maciej W. Rozycki" <macro@linux-mips.org>
Fix a kexec problem whcih causes local APIC detection failure.
The problem is detect_init_APIC() is called early, before the command line
have been processed. Therefore "lapic" (and "nolapic") have not been seen,
yet.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: "Maciej W. Rozycki" <macro@linux-mips.org>
Rename APIC_MODE_EXINT to APIC_MODE_EXTINT - I think it should be named
after what the mode is called in documentation.
From: "Eric W. Biederman" <ebiederm@lnxi.com>
I have reduced this patch to just the name change in the header. And
integrated the changes into the patches that add those
lines. Otherwise I ran into some ugly dependencies.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do some basic initial tuning.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>