Working with ftrace I would get large jumps of 11 millisecs or more with
the clock tracer. This killed the latencing timings of ftrace and also
caused the irqoff self tests to fail.
What was happening is with NO_HZ the idle would stop the jiffy counter and
before the jiffy counter was updated the sched_clock would have a bad
delta jiffies to compare with the gtod with the maximum.
The jiffies would stop and the last sched_tick would record the last gtod.
On wakeup, the sched clock update would compare the gtod + delta jiffies
(which would be zero) and compare it to the TSC. The TSC would have
correctly (with a stable TSC) moved forward several jiffies. But because the
jiffies has not been updated yet the clock would be prevented from moving
forward because it would appear that the TSC jumped too far ahead.
The clock would then virtually stop, until the jiffies are updated. Then
the next sched clock update would see that the clock was very much behind
since the delta jiffies is now correct. This would then jump the clock
forward by several jiffies.
This caused ftrace to report several milliseconds of interrupts off
latency at every resume from NO_HZ idle.
This patch adds hooks into the nohz code to disable the checking of the
maximum clock update when nohz is in effect. It resumes the max check
when nohz has updated the jiffies again.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It has been suggested that I add a way to disable the function tracer
on an oops. This code adds a ftrace_kill_atomic. It is not meant to be
used in normal situations. It will disable the ftrace tracer, but will
not perform the nice shutdown that requires scheduling.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add pseudo-feature bits to describe whether the CPU supports sysenter
and/or syscall from ia32-compat userspace. This removes a hardcoded
test in vdso32-setup.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
when more than 4g memory is installed, don't map the big hole below 4g.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
tun: Persistent devices can get stuck in xoff state
xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info
ipv6: missed namespace context in ipv6_rthdr_rcv
netlabel: netlink_unicast calls kfree_skb on error path by itself
ipv4: fib_trie: Fix lookup error return
tcp: correct kcalloc usage
ip: sysctl documentation cleanup
Documentation: clarify tcp_{r,w}mem sysctl docs
netfilter: nf_nat_snmp_basic: fix a range check in NAT for SNMP
netfilter: nf_conntrack_tcp: fix endless loop
libertas: fix memory alignment problems on the blackfin
zd1211rw: stop beacons on remove_interface
rt2x00: Disable synchronization during initialization
rc80211_pid: Fix fast_start parameter handling
sctp: Add documentation for sctp sysctl variable
ipv6: fix race between ipv6_del_addr and DAD timer
irda: Fix netlink error path return value
irda: New device ID for nsc-ircc
irda: via-ircc proper dma freeing
sctp: Mark the tsn as received after all allocations finish
...
Add a XFRM_STATE_AF_UNSPEC flag to handle the AF_UNSPEC behavior for
the selector family. Userspace applications can set this flag to leave
the selector family of the xfrm_state unspecified. This can be used
to to handle inter family tunnels if the selector is not set from
userspace.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
NR_IRQS: let VISWS be just a sub-case of the generic code.
This can create a somewhat larger irq_desc[] array if NR_CPUS is high
but that should not worry VisWS which has 4 CPUs at most.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move the SGIVW definitions from setup_arch.h into its own header file.
preparation for turning VISWS into a generic PC architecture.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move the include/asm-x86/mach-visws/ VISWS specific hardware
details include files into include/asm-x86/visws, to be used from
generic code.
No code changed.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
update asm-x86/mach-visws/mach_apicdef.h to the generic version.
This should work fine as VISWS has a standard local APIC and thus
its mach_apicdef.h copy is just an ancient version of the generic code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
now that include/asm-x86/mach-visws/smpboot_hooks.h equals
to the default file in ../mach-default/smpboot_hooks.h, simply
include it instead of maintaining a copy.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
update include/asm-x86/mach-visws/smpboot_hooks.h to
include/asm-x86/mach-default/smpboot_hooks.h (the generic version).
this _should_ work, because VISWS sets skip_ioapic_setup, but it
should be tested on a real VISWS to make sure.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allow the generic smpboot quirks code to be built with
ONFIG_X86_IO_APIC disabled. This way VISWS will be able
to use it as-is.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
now that include/asm-x86/mach-visws/mach_apic.h equals
to include/asm-x86/mach-default/mach_apic.h, simply start
using the generic one.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add early quirks support.
In preparation of enabling the generic architecture to boot on a VISWS.
This will allow us to remove the VISWS subarch and all its complications.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Provide a helper to load the file and validate it in one call, to
simplify error handling in the drivers which are going to use it.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Some devices need their firmware as a set of {address, len, data...}
records in some specific order rather than a simple blob.
The normal way of doing this kind of thing is 'ihex', which is a text
format and not entirely suitable for use in the kernel.
This provides a binary representation which is very similar, but much
more compact -- and a helper routine to skip to the next record,
because the alignment constraints mean that everybody will screw it up
for themselves otherwise.
Also a helper function which can verify that a 'struct firmware'
contains a valid set of ihex records, and that following them won't run
off the end of the loaded data.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Some drivers have their own hacks to bypass the kernel's firmware loader
and build their firmware into the kernel; this renders those unnecessary.
Other drivers don't use the firmware loader at all, because they always
want the firmware to be available. This allows them to start using the
firmware loader.
A third set of drivers already use the firmware loader, but can't be
used without help from userspace, which sometimes requires an initrd.
This allows them to work in a static kernel.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In preparation for supporting firmware files linked into the static
kernel, make fw->data const to ensure that users aren't modifying it (so
that we can pass a pointer to the original in-kernel copy, rather than
having to copy it).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
All new crypto interfaces should go into individual files as much
as possible in order to ensure that crypto.h does not collapse under
its own weight.
This patch moves the ahash code into crypto/hash.h and crypto/internal/hash.h
respectively.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds the walking helpers for hash algorithms akin to
those of block ciphers. This is a necessary step before we can
reimplement existing hash algorithms using the new ahash interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The base field in ahash_tfm appears to have been cut-n-pasted from
ablkcipher. It isn't needed here at all. Similarly, the info field
in ahash_request also appears to have originated from its cipher
counter-part and is vestigial.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Support for the at91sam9g20 : Atmel 400Mhz ARM 926ej-s SOC.
AT91sam9g20 is an evolution of the at91sam9260 with a faster clock
speed.
We created a new board for this device but based the chip support
directly on 9260 files with little updates.
Here is the chip page on Atmel wabsite:
http://atmel.com/dyn/products/product_card.asp?part_id=4337
Signed-off-by: Sedji Gaouaou <sedji.gaouaou@atmel.com>
Signed-off-by: Justin Waters <justin.waters@timesys.com>
Acked-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
mach-default/entry_arch.h is exactly the same file as
mach-visws/entry_arch.h, so include the first from the second,
so that updates to the generic one get picked up by VISWS as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/built-in.o: In function `smp_intr_init':
(.init.text+0x49e2): undefined reference to `call_function_single_interrupt'
Caused by include/asm-x86/mach-visws/entry_arch.h getting out of sync
with the include/asm-x86/mach-default/entry_arch.h file it derives from.
Copy the default file over - next step will be to simply include the default
file.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This remove lots of duplications in iommu.h and gart.h.
The end result of this patch is:
- iommu.h is a header file for everyone related with IOMMUs.
- gart.h is the private header file. Only pci-gart_64.c and its friends
include it.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: fujita.tomonori@lab.ntt.co.jp
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A bunch of things in alsa depend on CONFIG_KMOD,
use CONFIG_MODULES instead where the dependency
is needed at all.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This patch adds several functions for DAI control and config
and replaces the current method of calling function pointers within
the DAI struct.
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This patch series merges struct snd_soc_codec_dai and struct
snd_soc_cpu_dai into struct snd_soc_dai in preparation for further
ASoC v2 patches.
This merger removes duplication in both DAI structures and simplifies
the API for other users.
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
and let 64-bit to fall back to use fixmap too.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/built-in.o: In function `dmi_ignore_irq0_timer_override':
boot.c:(.init.text+0x3ea4): undefined reference to `force_mask_ioapic_irq_2'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch brings support for gpio/gpiolib framework to Intel IOP3xx
platforms.
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As well as moving all the device declarations to a single one in devices.c
this causes all platforms to register the I/O and interrupt resources for
the AC97 controller.
Cc: eric miao <eric.miao@marvell.com>
Cc: Mike Rapoport <mike@compulab.co.il>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Jürgen Schindele <linux@schindele.name>
Cc: Juergen Beisert <jbe@pengutronix.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Provide a set of functions to control state of pins dedicated to IrDA.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This is some preliminary work to improve TLB management on SW loaded
TLB powerpc platforms. This introduce support for non-atomic PTE
operations in pgtable-ppc32.h and removes write back to the PTE from
the TLB miss handlers. In addition, the DSI interrupt code no longer
tries to fixup write permission, this is left to generic code, and
_PAGE_HWWRITE is gone.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The problems are that, with the ACPI vs timer overring issue _fixed_,
after using the box for some time (between several seconds and 1 hour, at
random) processes get very high CPU loads (once I've got X using 107% of
the CPU, for example) and the system becomes unresponsive, as though there
were interrupts lost or something similar.
Andreas Herrman reproduced similar problems:
> Ok, now I've reproduced the stability problem.
> - Using tip/master,
> - reverting e38502eb8aa82314d5ab0eba45f50e6790dadd88 and
> - applying your patch from this posting
> http://marc.info/?l=linux-kernel&m=121539354224562&w=4
>
> Starting X, firefox, gimp, tuxpaint and doing some drawing in tuxpaint
> results in a slow system. Drawing is almost not possible anymore --
> Selections of new colors, cursors etc. is performed with huge delay
> if it's performed at all.
>
> BTW, the code sets up timer IRQ as Virtual Wire IRQ:
>
> Jul 8 14:57:58 kodscha IO-APIC (apicid-pin) 2-22, 2-23 not connected.
> Jul 8 14:57:58 kodscha ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> Jul 8 14:57:58 kodscha ...trying to set up timer as Virtual Wire IRQ... works.
>
> and both INT0 and INT2 of IOAPIC are masked:
>
> Jul 8 14:57:58 kodscha NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
> Jul 8 14:57:58 kodscha 00 000 1 0 0 0 0 0 0 00
> Jul 8 14:57:58 kodscha 01 003 0 0 0 0 0 1 1 31
> Jul 8 14:57:58 kodscha 02 003 1 0 0 0 0 0 0 30
>
> I've also seen strange CPU utilization -- with syslog-ng:
>
> top - 15:33:06 up 35 min, 4 users, load average: 1.70, 0.68, 0.37
> Tasks: 64 total, 4 running, 60 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 : 6.4%us, 87.2%sy, 0.0%ni, 5.8%id, 0.0%wa, 0.6%hi, 0.0%si, 0.0%st
> Mem: 895384k total, 283568k used, 611816k free, 35492k buffers
> Swap: 1959920k total, 0k used, 1959920k free, 163044k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4632 root 20 0 17216 800 580 S 104 0.1 0:34.22 syslog-ng
> 28505 root 20 0 205m 11m 4024 S 6 1.3 0:21.16 X
> 28518 root 20 0 56292 5652 4492 S 1 0.6 0:01.80 fluxbox
> 1 root 20 0 3724 608 508 S 0 0.1 0:00.36 init
>
> So far I have no clue why C1E-idle in conjunction with virtual wire
> mode causes this strange behaviour.
>
> ... and I start to think about the root cause of all this.
>
> I've performed similar tests under X with the IRQ0/INT0 configuration and
> I did not see above symptoms.
So lets fall back to the IRQ0/INT0 configuration on this box.
This basically restores the dont-use-the-lapic-timer exception mechanism
that was unconditional on this box prior commit 8750bf5 ("x86: add C1E
aware idle function").
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Hmm, looks like it would be nice to have more cleanups of iommu.h and
gart.h.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When system have 4g less ram installed, and acpi table sit
near end of ram, make max_pfn cover them too,
so 64bit kernel don't need to mess up fixmap.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "Suresh Siddha" <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove them from the arch-specific file.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86_64 does not need it, but it won't have X86_INTEL_USERCOPY
defined either.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We also carry the unaligned version with us. Only x86_64 uses
it, but there's no problem in defining it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move both versions, which are highly similar, to uaccess.h.
Note that, for x86_64, X86_WP_WORKS_OK is always defined.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We also check user pointer in x86_64 put_user, the way i386 does.
In a separate patch for bisecting purposes.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For both __put_user_x and __put_user_8 macros, pass the error
variable explicitly.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move __get_user_asm and __get_user_size and __get_user_nocheck
to uaccess.h. This requires us to define a macro at __get_user_size
for the 64-bit access case.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Let the user of the macro specify the desired return.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move both __put_user_asm and __put_user_size to
uaccess.h. i386 already had a special function for 64-bit access,
so for x86_64, we just define a macro with the same name.
Note that for X86_64, CONFIG_X86_WP_WORKS_OK will always
be defined, so the #else part will never be even compiled in.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Let the user of the macro specify the desired return.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do it in a separate patch for bisectability.
Goal is to have put_user_size integrated.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Take it out of uaccess_32.h. Since it seems that no users
of the x86_64 exists, we simply pick the i386 version.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge versions of getuser from uaccess_32.h and uaccess_64.h into
uaccess.h. There is a part which is 64-bit only (for now), and for
that, we use a __get_user_8 macro.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Common parts of uaccess_32.h and uaccess_64.h
are put in uaccess.h. Bits in uaccess_32.h and
uaccess_64.h that come to this file are equal
except for comments and whitespaces differences.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Using explicit hexa (0xFFFFFFUL) introduces an unnecessary difference
between i386 and x86_64 because of the size of their long. Use -1UL instead.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do not refer to the processor word-size with int, as it won't
work with x86_64. Use long instead.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Put the likely hint in access_ok. Just for
bisectability.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Our integration efforts broke a build with this function being used
with i386. Reason is "g" can put the operand in an imm32, which according
to The Book (tm), is invalid as the second operand.
This is actually a bug
in x86_64 too, since the x86_64 instruction set reference does not list
it as valid.
We probably didn't trigger this before due to the ammount of
registers available for 64-bit platforms. But that's just my guess.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For i386, __range_not_ok is a better name than __range_ok, since
it returns 0 when it is in fact okay. Other than that,
both versions does not need the word size specifiers, and we remove them.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In putuser_32.S and putuser_64.S, replace things like .quad, .long,
and explicit references to [r|e]ax for the apropriate macros
in asm/asm.h.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is consistent with i386 usage.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of clobbering r8, clobber rbx, which is the i386 way.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Follow the pattern, and define a single put_user_x, instead
of defining macros for all available sizes. Exception is
put_user_8, since the "A" constraint does not give us enough
power to specify which register (a or d) to use in the 32-bit
common case.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clobber it in the inline asm macros, and let the compiler do this for us.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
getuser_32.S and getuser_64.S are merged into getuser.S.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are situations in which the architecture wants to use the
register that represents its word-size, whatever it is. For those,
introduce __ASM_REG in asm.h, along with the first users _ASM_AX
and _ASM_DX. They have users waiting for it, namely the getuser
functions.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There's really no reason to clobber r8 or pass the address in rcx.
We can safely use only two registers (which we already have to touch anyway)
to do the job.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allow an application to enable Strong Access Ordering on specific pages of
memory on Power 7 hardware. Currently, power has a weaker memory model than
x86. Implementing a stronger memory model allows an emulator to more
efficiently translate x86 code into power code, resulting in faster code
execution.
On Power 7 hardware, storing 0b1110 in the WIMG bits of the hpte enables
strong access ordering mode for the memory page. This patchset allows a
user to specify which pages are thus enabled by passing a new protection
bit through mmap() and mprotect(). I have defined PROT_SAO to be 0x10.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add the CPU feature bit for the new Strong Access Ordering
facility of Power7
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch defines:
- PROT_SAO, which is passed into mmap() and mprotect() in the prot field
- VM_SAO in vma->vm_flags, and
- _PAGE_SAO, the combination of WIMG bits in the pte that enables strong
access ordering for the page.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch allows architectures to define functions to deal with
additional protections bits for mmap() and mprotect().
arch_calc_vm_prot_bits() maps additonal protection bits to vm_flags
arch_vm_get_page_prot() maps additional vm_flags to the vma's vm_page_prot
arch_validate_prot() checks for valid values of the protection bits
Note: vm_get_page_prot() is now pretty ugly, but the generated code
should be identical for architectures that don't define additional
protection bits.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The task_pt_regs() macro allows access to the pt_regs of a given task.
This macro is not currently defined for the powerpc architecture, but
we need it for some upcoming utrace additions.
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move device_to_mask() to dma-mapping.h because we need to use it from
outside dma_64.c in a later patch.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update powerpc to use the new dma_*map*_attrs() interfaces. In doing so
update struct dma_mapping_ops to accept a struct dma_attrs and propagate
these changes through to all users of the code (generic IOMMU and the
64bit DMA code, and the iseries and ps3 platform code).
The old dma_*map_*() interfaces are reimplemented as calls to the
corresponding new interfaces.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make iommu_map_sg take a struct iommu_table. It did so before commit
740c3ce667 (iommu sg merging: ppc: make
iommu respect the segment size limits).
This stops the function looking in the archdata.dma_data for the iommu
table because in the future it will be called with a device that has
no table there.
This also has the nice side effect of making iommu_map_sg() match the
other map functions.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As nr_active counter includes also spus waiting for syscalls to return
we need a seperate counter that only counts spus that are currently running
on spu side. This counter shall be used by a cpufreq governor that targets
a frequency dependent from the number of running spus.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
fix:
In file included from arch/x86/kernel/tlb_uv.c:14:
include/asm/uv/uv_mmrs.h:986: error: redefinition of ‘union uvh_rh_gam_cfg_overlay_config_mmr_u’
include/asm/uv/uv_mmrs.h:988: error: redefinition of ‘struct uvh_rh_gam_cfg_overlay_config_mmr_s’
include/asm/uv/uv_mmrs.h:1064: error: redefinition of ‘union uvh_rh_gam_mmioh_overlay_config_mmr_u’
include/asm/uv/uv_mmrs.h:1066: error: redefinition of ‘struct uvh_rh_gam_mmioh_overlay_config_mmr_s’
caused by another duplicate section (cut & paste error) in commit
5d061e397d "x86, uv: update x86 mmr list for SGI uv".
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
In file included from arch/x86/kernel/genx2apic_uv_x.c:25:
include/asm/uv/uv_mmrs.h:986: error: redefinition of ‘union uvh_rh_gam_cfg_overlay_config_mmr_u’
include/asm/uv/uv_mmrs.h:988: error: redefinition of ‘struct uvh_rh_gam_cfg_overlay_config_mmr_s’
include/asm/uv/uv_mmrs.h:1064: error: redefinition of ‘union uvh_rh_gam_mmioh_overlay_config_mmr_u’
include/asm/uv/uv_mmrs.h:1066: error: redefinition of ‘struct uvh_rh_gam_mmioh_overlay_config_mmr_s’
caused by duplicate section (cut & paste error) in commit
5d061e397d "x86, uv: update x86 mmr list for SGI uv".
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename the paravirtualized calculate_cpu_khz to calibrate_tsc.
In all cases, we actually calibrate_tsc and use that as the cpu_khz value.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Hecht <dhecht@vmware.com>
Cc: Dan Hecht <dhecht@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unify the clocksource code.
Unify the tsc_init code.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Hecht <dhecht@vmware.com>
Cc: Dan Hecht <dhecht@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge the tsc calibration code for the 32bit and 64bit kernel.
The paravirtualized calculate_cpu_khz for 64bit now points to the correct
tsc_calibrate code as in 32bit.
Original native_calculate_cpu_khz for 64 bit is now called as calibrate_cpu.
Also moved the recalibrate_cpu_khz function in the common file.
Note that this function is called only from powernow K7 cpu freq driver.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Hecht <dhecht@vmware.com>
Cc: Dan Hecht <dhecht@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch updates the X86 mmr list for SGI uv.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add __ide_default_irq() inline helper and use it instead of
ide_default_irq() in ide-probe.c and ns87415.c (all host drivers
except IDE PCI ones always setup hwif->irq so it is enough to
check only for I/O bases 0x1f0 and 0x170).
This fixes post-2.6.25 regression since ide_default_irq()
define could shadow ide_default_irq() inline.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
As Andy Whitcroft recently pointed out, the current powerpc version of
huge_ptep_set_wrprotect() has a bug. It just calls ptep_set_wrprotect()
which in turn calls pte_update() then hpte_need_flush() with the 'huge'
argument set to 0. This will cause hpte_need_flush() to flush the wrong
hash entries (of any). Andy's fix for this is already in the powerpc
tree as commit 016b33c495.
I have confirmed this is a real bug, not masked by some other
synchronization, with a new testcase for libhugetlbfs. A process write
a (MAP_PRIVATE) hugepage mapping, fork(), then alter the mapping and
have the child incorrectly see the second write.
Therefore, this should be fixed for 2.6.26, and for the stable tree.
Here is a suitable patch for 2.6.26, which I think will also be suitable
for the stable tree (neither of the headers in question has been changed
much recently).
It is cut down slighlty from Andy's original version, in that it does
not include a 32-bit version of huge_ptep_set_wrprotect(). Currently,
hugepages are not supported on any 32-bit powerpc platform. When they
are, a suitable 32-bit version can be added - the only 32-bit hardware
which supports hugepages does not use the conventional hashtable MMU and
so will have different needs anyway.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch uses the /sys/firmware/memmap interface provided in the last patch
on the x86 architecture when E820 is used. The patch copies the E820
memory map very early, and registers the E820 map afterwards via
firmware_map_add_early().
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Acked-by: Greg KH <gregkh@suse.de>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: yhlu.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds /sys/firmware/memmap interface that represents the BIOS
(or Firmware) provided memory map. The tree looks like:
/sys/firmware/memmap/0/start (hex number)
end (hex number)
type (string)
... /1/start
end
type
With the following shell snippet one can print the memory map in the same form
the kernel prints itself when booting on x86 (the E820 map).
--------- 8< --------------------------
#!/bin/sh
cd /sys/firmware/memmap
for dir in * ; do
start=$(cat $dir/start)
end=$(cat $dir/end)
type=$(cat $dir/type)
printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
done
--------- >8 --------------------------
That patch only provides the needed interface:
1. The sysfs interface.
2. The structure and enumeration definition.
3. The function firmware_map_add() and firmware_map_add_early()
that should be called from architecture code (E820/EFI, for
example) to add the contents to the interface.
If the kernel is compiled without CONFIG_FIRMWARE_MEMMAP, the interface does
nothing without cluttering the architecture-specific code with #ifdef's.
The purpose of the new interface is kexec: While /proc/iomem represents
the *used* memory map (e.g. modified via kernel parameters like 'memmap'
and 'mem'), the /sys/firmware/memmap tree represents the unmodified memory
map provided via the firmware. So kexec can:
- use the original memory map for rebooting,
- use the /proc/iomem for setting up the ELF core headers for kdump
case that should only represent the memory of the system.
The patch has been tested on i386 and x86_64.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Acked-by: Greg KH <gregkh@suse.de>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: yhlu.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Older x86-32 processors do not support global mappings (PGD), so must
only use it if the processor supports it.
The _PAGE_KERNEL* flags always have _PAGE_KERNEL set, since logically
we always want it set.
This is OK even on processors which do not support PGD, since all
_PAGE flags are masked with __supported_pte_mask before being turned
into a real in-pagetable pte. On 32-bit systems, __supported_pte_mask
is initialized to not contain _PAGE_GLOBAL, and it is then added if
the CPU is found to support it.
The x86-32 code used to use __PAGE_KERNEL/__PAGE_KERNEL_EXEC for this
purpose, but they're now redundant and can be removed.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Consistently set _PAGE_GLOBAL in _PAGE_KERNEL flags. This makes 32-
and 64-bit code consistent, and removes some special cases where
__PAGE_KERNEL* did not have _PAGE_GLOBAL set, causing confusion as a
result of the inconsistencies.
This patch only affects x86-64, which generally always supports PGD.
The x86-32 patch is next.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
call it right after we are done with MADT/mptable handling, instead of
doing that in setup_per_cpu_areas() later on...
this way for_possible_cpu() can be used early.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move out e820_register_active_regions from non numa zones_sizes_init()
and remove numa version zones_sizes_init().
and let 32 bit call remove_all_active_ranges() in setup_arch() directly
like 64-bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so it has a more meaningful name.
also change it to static.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e820_search_gap also take a end_addr parameter to limit search from
start_addr to end_addr.
Signed-off-by: AloK N Kataria <akataria@vmware.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "lenb@kernel.org" <lenb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* When CONFIG_DEBUG_PER_CPU_MAPS is set, the node passed to
node_to_cpumask and node_to_cpumask_ptr should be validated.
If invalid, then a dump_stack is performed and a zero cpumask
is returned.
v2: Slightly different version to remove a compiler warning.
v3: Redone to reflect moving setup.c -> setup_percpu.c
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ying Huang would like setup_data to be reserved, but not included in the
no save range.
Here we try to modify the e820 table to reserve that range early.
also add that in early_res in case bootloader messes up with the ramdisk.
other solution would be
1. add early_res_to_highmem...
2. early_res_to_e820...
but they could reserve another type memory wrongly, if early_res has some
resource reserved early, and not needed later, but it is not removed from
early_res in time. Like the RAMDISK (already handled).
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: andi@firstfloor.org
Tested-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
now that the early-ioremap code is unified, move the prototypes too from
io_32.h to io.h.
this fixes:
arch/x86/kernel/setup.c:531: error: implicit declaration of function ‘early_ioremap_init'
on 64-bit.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
change the enable_local_apic to static force_enable_local_apic for 32bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make sure SWAPGS and PARAVIRT_ADJUST_EXCEPTION_FRAME are properly
defined when CONFIG_PARAVIRT is off.
Fixes Ingo's build failure:
arch/x86/kernel/entry_64.S: Assembler messages:
arch/x86/kernel/entry_64.S:1201: Error: invalid character '_' in mnemonic
arch/x86/kernel/entry_64.S:1205: Error: invalid character '_' in mnemonic
arch/x86/kernel/entry_64.S:1209: Error: invalid character '_' in mnemonic
arch/x86/kernel/entry_64.S:1213: Error: invalid character '_' in mnemonic
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
instead of calling it from trap_init()
also move init ioapic mapping out of apic_32.c
so 32 bit do same as 64 bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
asm-x86/paravirt.h already have protection with CONFIG_PARAVIRT inside
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
64-bit Xen pushes a couple of extra words onto an exception frame.
Add a hook to deal with them.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's never safe to call a swapgs pvop when the user stack is current -
it must be inline replaced. Rather than making a call, the
SWAPGS_UNSAFE_STACK pvop always just puts "swapgs" as a placeholder,
which must either be replaced inline or trap'n'emulated (somehow).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In a 64-bit system, we need separate sysret/sysexit operations to
return to a 32-bit userspace.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citirx.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There's no need to combine restoring the user rsp within the sysret
pvop, so split it out. This makes the pvop's semantics closer to the
machine instruction.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citirx.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don't conflate sysret and sysexit; they're different instructions with
different semantics, and may be in use at the same time (at least
within the same kernel, depending on whether its an Intel or AMD
system).
sysexit - just return to userspace, does no register restoration of
any kind; must explicitly atomically enable interrupts.
sysret - reloads flags from r11, so no need to explicitly enable
interrupts on 64-bit, responsible for restoring usermode %gs
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citirx.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We will need to set a pte on l3_user_pgt. Extract set_pte_vaddr_pud()
from set_pte_vaddr(), that will accept the l3 page table as parameter.
This change should be a no-op for existing code.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Because Xen doesn't support PSE mappings in guests, all code which
assumed the presence of PSE has been changed to fall back to smaller
mappings if necessary. As a result, PSE is optional rather than
required (though still used whereever possible).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Split x86_64_start_kernel() into two pieces:
The first essentially cleans up after head_64.S. It clears the
bss, zaps low identity mappings, sets up some early exception
handlers.
The second part preserves the boot data, reserves the kernel's
text/data/bss, pagetables and ramdisk, and then starts the kernel
proper.
This split is so that Xen can call the second part to do the set up it
needs done. It doesn't need any of the first part setups, because it
doesn't boot via head_64.S, and its redundant or actively damaging.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Set __PAGE_OFFSET to the most negative possible address +
16*PGDIR_SIZE. The gap is to allow a space for a hypervisor to fit.
The gap is more or less arbitrary, but it's what Xen needs.
When booting native, kernel/head_64.S has a set of compile-time
generated pagetables used at boot time. This patch removes their
absolutely hard-coded layout, and makes it parameterised on
__PAGE_OFFSET (and __START_KERNEL_map).
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On 32-bit it's best to use a %cs: prefix to access memory where the
other segments may not bet set up properly yet. On 64-bit it's best
to use a rip-relative addressing mode. Define PARA_INDIRECT() to
abstract this and generate the proper addressing mode in each case.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rather than just jumping to 0 when there's a missing operation, raise a BUG.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add hooks which are called at pgd_alloc/free time. The pgd_alloc hook
may return an error code, which if non-zero, causes the pgd allocation
to be failed. The hooks may be used to allocate/free auxillary
per-pgd information.
also fix:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
> include/asm/pgalloc.h:14: error: parameter name omitted
> arch/x86/kernel/entry_64.S: In file included from
> arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
> include/asm/pgalloc.h:14: error: parameter name omitted
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a preparatory patch for the next patch in series.
Moves some code from e820_setup_gap to a new function e820_search_gap.
This patch is a part of a bug fix where we walk the ACPI table to calculate
a gap for PCI optional devices.
v1->v2: Patch on top of tip/master.
Fixes a bug introduced in the last patch about the typeof "last".
Also the new function e820_search_gap now returns if we found a gap in
e820_map.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: lenb@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
... so can we use mem below max_low_pfn earlier.
this allows us to move several functions more early instead of waiting
to after paging_init.
That includes moving relocate_initrd() earlier in the bootup, and kva
related early setup done in initmem_init. (in followup patches)
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some amount of asm-x86/mmu_context.h can be unified, including
activate_mm paravirt hook.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pgd_index is common for 32 and 64-bit, so move it to a common place.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For calculating the offset from struct gate_struct fields.
[ gate_offset and gate_segment were broken for 32-bit. ]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The 32-bit early_ioremap will work equally well for 64-bit, so just use it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
wrmsr is a special instruction which can have arbitrary system-wide
effects. We don't want the compiler to reorder it with respect to
memory operations, so make it a memory barrier.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add "memory" clobbers to savesegment and loadsegment, since they can
affect memory accesses and we never want the compiler to reorder them
with respect to memory references.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no need to keep NMI_DISABLED definition and use it
for nmi_watchdog by default. Here is the point why:
- IO-APIC and APIC chips are programmed for nmi_watchdog support at very
early stage of kernel booting and not having nmi_watchdog specified as
boot option lead only to nmi_watchdog becomes to NMI_NONE anyway
- enable nmi_watchdog thru /proc/sys/kernel/nmi if it was not specified at
boot is not possible too (even having this sysfs entry)
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: macro@linux-mips.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add support for overlapping early memory reservations.
In general, they still can't overlap, and will panic
with "Overlapping early reservations" if they do overlap.
But if a memory range is reserved with the new call:
reserve_early_overlap_ok()
rather than with the usual call:
reserve_early()
then subsequent early reservations are allowed to overlap.
This new reserve_early_overlap_ok() call is only used in one
place so far, which is the "BIOS reserved" reservation for the
the EBDA region, which out of Paranoia reserves more than what
the BIOS might have specified, and which thus might overlap with
another legitimate early memory reservation (such as, perhaps,
the EFI memmap.)
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: "Yinghai Lu" <yhlu.kernel@gmail.com>
Cc: "Jack Steiner" <steiner@sgi.com>
Cc: "Mike Travis" <travis@sgi.com>
Cc: "Huang
Cc: Ying" <ying.huang@intel.com>
Cc: "Andi Kleen" <andi@firstfloor.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
those function depend on paging setup pgtable, so they could access
the ram in bootmem region but just get mapped.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so that max_low_pfn is not changed after it is set.
so we can move that early and out of initmem_init.
could call find_low_pfn_range just after max_pfn is set.
also could move reserve_initrd out of setup_bootmem_allocator
so 32bit is more like 64bit.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
... so could add real hole in e820
agp check is using request_mem_region, and could fail if e820 is reserved...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
if acpi=off, acpi=noirq and pci=noacpi, we need to disable apic.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
want to remove arch_get_ram_range, and use early_node_map instead.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Take it out of smpboot.c, and move it to process_32.c, closer
to its only user.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We create a version of it for i386, and then take the CONFIG_X86_64
ifdef out of the game. We could create a __setup_vector_irq for i386,
but it would incur in an unnecessary lock taking. Moreover, it is better
practice to only export setup_vector_irq anyway.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
i386 and x86_64 used two different schemes for maintaining the gdt.
With this patch, x86_64 initial gdt table is defined in a .c file,
same way as i386 is now. Also, we call it "gdt_page", and the descriptor,
"early_gdt_descr". This way we achieve common naming, which can allow for
more code integration.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Making a variable page-aligned by using
__attribute__((section(".data.page_aligned"))) is fragile because if
sizeof(variable) is not also a multiple of page size, it leaves
variables in the remainder of the section unaligned.
This patch introduces two new qualifiers, __page_aligned_data and
__page_aligned_bss to set the section *and* the alignment of
variables. This makes page-aligned variables more robust because the
linker will make sure they're aligned properly. Unfortunately it
requires *all* page-aligned data to use these macros...
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch moves the reserve_crashkernel() to setup.c and removes the
architecture-specific version. Both versions were more or less the same.
I tested it on both x86-64 and i386, with CONFIG_KEXEC on and off (so
that it compiles).
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: yhlu.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Increase the maximum number of apics when running very large
configurations. This patch has no affect on most systems.
The patch has no effect on any 32-bit kernel. It adds ~4k to the size
of 64-bit kernels but only if NR_CPUS > 255.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
physid_mask_of_physid() causes a huge stack (12k) to be created if the
number of APICS is large. Replace physid_mask_of_physid() with a
new function that does not create large stacks. This is a problem only
on large x86_64 systems.
this paves the way to increase MAX_APICS.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: linux-mm@kvack.org
Cc: mingo@elte.hu
Cc: tglx@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
TLB shootdown for SGI UV.
v1: 6/2 original
v2: 6/3 corrections/improvements per Ingo's review
v3: 6/4 split atomic operations off to a separate patch (Jeremy's review)
v4: 6/12 include <mach_apic.h> rather than <asm/mach-bigsmp/mach_apic.h>
(fixes a !SMP build problem that Ingo found)
fix the index on uv_table_bases[blade]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Provide atomic operations for increment of a 16-bit integer and
logical OR into a 64-bit integer.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
TLB shootdown for SGI UV.
Depends on patch (in tip/x86/irq):
x86-update-macros-used-by-uv-platform.patch Jack Steiner May 29
This patch provides the ability to flush TLB's in cpu's that are not on
the local node. The hardware mechanism for distributing the flush
messages is the UV's "broadcast assist unit".
The hook to intercept TLB shootdown requests is a 2-line change to
native_flush_tlb_others() (arch/x86/kernel/tlb_64.c).
This code has been tested on a hardware simulator. The real hardware
is not yet available.
The shootdown statistics are provided through /proc/sgi_uv/ptc_statistics.
The use of /sys was considered, but would have required the use of
many /sys files. The debugfs was also considered, but these statistics
should be available on an ongoing basis, not just for debugging.
Issues to be fixed later:
- The IRQ for the messaging interrupt is currently hardcoded as 200
(see UV_BAU_MESSAGE). It should be dynamically assigned in the future.
- The use of appropriate udelay()'s is untested, as they are a problem
in the simulator.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967.
It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.
The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Remove the boot_cpu_pda array and pointer table from the data section.
Allocate the pointer table and array during init. do_boot_cpu()
will reallocate the pda in node local memory and if the cpu is being
brought up before the bootmem array is released (after_bootmem = 0),
then it will free the initial pda. This will happen for all cpus
present at system startup.
This removes 512k + 32k bytes from the data section.
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Consolidate node_to_cpumask operations and remove the 256k
byte node_to_cpumask_map. This is done by allocating the
node_to_cpumask_map array after the number of possible nodes
(nr_node_ids) is known.
* Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
been increased. It now shows faults when calling node_to_cpumask()
and node_to_cpumask_ptr().
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Restore the nodenumber field in the x86_64 pda. This field is slightly
different than the x86_cpu_to_node_map mainly because it's a static
indication of which node the cpu is on while the cpu to node map is a
dyanamic mapping that may get reset if the cpu goes offline. This also
simplifies the numa_node_id() macro.
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Introduce a new PER_CPU macro called "EARLY_PER_CPU". This is
used by some per_cpu variables that are initialized and accessed
before there are per_cpu areas allocated.
["Early" in respect to per_cpu variables is "earlier than the per_cpu
areas have been setup".]
This patchset adds these new macros:
DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
EXPORT_EARLY_PER_CPU_SYMBOL(_name)
DECLARE_EARLY_PER_CPU(_type, _name)
early_per_cpu_ptr(_name)
early_per_cpu_map(_name, _idx)
early_per_cpu(_name, _cpu)
The DEFINE macro defines the per_cpu variable as well as the early
map and pointer. It also initializes the per_cpu variable and map
elements to "_initvalue". The early_* macros provide access to
the initial map (usually setup during system init) and the early
pointer. This pointer is initialized to point to the early map
but is then NULL'ed when the actual per_cpu areas are setup. After
that the per_cpu variable is the correct access to the variable.
The early_per_cpu() macro is not very efficient but does show how to
access the variable if you have a function that can be called both
"early" and "late". It tests the early ptr to be NULL, and if not
then it's still valid. Otherwise, the per_cpu variable is used
instead:
#define early_per_cpu(_name, _cpu) \
(early_per_cpu_ptr(_name) ? \
early_per_cpu_ptr(_name)[_cpu] : \
per_cpu(_name, _cpu))
A better method is to actually check the pointer manually. In the
case below, numa_set_node can be called both "early" and "late":
void __cpuinit numa_set_node(int cpu, int node)
{
int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
if (cpu_to_node_map)
cpu_to_node_map[cpu] = node;
else
per_cpu(x86_cpu_to_node_map, cpu) = node;
}
* Add a flag "arch_provides_topology_pointers" that indicates pointers
to topology cpumask_t maps are available. Otherwise, use the function
returning the cpumask_t value. This is useful if cpumask_t set size
is very large to avoid copying data on to/off of the stack.
* The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
the non-debug case has been optimized a bit.
* Remove an unreferenced compiler warning in drivers/base/topology.c
* Clean up #ifdef in setup.c
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Stop mprotect's pte_modify from wiping out the s390 pte_special bit, which
caused oops thereafter when vm_normal_page thought X's abnormal was normal.
Debugged-by: Ryan Hope <rmh3093@gmail.com>
Debugged-by: Zan Lynx <zlynx@acm.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
if the system doesn't have ioapic, we don't need to store entries for mptable
update
also let mp_config_acpi_gsi not call func in mpparse
so later could decouple mpparse with acpi more easily
Reported-by: Daniel Exner <dex@dragonslave.de>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Exner <dex@dragonslave.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. let 64bit support 88 and e801 too
2. introduce default_machine_specific_memory_setup, and reuse it
for voyager
v2: fix 64 bit compiling
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
seperate SRAT finding and parsing from get_memcfg_from_srat,
and let getmemcfg_from_srat only handle array from previous step.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so don't punish all other cpus without that problem when init highmem
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
and make 32-bit resource registration more like 64 bit.
also move probe_roms back to setup_32.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Because of the size limits of struct boot_params (zero page), the
maximum number of E820 memory map entries can be passed to kernel is
128. As pointed by Paul Jackson, there is some machine produced by SGI
with so many nodes that the number of E820 memory map entries is more
than 128. To enabling Linux kernel on these system, a new setup data
type named SETUP_E820_EXT is defined to pass additional memory map
entries to Linux kernel.
This patch is based on x86/auto-latest branch of git-x86 tree and has
been tested on x86_64 and i386 platform.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
use early_node_map to init high pages, so we can remove page_is_ram() and
page_is_reserved_early() in the big loop with add_one_highpage
also remove page_is_reserved_early(), it is not needed anymore.
v2: fix the build of other platforms
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
rename update_memory_range to e820_update_range
rename add_memory_region to e820_add_region
to make it more clear that they are about e820 map operations.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Daniel Exner reported IO-APIC enumeration breakage in linux-next.
Alexey Starikovskiy found out that it might be related to
commit 2944e16b25 "x86: update mptable".
use enable_update_mptable to decide if need check before add mp_irqs array.
Reported-by: Daniel Exner <webmaster@dragonslave.de>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
in case we have kva before ramdisk on a node, we still need to use
those ranges.
v2: reserve_early kva ram area, in case there are holes in highmem, to avoid
those area could be treat as free high pages.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. add reserve_bootmem_generic for 32bit
2. change len to unsigned long
3. make early_res_to_bootmem to use it
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
since we now have 32-bit support for e820_register_active_regions(),
we can merge the parsing of the mem=/memmap= boot parameters.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967.
It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.
The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Each I/O APIC redirection table entry has a number of fields.
Define names for them to eliminate reference by hard coded
numbers.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If configured to use the I/O APIC, the NMI watchdog is deemed to fail if
the chip has been deactivated as a result of "noapic". Downgrade to the
local APIC watchdog similarly to what is done for the UP case.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A downgrade helper for the NMI watchdog to be used in all places where
the I/O APIC watchdog may have been requested, but the I/O APIC is found
not to be there or meant to be left disabled. This is so that the
reconfiguration is cosistent and defined in a single place only.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no point in keeping the 8259A enabled if the I/O APIC NMI
watchdog has failed and the 8259A is not used to pass through regular
timer interrupts. This fixes problems with some systems where some logic
gets confused.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the "disable_8254_timer" and "enable_8254_timer" kernel
parameters. Now that AEOI acknowledgements are no longer needed for
correct timer operation, the 8259A can be kept disabled unconditionally
unless interrupts, either timer or watchdog ones, are actually passed
through it.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1) Remove __meminit from update_pages_count. It is used inside
split_pages()
2) Make the code depend on PROC_FS. Doing statistics for nothing is
useless and not adding useless code is nice to the Linux tiny folks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add information about the mapping state of the direct mapping to
/proc/meminfo. I chose /proc/meminfo because that is where all the other
memory statistics are too and it is a generally useful metric even
outside debugging situations. A lot of split kernel pages means the
kernel will run slower.
This way we can see how many large pages are really used for it and how
many are split.
Useful for general insight into the kernel.
v2: Add hotplug locking to 64bit to plug a very obscure theoretical race.
32bit doesn't need it because it doesn't support hotadd for lowmem.
Fix some typos
v3: Rename dpages_cnt
Add CONFIG ifdef for count update as requested by tglx
Expand description
v4: Fix stupid bugs added in v3
Move update_page_count to pageattr.c
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"Form follows function". Code is now where it belongs to.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
C1E on AMD machines is like C3 but without control from the OS. Up to
now we disabled the local apic timer for those machines as it stops
when the CPU goes into C1E. This excludes those machines from high
resolution timers / dynamic ticks, which hurts especially X2 based
laptops.
The current boot time C1E detection has another, more serious flaw
as well: some BIOSes do not enable C1E until the ACPI processor module
is loaded. This causes systems to stop working after that point.
To work nicely with C1E enabled machines we use a separate idle
function, which checks on idle entry whether C1E was enabled in the
Interrupt Pending Message MSR. This allows us to do timer broadcasting
for C1E and covers the late enablement of C1E as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Warn people when using pxa2xx-gpio.h as it is only here for backwards
compatibility. The new mfp-pxa2[57]x.h and the relevant API should be used
instead.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Hypercalls can modify arbitrary regions of memory. Make sure to indicate this
in the clobber list. This fixes a hang when using KVM_GUEST kernel built with
GCC 4.3.0.
This was originally spotted and analyzed by Marcelo.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
These two macros are useful beyond lock debugging. Moved definitions from
include/linux/debug_locks.h to include/linux/kernel.h, so code that needs
them does not have to include the former, which would have been a less
intuitive choice of a header.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Phytec phyCORE-i.MX27 CPU module is delivered with the PCM970
baseboard by default. This patch adds support for the hardware.
This code is only an empty stub; it is filled up with functionality
in a later patch series.
Signed-off-by: Juergen Beisert <j.beisert@pengutronix.de>
This patch adds support for the phyCORE-i.MX27 cpu module (aka pcm038).
It is as generic as possible in order to support any kind of baseboard.
Note: This CPU module implementation can't work without a baseboard
support. Baseboard support can be added by the PCM-970 (included in
this patch stack) or any custom variant.
Signed-off-by: Juergen Beisert <j.beisert@pengutronix.de>
This patch adds basic support for the Freescale MX27ADS reference board.
Currently only a serial console can be used.
Signed-off-by: Juergen Beisert <j.beisert@pengutronix.de>
This patch adds basic mach support for the mx2 processor family, based
on the original freescale code and adapted to mainline kernel coding
style.
Signed-off-by: Juergen Beisert <j.beisert@pengutronix.de>
This patch adds basic support for i.MX31 LiteKit by LogicPD.
With printascii() in kernel/printk.c, it boots right into the
rootfs-panic.
Note: This is a modified version of Daniel's patch to fit into this patch
stack.
> On 09.06.2008, at 17:26, Russell King - ARM Linux wrote:
>
> > I would much prefer it if board specific includes were included by the
> > code which needs them rather than in asm/arch/hardware.h. With the
> > device model, drivers shouldn't need to include any board specific
> > includes - only the board specific C file should need it.
>
> The new version of this patch (#5102) has been uploaded to the patch
> tracker this morning.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
--
arch/arm/configs/mx31litekit_defconfig | 1100 ++++++++++++++++++++++++++++++
arch/arm/mach-mx3/Kconfig | 7
arch/arm/mach-mx3/Makefile | 1
arch/arm/mach-mx3/mx31lite.c | 96 ++
include/asm-arm/arch-mxc/board-mx31lite.h | 38 +
include/asm-arm/arch-mxc/debug-macro.S | 3
6 files changed, 1245 insertions(+)
This patch adds debug-macro.S for arch-mxc
Disadvantage: Due to the board specific UART definition, these macros (and
compile time) will fail for multi board kernels.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
This patch adds timer support for the i.MX machine family. This code can
be used on the following machs:
- i.MX1 (tested)
- i.MX2 (i.MX21 (to be tested), i.MX27 (tested))
- i.MX3 (i.MX31 (tested))
TODO: It seems impossible to build a kernel for more than one CPU because the
timer do not follow the platform device rules. So it does only work if
timer 1 can be accessed on all CPUs at the same address.
Signed-off-by: Juergen Beisert <j.beisert@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
This patch bases on the one from Daniel Mack. The most important change to
Daniel's patch is to be more generic. This gpio routine supports at least
the i.MX27 and i.MX31 processors.
Signed-off-by: Juergen Beisert <j.beisert@pengutronix.de>
Acked-by: Daniel Mack <daniel@caiaq.de>
Internal clock path handling for the mxc CPUs.
Changed against the original Freescale code (and against clocklib for example):
- clock rate is always calculated whenever one ask for the current rate
(means struct clk has no more a member called "rate"). So switching the PLL
base frequency will propagate immediately to all other clocks that are
depending on this frequency.
Signed-off-by: Juergen Beisert <j.beisert@pengutronix.de>