When using an uncached DMA region less than 1 MiB, we try to mask off
the whole last 1 MiB for it. Unfortunately, this fails as we forgot
to subtract one from the calculated mask, leading to the region still
be marked as cacheable.
Reported-by: Andrew Rook <andrew.rook@speakerbus.co.uk>
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Since linux-2.6.31, the kernel suspend framework will do disable_irq/enable_irq,
so save/restore irq in standby and suspend to mem callback should be dropped.
Otherwise the common code notices things are enabled and complains.
Signed-off-by: Steven Miao <realmz6@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Any consumer of dpmc.h expects to use VR_CTL, so also pull in the new
mach/pll.h header for those functions.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Currently, sending an interprocessor interrupt (IPI) requires building up
a message dynamically which means memory allocation. But often times, we
will want to send an IPI in low level contexts where allocation is not
possible which may lead to a panic(). So create a per-cpu static array
for the message queue and use that instead.
Further, while we have two supplemental interrupts, we are currently only
using one of them. So use the second one for the most common IPI message
of all -- smp_send_reschedule(). This avoids ugly contention for locks
which in turn would require an IPI message ...
In general, this improves SMP performance, and in some cases allows the
SMP port to work in places it wouldn't before. Such as the PREEMPT_RT
state where the slab is protected by a per-cpu spin lock. If the slab
kmalloc/kfree were to put the task to sleep, and that task was actually
the IPI handler, then the system falls down yet again.
After running some various stress tests on the system, the static limit
of 5 messages seems to work. On the off chance even this overflows, we
simply panic(), and we can review that scenario to see if the limit needs
to be increased a bit more.
Signed-off-by: Yi Li <yi.li@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Since we're breaking apart some inter-header dependencies to avoid more
circular loops, move the blackfin_core_id() definition to the func that
it is based upon.
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Common code expects these to be defined for SMP ports, so add them.
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The BF561 mem_map.h header has the __ASSEMBLY__/CONFIG_SMP checks out
of order which leads to build errors for assembly code that happens to
include this file.
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This function takes an irq_handler_t function, but the prototype in
the header doesn't match the function definition. This is due to the
smp headers needing to avoid circular dependencies. So change the
function to take a simple pointer.
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The common asm-generic non-atomic bitops.h defines test_bit() for us, but
we need to use our own version. So redirect the definition of this func
to avoid having to inline the rest of the asm-generic file.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The cpu maps are defines provided by common linux/cpumask.h, not local
variables. So stop exporting them locally and include the right header
for their definition.
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The external functions are named __raw_xxx, not arch_xxx, so rename the
prototypes to match reality. This fixes some simple build errors in the
bfin_ksyms.c code which exports these helpers to modules.
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Rather than maintain Kconfig entries where people have to enter raw
numbers and hardcode lists of addresses/pins in the driver itself,
push it all to platform resources. This lets us simplify the driver,
the Kconfig, and gives board porters greater flexibility.
In the process, we need to also start supporting the early platform
interface. Not a big deal, but it causes the patch to be bigger than
a simple resource relocation.
All the Blackfin boards already have their resources updated and in
place for this change.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
These were only included because of the irq handling of the PLL funcs,
and those PLL funcs have been moved out into their own header now.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The defBF512.h header exists only to include defBF51x_base.h, and it is
the only place where defBF51x_base.h is included. So move the contents
of the defBF51x_base.h header into the defBF512.h header.
Same situation for the other def/cdef pairs.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The main asm/blackfin.h header will pull in mach/blackfin.h to get
all the fun Blackfin defines. So having any of the sub-mach headers
trying to include asm/blackfin.h makes no sense -- punt it.
The mach/blackfin.h header takes care of including the part-specific
def headers which in turn will include any other needed def file.
Similarly, it takes care of pulling in the part-specific cdef header.
So move this logic out of the blackfin.h when necessary.
Further, make sure the cdef headers do not waste time including the
def headers again.
Since all parts need the common def/cdef headers, move this logic
out of the part-specific headers and into the mach/blackfin.h file.
Finally, we need to split the BF539 def header since the BF538 does
not have MXVR and we don't want to expose those MMRs.
So now all parts should have the same behavior:
mach/blackfin.h
asm/def_LPBlackfin.h
part-specific def.h
if ! asm
asm/cdef_LPBlackfin.h
part-specific cdef.h
And the sub def/cdef headers only tail into what they need.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
We don't want the BF533 to be different in terms of its MMR headers, so
merge the FIO_FLAG helpers back into the normal place. To avoid circular
dependencies with headers, turn the inline C funcs into CPP defines. Not
like there will be any code size differences.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Since the SMP code paths tend to compile fail a lot, start a SMP defconfig
so our nightly build tools will automatically test it.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
We don't want people banging on MMRs directly. As for the ip0x board,
it shouldn't need to muck with the CS pin directly as the Blackfin SPI
bus master driver takes care of driving this.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Use the same naming convention for DMA traffic MMRs (most were legacy
anyways) so we can avoid useless ifdef trees.
Same goes for MDMA names -- this actually allows us to undo a bunch of
ifdef redirects that existed for this purpose alone.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
A bunch of arches define reads[bwl]/writes[bwl] helpers for accessing
memory mapped registers. Since the Blackfin ones aren't specific to
Blackfin code, move them to the common asm-generic/io.h for people.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Each Blackfin port has been duplicating UART structures and defines when
there really is no need for it. So start a new bfin_serial.h header to
unify all these pieces and give ourselves a fresh start.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
In order to not touch the driver file for different xtal usage,
push the clkin value to board file and calculate the register
value instead of hardcoding it.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Latest atom socs(penwell) does not have hpet timer.
As their local APIC timer is clocked at 400KHZ, and the current
code limit their Initial Counter register to 23 bits, they
cannot sleep more than 1.34 seconds which leads to ~2 spurious
wakeup per second (1 per thread)
These SOCs support 32bit timer so we change the max_delta to at
least 31bits. So we can at least sleep for 300 seconds.
We could not find any previous chip errata where lapic would
only have 23 bit precision As powertop is suggesting to activate
HPET to "sleep longer", this could mean this problem is already
known.
Problem is here since very first implementation of lapic timer
as a clock event e9e2cdb [PATCH] clockevents: i386 drivers.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Pierre Tardy <pierre.tardy@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
LKML-Reference: <1294327409-19426-1-git-send-email-pierre.tardy@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"git grep" shows this is the last piece of code using DEBUG_BOOTMEM,
so remove it.
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Commit 004417a6d4
("perf, arch: Cleanup perf-pmu init vs lockup-detector")
move the perf events init to be an early_initcall.
But this won't work properly unless the dependencies for
this code initialize beforehand.
Fix it by making cpu_type_probe and pcr_arch_init be
an early_initcall as well.
Reported-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don found that P4 PMU reads CCCR register instead of counter
itself (in attempt to catch unflagged event) this makes P4
NMI handler to consume all NMIs it observes. So the other
NMI users such as kgdb simply have no chance to get NMI
on their hands.
Side note: at moment there is no way to run nmi-watchdog
together with perf tool. This is because both 'perf top' and
nmi-watchdog use same event. So while nmi-watchdog reserves
one event/counter for own needs there is no room for perf tool
left (there is a way to disable nmi-watchdog on boot of course).
Ming has tested this patch with the following results
| 1. watchdog disabled
|
| kgdb tests on boot OK
| perf works OK
|
| 2. watchdog enabled, without patch perf-x86-p4-nmi-4
|
| kgdb tests on boot hang
|
| 3. watchdog enabled, without patch perf-x86-p4-nmi-4 and do not run kgdb
| tests on boot
|
| "perf top" partialy works
| cpu-cycles no
| instructions yes
| cache-references no
| cache-misses no
| branch-instructions no
| branch-misses yes
| bus-cycles no
|
| 4. watchdog enabled, with patch perf-x86-p4-nmi-4 applied
|
| kgdb tests on boot OK
| perf does not work, NMI "Dazed and confused" messages show up
|
Which means we still have problems with p4 box due to 'unknown'
nmi happens but at least it should fix kgdb test cases.
Reported-by: Jason Wessel <jason.wessel@windriver.com>
Reported-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Lin Ming <ming.m.lin@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4D275E7E.3040903@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix sparse warning for non-ANSI function declaration:
arch/x86/kernel/smpboot.c💯30: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_lock'
arch/x86/kernel/smpboot.c:105:32: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_unlock'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <20110108195914.95d366ea.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
SDHCI driver for Tegra. This driver plugs in as a new variant of
sdhci-pltfm, using the platform data structure passed in to specify the
GPIOs to use for card detect, write protect and card power enablement.
Original driver (of which only the header file is left):
Signed-off-by: Yvonne Yip <y@palm.com>
The rest, which has been rewritten by now:
Signed-off-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Chris Ball <cjb@laptop.org>
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6: (77 commits)
spi/omap: Fix DMA API usage in OMAP MCSPI driver
spi/imx: correct the test on platform_get_irq() return value
spi/topcliff: Typo fix threhold to threshold
spi/dw_spi Typo change diable to disable.
spi/fsl_espi: change the read behaviour of the SPIRF
spi/mpc52xx-psc-spi: move probe/remove to proper sections
spi/dw_spi: add DMA support
spi/dw_spi: change to EXPORT_SYMBOL_GPL for exported APIs
spi/dw_spi: Fix too short timeout in spi polling loop
spi/pl022: convert running variable
spi/pl022: convert busy flag to a bool
spi/pl022: pass the returned sglen to the DMA engine
spi/pl022: map the buffers on the DMA engine
spi/topcliff_pch: Fix data transfer issue
spi/imx: remove autodetection
spi/pxa2xx: pass of_node to spi device and set a parent device
spi/pxa2xx: Modify RX-Tresh instead of busy-loop for the remaining RX bytes.
spi/pxa2xx: Add chipselect support for Sodaville
spi/pxa2xx: Consider CE4100's FIFO depth
spi/pxa2xx: Add CE4100 support
...
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
gameport: use this_cpu_read instead of lookup
x86: udelay: Use this_cpu_read to avoid address calculation
x86: Use this_cpu_inc_return for nmi counter
x86: Replace uses of current_cpu_data with this_cpu ops
x86: Use this_cpu_ops to optimize code
vmstat: User per cpu atomics to avoid interrupt disable / enable
irq_work: Use per cpu atomics instead of regular atomics
cpuops: Use cmpxchg for xchg to avoid lock semantics
x86: this_cpu_cmpxchg and this_cpu_xchg operations
percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
percpu,x86: relocate this_cpu_add_return() and friends
connector: Use this_cpu operations
xen: Use this_cpu_inc_return
taskstats: Use this_cpu_ops
random: Use this_cpu_inc_return
fs: Use this_cpu_inc_return in buffer.c
highmem: Use this_cpu_xx_return() operations
vmstat: Use this_cpu_inc_return for vm statistics
x86: Support for this_cpu_add, sub, dec, inc_return
percpu: Generic support for this_cpu_add, sub, dec, inc_return
...
Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits)
usb: don't use flush_scheduled_work()
speedtch: don't abuse struct delayed_work
media/video: don't use flush_scheduled_work()
media/video: explicitly flush request_module work
ioc4: use static work_struct for ioc4_load_modules()
init: don't call flush_scheduled_work() from do_initcalls()
s390: don't use flush_scheduled_work()
rtc: don't use flush_scheduled_work()
mmc: update workqueue usages
mfd: update workqueue usages
dvb: don't use flush_scheduled_work()
leds-wm8350: don't use flush_scheduled_work()
mISDN: don't use flush_scheduled_work()
macintosh/ams: don't use flush_scheduled_work()
vmwgfx: don't use flush_scheduled_work()
tpm: don't use flush_scheduled_work()
sonypi: don't use flush_scheduled_work()
hvsi: don't use flush_scheduled_work()
xen: don't use flush_scheduled_work()
gdrom: don't use flush_scheduled_work()
...
Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c
as per Tejun.
The msm provides timer hardware that is private to each core. Each
timer has separate counter and match registers, so we create separate
clock_event_devices for each core. For the global clocksource, use
cpu 0's counter.
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
Signed-off-by: David Brown <davidb@codeaurora.org>
Add support for setting the cold boot address of core 1 and
the warm boot addresses of cores 0 and 1 using a secure
domain call.
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: David Brown <davidb@codeaurora.org>
SCM is the protocol used to communicate between the secure and
non-secure code executing on the applications processor. The
non-secure side uses a physically contiguous buffer to pass
information to the secure side; where the buffer conforms to a
format that is agreed upon by both sides. The use of a buffer
allows multiple pending requests to be in flight on the secure
side. It also benefits use cases where the command or response
buffer contains large chunks of data.
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: David Brown <davidb@codeaurora.org>
* 'x86-apic-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: apic: Cleanup and simplify setup_local_APIC()
x86: Further simplify mp_irq info handling
x86: Unify 3 similar ways of saving mp_irqs info
x86, ioapic: Avoid writing io_apic id if already correct
x86, x2apic: Don't map lapic addr for preenabled x2apic systems
x86, sfi: Use register_lapic_address()
x86, apic: Use register_lapic_address() in init_apic_mapping()
x86, apic: Remove early_init_lapic_mapping()
x86, apic: Unify identical register_lapic_address() functions
* 'rmobile-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (67 commits)
ARM: mach-shmobile: update for SMP changes.
ARM: mach-shmobile: update for GIC changes.
ARM: mach-shmobile: Fix up clkdev fallout for SH73A0.
dma: shdma: don't register the global die notifier multiple times
ARM: mach-shmobile: Rely on run-time IRQ handlers
ARM: mach-shmobile: Run-time IRQ handler for GIC
ARM: mach-shmobile: Run-time IRQ handler for INTCA
ARM: mach-shmobile: Enable CONFIG_MULTI_IRQ_HANDLER
ARM: mach-shmobile: Use shared GIC entry macros
ARM: mach-shmobile: mackerel: Add zboot support
ARM: mach-shmobile: mackerel: Add HDMI sound support
ARM: mach-shmobile: mackerel: add HDMI video support
ARM: mach-shmobile: ap4evb: fixup clk_put timing of fsib_clk
ARM: mach-shmobile: sh73a0: fix div4 table
ARM: mach-shmobile: ap4/mackerel: modify wrong comment out of USB
ARM: mach-shmobile: Mackerel VGA camera support
mmc: sh_mmcif: make DMA support by the driver unconditional
ARM: mach-shmobile: Add eMMC support through MMCIF on AG5EVM
ARM: mach-shmobile: Use pullups for AG5EVM KEYSC pins
ARM: mach-shmobile: sh73a0 GPIO pullup improvement
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (58 commits)
Input: wacom_w8001 - support pen or touch only devices
Input: wacom_w8001 - use __set_bit to set keybits
Input: bu21013_ts - fix misuse of logical operation in place of bitop
Input: i8042 - add Acer Aspire 5100 to the Dritek list
Input: wacom - add support for digitizer in Lenovo W700
Input: psmouse - disable the synaptics extension on OLPC machines
Input: psmouse - fix up Synaptics comment
Input: synaptics - ignore bogus mt packet
Input: synaptics - add multi-finger and semi-mt support
Input: synaptics - report clickpad property
input: mt: Document interface updates
Input: fix double equality sign in uevent
Input: introduce device properties
hid: egalax: Add support for Wetab (726b)
Input: include MT library as source for kerneldoc
MAINTAINERS: Update input-mt entry
hid: egalax: Add support for Samsung NB30 netbook
hid: egalax: Document the new devices in Kconfig
hid: egalax: Add support for Wetab
hid: egalax: Convert to MT slots
...
Fixed up trivial conflict in drivers/input/keyboard/Kconfig
* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (36 commits)
serial: apbuart: Fixup apbuart_console_init()
TTY: Add tty ioctl to figure device node of the system console.
tty: add 'active' sysfs attribute to tty0 and console device
drivers: serial: apbuart: Handle OF failures gracefully
Serial: Avoid unbalanced IRQ wake disable during resume
tty: fix typos/errors in tty_driver.h comments
pch_uart : fix warnings for 64bit compile
8250: fix uninitialized FIFOs
ip2: fix compiler warning on ip2main_pci_tbl
specialix: fix compiler warning on specialix_pci_tbl
rocket: fix compiler warning on rocket_pci_ids
8250: add a UPIO_DWAPB32 for 32 bit accesses
8250: use container_of() instead of casting
serial: omap-serial: Add support for kernel debugger
serial: fix pch_uart kconfig & build
drivers: char: hvc: add arm JTAG DCC console support
RS485 documentation: add 16C950 UART description
serial: ifx6x60: fix memory leak
serial: ifx6x60: free IRQ on error
Serial: EG20T: add PCH_UART driver
...
Fixed up conflicts in drivers/serial/apbuart.c with evil merge that
makes the code look fairly sane (unlike either side).
* 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (144 commits)
USB: add support for Dream Cheeky DL100B Webmail Notifier (1d34:0004)
USB: serial: ftdi_sio: add support for TIOCSERGETLSR
USB: ehci-mxc: Setup portsc register prior to accessing OTG viewport
USB: atmel_usba_udc: fix freeing irq in usba_udc_remove()
usb: ehci-omap: fix tll channel enable mask
usb: ohci-omap3: fix trivial typo
USB: gadget: ci13xxx: don't assume that PAGE_SIZE is 4096
USB: gadget: ci13xxx: fix complete() callback for no_interrupt rq's
USB: gadget: update ci13xxx to work with g_ether
USB: gadgets: ci13xxx: fix probing of compiled-in gadget drivers
Revert "USB: musb: pm: don't rely fully on clock support"
Revert "USB: musb: blackfin: pm: make it work"
USB: uas: Use GFP_NOIO instead of GFP_KERNEL in I/O submission path
USB: uas: Ensure we only bind to a UAS interface
USB: uas: Rename sense pipe and sense urb to status pipe and status urb
USB: uas: Use kzalloc instead of kmalloc
USB: uas: Fix up the Sense IU
usb: musb: core: kill unneeded #include's
DA8xx: assign name to MUSB IRQ resource
usb: gadget: g_ncm added
...
Manually fix up trivial conflicts in USB Kconfig changes in:
arch/arm/mach-omap2/Kconfig
arch/sh/Kconfig
drivers/usb/Kconfig
drivers/usb/host/ehci-hcd.c
and annoying chip clock data conflicts in:
arch/arm/mach-omap2/clock3xxx_data.c
arch/arm/mach-omap2/clock44xx_data.c
acpi_numa_init() has to parse the whole SRAT table, even if the
kernel wants to limit the number of cpus it will use (because the
ones it is going to use may be described by entries at the end of
the SRAT table). Avoid overflowing the node_cpuid array.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
From the x86_64 low level interrupt handlers, the frame pointer is
saved right after the partial pt_regs frame.
rbp is not supposed to be part of the irq partial saved registers,
but it only requires to extend the pt_regs frame by 8 bytes to
do so, plus a tiny stack offset fixup on irq exit.
This changes a bit the semantics or get_irq_entry() that is supposed
to provide only the value of caller saved registers and the cpu
saved frame. However it's a win for unwinders that can walk through
stack frames on top of get_irq_regs() snapshots.
A noticeable impact is that it makes perf events cpu-clock and
task-clock events based callchains working on x86_64.
Let's then save rbp into the irq pt_regs.
As a result with:
perf record -e cpu-clock perf bench sched messaging
perf report --stdio
Before:
20.94% perf [kernel.kallsyms] [k] lock_acquire
|
--- lock_acquire
|
|--44.01%-- __write_nocancel
|
|--43.18%-- __read
|
|--6.08%-- fork
| create_worker
|
|--0.88%-- _dl_fixup
|
|--0.65%-- do_lookup_x
|
|--0.53%-- __GI___libc_read
--4.67%-- [...]
After:
19.23% perf [kernel.kallsyms] [k] __lock_acquire
|
--- __lock_acquire
|
|--97.74%-- lock_acquire
| |
| |--21.82%-- _raw_spin_lock
| | |
| | |--37.26%-- unix_stream_recvmsg
| | | sock_aio_read
| | | do_sync_read
| | | vfs_read
| | | sys_read
| | | system_call
| | | __read
| | |
| | |--24.09%-- unix_stream_sendmsg
| | | sock_aio_write
| | | do_sync_write
| | | vfs_write
| | | sys_write
| | | system_call
| | | __write_nocancel
v2: Fix cfi annotations.
Reported-by: Soeren Sandmann Pedersen <sandmann@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jan Beulich <JBeulich@novell.com>
In dump_stack function, bp isn't used anymore, which is introduced by
commit 9c0729dc80. This patch removes bp
completely.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Soeren Sandmann <sandmann@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <AANLkTik9U_Z0WSZ7YjrykER_pBUfPDdgUUmtYx=R74nL@mail.gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
This patch is similiar to Gleb Natapov's patch for KVM, which enable the
hypervisor to emulate x2apic feature for the guest. By this way, the emulation
of lapic would be simpler with x2apic interface(MSR), and faster.
[v2: Re-organized 'xen_hvm_need_lapic' per Ian Campbell suggestion]
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Then we can reuse it for Xen later.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Avi Kivity <avi@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Just re-arrange the code a bit to make it easier to follow what is
going on. Basically un-negating the if-statement and swapping the code
inside the if-statement with code outside.
No functional changes.
Originally-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-7-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In original NMI handler, NMI reason io port (0x61) is only processed
on BSP. This makes it impossible to hot-remove BSP. To solve the
issue, a raw spinlock is used to allow the port to be processed on any
CPU.
Originally-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-6-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With priorities in place and no one really understanding the difference between
DIE_NMI and DIE_NMI_IPI, just remove DIE_NMI_IPI and convert everyone to DIE_NMI.
This also simplifies default_do_nmi() a little bit. Instead of calling the
die_notifier in both the if and else part, just pull it out and call it before
the if-statement. This has the side benefit of avoiding a call to the ioport
to see if there is an external NMI sitting around until after the (more frequent)
internal NMIs are dealt with.
Patch-Inspired-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-5-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to consolidate the NMI die_chain events, we need to setup the priorities
for the die notifiers.
I started by defining a bunch of common priorities that can be used by the
notifier blocks. Then I modified the notifier blocks to use the newly created
priorities.
Now that the priorities are straightened out, it should be easier to remove the
event DIE_NMI_IPI.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-4-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They are a handful of places in the code that register a die_notifier
as a catch all in case no claims the NMI. Unfortunately, they trigger
on events like DIE_NMI and DIE_NMI_IPI, which depending on when they
registered may collide with other handlers that have the ability to
determine if the NMI is theirs or not.
The function unknown_nmi_error() makes one last effort to walk the
die_chain when no one else has claimed the NMI before spitting out
messages that the NMI is unknown.
This is a better spot for these devices to execute any code without
colliding with the other handlers.
The two drivers modified are only compiled on x86 arches I believe, so
they shouldn't be affected by other arches that may not have
DIE_NMIUNKNOWN defined.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: openipmi-developer@lists.sourceforge.net
Cc: dann frazier <dannf@hp.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Replace the NMI related magic numbers with symbol constants.
Memory parity error is only valid for IBM PC-AT, newer machine use
bit 7 (0x80) of 0x61 port for PCI SERR. While memory error is usually
reported via MCE. So corresponding function name and kernel log string
is changed.
But on some machines, PCI SERR line is still used to report memory
errors. This is used by EDAC, so corresponding EDAC call is reserved.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/include/asm/io_apic.h
Merge reason: Resolve the conflict, update to a more recent -rc base
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"x86, numa: Fake node-to-cpumask for NUMA emulation" broke the
build when CONFIG_DEBUG_PER_CPU_MAPS is set and CONFIG_NUMA_EMU
is not. This is because it is possible to map a cpu to multiple
nodes when NUMA emulation is used; the patch required a physical
node address table to find those nodes that was only available
when CONFIG_NUMA_EMU was enabled.
This extracts the common debug functionality to its own function
for CONFIG_DEBUG_PER_CPU_MAPS and uses it regardless of whether
CONFIG_NUMA_EMU is set or not.
NUMA emulation will now iterate over the set of possible nodes
for each cpu and call the new debug function whereas only the
cpu's node will be used without NUMA emulation enabled.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <alpine.DEB.2.00.1012301053590.12995@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
... from back in 2004; again, it's ifdefed out by CONFIG_FPU.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
we shouldn't bugger off to userland when there still are
pending signals; among other things it makes e.g. SIGSEGV
triggered by failure to build a sigframe to be delivered
_now_ and not when we hit the next syscall or interrupt.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
If we leave sigreturn via ret_from_signal, we end up with syscall
trace only on entry, leading to very unhappy strace, among other
things. Note that this means different behaviours for signals
delivered while we were in pagefault and for ones delivered while
we were in interrupt...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
a) we should hold modifying regs->format until we know we *will* be
doing stack expansion; otherwise attacker can modify sigframe to
have wrong ->sc_formatvec and install SIGSEGV handler.
b) we should *not* mix copying saved extra stuff from userland with
expanding the stack; once we'd done that manual memmove, we'd better
not return to C, so cleanup is very hard to do. The easiest way
is to copy it on stack first, making sure we won't overwrite on stack
expansion. Fortunately that's easy to do...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Same principle as with the previous patch - do not destroy the
state if sigframe setup fails. Incidentally, it's actually
_less_ work - we don't need to go through adjust_stack dance
on failure if we don't touch regs->stkadj until we know we'd
written sigframe out.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
If we'd failed in setup_frame(), we've no place to store
the original sigmask. It's not an unrecoverable situation -
we raise SIGSEGV, but that SIGSEGV might be successfully
handled (e.g. on altstack). In that case we really don't
want sa_mask of original signal permanently slapped on
the set of blocked signals.
Standard solution: have setup_frame()/setup_rt_frame()
report failure and don't mess with the signal-related
state if that has happened...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Instead of checking the return value of do_signal() we can just do
the work (raise SIGTRAP and clear SR.T1) directly in handle_signal(),
when setting the sigframe up. Simplifies the assembler glue and is
closer to the way we do it on other targets.
Note that do_delayed_trace does *not* disappear; it's still needed
to deal with single-stepping through syscall, since 68040 doesn't
raise the trace exception at all if the trap exception is pending.
We hit it after returning from sys_...() if TIF_DELAYED_TRACE is
set; all that has changed is that we don't reuse it for "single-step
into the handler" codepath.
As the result, do_signal() doesn't need to return anything anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
and saner do_signal() arguments, while we are at it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
... and had been such since the introduction of get_signal_to_deliver()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.
The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.
We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.
- check the global sum once every interval (this will delay zero detection
for some interval, so it's probably a showstopper for vfsmounts).
- keep a local count and only taking the global sum when local reaches 0 (this
is difficult for vfsmounts, because we can't hold preempt off for the life of
a reference, so a counter would need to be per-thread or tied strongly to a
particular CPU which requires more locking).
- keep a local difference of increments and decrements, which allows us to sum
the total difference and hence find the refcount when summing all CPUs. Then,
keep a single integer "long" refcount for slow and long lasting references,
and only take the global sum of local counters when the long refcount is 0.
This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.
This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.
This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.
Patched with:
git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.
The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.
In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.
The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, how that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Protect d_unhashed(dentry) condition with d_lock. This means keeping
DCACHE_UNHASHED bit in synch with hash manipulations.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.
This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (243 commits)
omap2: Make OMAP2PLUS select OMAP_DM_TIMER
OMAP4: hwmod data: Fix alignment and end of line in structurefields
OMAP4: hwmod data: Move the DMA structures
OMAP4: hwmod data: Move the smartreflex structures
OMAP4: hwmod data: Fix missing SIDLE_SMART_WKUP in smartreflexsysc
arm: omap: tusb6010: add name for MUSB IRQ
arm: omap: craneboard: Add USB EHCI support
omap2+: Initialize serial port for dynamic remuxing for n8x0
omap2+: Add struct omap_board_data and use it for platform level serial init
omap2+: Allow hwmod state changes to mux pads based on the state changes
omap2+: Add support for hwmod specific muxing of devices
omap2+: Add omap_mux_get_by_name
OMAP2: PM: fix compile error when !CONFIG_SUSPEND
MAINTAINERS: OMAP: hwmod: update hwmod code, data maintainership
OMAP4: Smartreflex framework extensions
OMAP4: hwmod: Add inital data for smartreflex modules.
OMAP4: PM: Program correct init voltages for scalable VDDs
OMAP4: Adding voltage driver support
OMAP4: Register voltage PMIC parameters with the voltage layer
OMAP3: PM: Program correct init voltages for VDD1 and VDD2
...
Fix up trivial conflict in arch/arm/plat-omap/Kconfig