Commit Graph

14094 Commits

Author SHA1 Message Date
Linus Torvalds
b32fc0a062 Merge branch 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  jump-label: initialize jump-label subsystem much earlier
  x86/jump_label: add arch_jump_label_transform_static()
  s390/jump-label: add arch_jump_label_transform_static()
  jump_label: add arch_jump_label_transform_static() to optimise non-live code updates
  sparc/jump_label: drop arch_jump_label_text_poke_early()
  x86/jump_label: drop arch_jump_label_text_poke_early()
  jump_label: if a key has already been initialized, don't nop it out
  stop_machine: make stop_machine safe and efficient to call early
  jump_label: use proper atomic_t initializer

Conflicts:
 - arch/x86/kernel/jump_label.c
	Added __init_or_module to arch_jump_label_text_poke_early vs
	removal of that function entirely
 - kernel/stop_machine.c
	same patch ("stop_machine: make stop_machine safe and efficient
	to call early") merged twice, with whitespace fix in one version
2011-11-06 20:20:46 -08:00
Linus Torvalds
403299a851 Merge branch 'upstream/xen-settime' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/xen-settime' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen/dom0: set wallclock time in Xen
  xen: add dom0_op hypercall
  xen/acpi: Domain0 acpi parser related platform hypercall
2011-11-06 20:15:05 -08:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds
02ebbbd481 Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  scsi: drop unused Kconfig symbol
  pci: drop unused Kconfig symbol
  stmmac: drop unused Kconfig symbol
  x86: drop unused Kconfig symbol
  powerpc: drop unused Kconfig symbols
  powerpc: 40x: drop unused Kconfig symbol
  mips: drop unused Kconfig symbols
  openrisc: drop unused Kconfig symbols
  arm: at91: drop unused Kconfig symbol
  samples: drop unused Kconfig symbol
  m32r: drop unused Kconfig symbol
  score: drop unused Kconfig symbols
  sh: drop unused Kconfig symbol
  um: drop unused Kconfig symbol
  sparc: drop unused Kconfig symbol
  alpha: drop unused Kconfig symbol

Fix up trivial conflict in drivers/net/ethernet/stmicro/stmmac/Kconfig
as per Michal: the STMMAC_DUAL_MAC config variable is still unused and
should be deleted.
2011-11-06 18:54:53 -08:00
Linus Torvalds
06d381484f Merge branch 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  net: xen-netback: use API provided by xenbus module to map rings
  block: xen-blkback: use API provided by xenbus module to map rings
  xen: use generic functions instead of xen_{alloc, free}_vm_area()
2011-11-06 18:31:36 -08:00
Linus Torvalds
a0a4194c94 Merge branch 'for-next' of git://git.infradead.org/users/sameo/mfd-2.6
* 'for-next' of git://git.infradead.org/users/sameo/mfd-2.6: (80 commits)
  mfd: Fix missing abx500 header file updates
  mfd: Add missing <linux/io.h> include to intel_msic
  x86, mrst: add platform support for MSIC MFD driver
  mfd: Expose TurnOnStatus in ab8500 sysfs
  mfd: Remove support for early drop ab8500 chip
  mfd: Add support for ab8500 v3.3
  mfd: Add ab8500 interrupt disable hook
  mfd: Convert db8500-prcmu panic() into pr_crit()
  mfd: Refactor db8500-prcmu request_clock() function
  mfd: Rename db8500-prcmu init function
  mfd: Fix db5500-prcmu defines
  mfd: db8500-prcmu voltage domain consumers additions
  mfd: db8500-prcmu reset code retrieval
  mfd: db8500-prcmu tweak for modem wakeup
  mfd: Add db8500-pcmu watchdog accessor functions for watchdog
  mfd: hwacc power state db8500-prcmu accessor
  mfd: Add db8500-prcmu accessors for PLL and SGA clock
  mfd: Move to the new db500 PRCMU API
  mfd: Create a common interface for dbx500 PRCMU drivers
  mfd: Initialize DB8500 PRCMU regs
  ...

Fix up trivial conflicts in
	arch/arm/mach-imx/mach-mx31moboard.c
	arch/arm/mach-omap2/board-omap3beagle.c
	arch/arm/mach-u300/include/mach/irqs.h
	drivers/mfd/wm831x-spi.c
2011-11-03 09:40:51 -07:00
Linus Torvalds
6681ba7ec4 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (21 commits)
  MAINTAINERS: add an entry for Edac Sandy Bridge driver
  edac: tag sb_edac as EXPERIMENTAL, as it requires more testing
  EDAC: Fix incorrect edac mode reporting in sb_edac
  edac: sb_edac: Add it to the building system
  edac: Add an experimental new driver to support Sandy Bridge CPU's
  i7300_edac: Fix error cleanup logic
  i7core_edac: Initialize memory name with cpu, channel, bank
  i7core_edac: Fix compilation on 32 bits arch
  i7core_edac: scrubbing fixups
  EDAC: Correct Kconfig dependencies
  i7core_edac: return -ENODEV if no MC is found
  i7core_edac: use edac's own way to print errors
  MAINTAINERS: remove dropped edac_mce.* from the file
  i7core_edac: Drop the edac_mce facility
  x86, MCE: Use notifier chain only for MCE decoding
  EDAC i7core: Use mce socketid for better compatibility
  i7core_edac: Don't enable memory scrubbing for Xeon 35xx
  i7core_edac: Add scrubbing support
  edac: Move edac main structs to include/linux/edac.h
  i7core_edac: Fix oops when trying to inject errors
  ...
2011-11-02 16:55:15 -07:00
Linus Torvalds
092f4c56c1 Merge branch 'akpm' (Andrew's incoming - part two)
Says Andrew:

 "60 patches.  That's good enough for -rc1 I guess.  I have quite a lot
  of detritus to be rechecked, work through maintainers, etc.

 - most of the remains of MM
 - rtc
 - various misc
 - cgroups
 - memcg
 - cpusets
 - procfs
 - ipc
 - rapidio
 - sysctl
 - pps
 - w1
 - drivers/misc
 - aio"

* akpm: (60 commits)
  memcg: replace ss->id_lock with a rwlock
  aio: allocate kiocbs in batches
  drivers/misc/vmw_balloon.c: fix typo in code comment
  drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
  w1: disable irqs in critical section
  drivers/w1/w1_int.c: multiple masters used same init_name
  drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
  drivers/power/ds2780_battery.c: add a nolock function to w1 interface
  drivers/power/ds2780_battery.c: create central point for calling w1 interface
  w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
  pps gpio client: add missing dependency
  pps: new client driver using GPIO
  pps: default echo function
  include/linux/dma-mapping.h: add dma_zalloc_coherent()
  sysctl: make CONFIG_SYSCTL_SYSCALL default to n
  sysctl: add support for poll()
  RapidIO: documentation update
  drivers/net/rionet.c: fix ethernet address macros for LE platforms
  RapidIO: fix potential null deref in rio_setup_device()
  RapidIO: add mport driver for Tsi721 bridge
  ...
2011-11-02 16:07:27 -07:00
Andrea Arcangeli
b35a35b556 thp: share get_huge_page_tail()
This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Andrea Arcangeli
70b50f94f1 mm: thp: tail page refcounting fix
Michel while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe, if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().

He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail->_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page->_count in __split_huge_page_refcount() (in addition to the
head_page->_mapcount).

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  As
opposed to other get_user_page users like secondary-MMU page fault to
establish the shadow pagetables would never call any superflous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).

The standard get_page() as invoked by direct-io instead will now take
the compound_lock but still only for tail pages.  The direct-io paths
are usually I/O bound and the compound_lock is per THP so very
finegrined, so there's no risk of scalability issues with it.  A simple
direct-io benchmarks with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.

This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Linus Torvalds
de0a5345a5 Merge branch 'for-linus' of git://github.com/richardweinberger/linux
* 'for-linus' of git://github.com/richardweinberger/linux: (90 commits)
  um: fix ubd cow size
  um: Fix kmalloc argument order in um/vdso/vma.c
  um: switch to use of drivers/Kconfig
  UserModeLinux-HOWTO.txt: fix a typo
  UserModeLinux-HOWTO.txt: remove ^H characters
  um: we need sys/user.h only on i386
  um: merge delay_{32,64}.c
  um: distribute exports to where exported stuff is defined
  um: kill system-um.h
  um: generic ftrace.h will do...
  um: segment.h is x86-only and needed only there
  um: asm/pda.h is not needed anymore
  um: hw_irq.h can go generic as well
  um: switch to generic-y
  um: clean Kconfig up a bit
  um: a couple of missing dependencies...
  um: kill useless argument of free_chan() and free_one_chan()
  um: unify ptrace_user.h
  um: unify KSTK_...
  um: fix gcov build breakage
  ...
2011-11-02 09:45:39 -07:00
Dave Jones
0d65ede0a6 um: Fix kmalloc argument order in um/vdso/vma.c
kmalloc size is 1st arg, not second.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

Cc: <stable@kernel.org> # 3.0.x
[richard@nod.at: on 3.0 the to be patched file is
arch/um/sys-x86_64/vdso/vma.c]
2011-11-02 14:15:42 +01:00
Richard Weinberger
38b64aed78 um: we need sys/user.h only on i386
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:38 +01:00
Richard Weinberger
d0af6cbfa2 um: merge delay_{32,64}.c
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:37 +01:00
Al Viro
a34978cbd9 um: kill system-um.h
most of it belonged in irqflags.h, actually

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:34 +01:00
Al Viro
46ecca8ae1 um: segment.h is x86-only and needed only there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:33 +01:00
Al Viro
966e803ab1 um: unify ptrace_user.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:27 +01:00
Al Viro
a10c95d84c um: unify KSTK_...
... and switch get_thread_register() to HOST_... for register numbers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:26 +01:00
Al Viro
4d211093e8 um: fix gcov build breakage
a) exports in gmon_syms.c duplicate kernel/gcov/* ones
b) excluding -pg in vdso compile is not enough - -fprofile-arcs
and -ftest-coverage also needs to be excluded

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:26 +01:00
Al Viro
3fb77d7256 um: irq_vectors.h just shadows x86 one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:24 +01:00
Al Viro
ff9586e98f um: required-features.h is there only to shadow x86 one...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:23 +01:00
Al Viro
8807c1d561 um: asm/apic.h is there only to shadow the x86 one...
... so take it to arch/um/x86/asm.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:22 +01:00
Al Viro
b3ee571e58 um: take ldt.h to arch/x86/um/asm/mm_context.h
it's x86-only and we have no business playing with it in asm/mmu.h; make
the latter have
	struct uml_arch_mm_context arch;
instead of
	struct uml_ldt ldt;
and let arch/<subarch>/um/asm/mm_context.h decide what'll be in there.
While we are at it, kill host_ldt.h - it's not needed in part of places
that include it (we want asm/ldt.h in those) and it can be trivially
expanded into the single remaining one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:21 +01:00
Al Viro
f67aa2ffb7 um: merge signal_{32,64}.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:20 +01:00
Al Viro
fbe9868693 um: no need to play with save_sp in signal frame setup anymore
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:19 +01:00
Al Viro
c7ea591c91 um: increase stack growth cushion in pagefault
analog of [PATCH] i386: let usermode execute the "enter" instruction from
circa 2006.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:19 +01:00
Al Viro
3579a38973 um: merge HOST_... of registers common on i386 and amd64
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:18 +01:00
Al Viro
8edc4147be um: sanitize paths in sys_call_table* includes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:18 +01:00
Al Viro
1bbd5f21f4 um: merge os-Linux/tls.c into arch/x86/um/os-Linux/tls.c
it's i386-specific; moreover, analogs on other targets have
incompatible interface - PTRACE_GET_THREAD_AREA does exist
elsewhere, but struct user_desc does *not*

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:17 +01:00
Al Viro
c5cc32fe14 um: move asm/desc.h into arch/x86/um/asm
its only purpose is to shadow the x86 asm/desc.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:16 +01:00
Al Viro
2014d01878 um: merge host_ldt_{32,64}.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:15 +01:00
Al Viro
09e129a603 um: merge tls_{32,64}.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:15 +01:00
Al Viro
5ade8878e0 um: kill shared/task.h and HOST_TASK_REGS
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:08 +01:00
Al Viro
0acdbbeb6d um: bury unused macros around ptrace.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:05 +01:00
Al Viro
5c48b108ec um: take arch/um/sys-x86 to arch/x86/um
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:05 +01:00
Linus Torvalds
dc47d3810c Merge git://github.com/herbertx/crypto
* git://github.com/herbertx/crypto: (48 commits)
  crypto: user - Depend on NET instead of selecting it
  crypto: user - Add dependency on NET
  crypto: talitos - handle descriptor not found in error path
  crypto: user - Initialise match in crypto_alg_match
  crypto: testmgr - add twofish tests
  crypto: testmgr - add blowfish test-vectors
  crypto: Make hifn_795x build depend on !ARCH_DMA_ADDR_T_64BIT
  crypto: twofish-x86_64-3way - fix ctr blocksize to 1
  crypto: blowfish-x86_64 - fix ctr blocksize to 1
  crypto: whirlpool - count rounds from 0
  crypto: Add userspace report for compress type algorithms
  crypto: Add userspace report for cipher type algorithms
  crypto: Add userspace report for rng type algorithms
  crypto: Add userspace report for pcompress type algorithms
  crypto: Add userspace report for nivaead type algorithms
  crypto: Add userspace report for aead type algorithms
  crypto: Add userspace report for givcipher type algorithms
  crypto: Add userspace report for ablkcipher type algorithms
  crypto: Add userspace report for blkcipher type algorithms
  crypto: Add userspace report for ahash type algorithms
  ...
2011-11-01 09:24:41 -07:00
Borislav Petkov
4140c54266 i7core_edac: Drop the edac_mce facility
Remove edac_mce pieces and use the normal MCE decoder notifier chain by
retaining the same functionality with considerably less code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:24 -02:00
Christopher Yeoh
fcf634098c Cross Memory Attach
The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
  written to would need to be contiguous.
  - Currently mem_read allows only processes who are currently
  ptrace'ing the target and are still able to ptrace the target to read
  from the target. This check could possibly be moved to the open call,
  but its not clear exactly what race this restriction is stopping
  (reason  appears to have been lost)
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
  domain socket is a bit ugly from a userspace point of view,
  especially when you may have hundreds if not (eventually) thousands
  of processes  that all need to do this with each other
  - Doesn't allow for some future use of the interface we would like to
  consider adding in the future (see below)
  - Interestingly reading from /proc/pid/mem currently actually
  involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems.  Since you need the reader and writer working co-operatively if
the pipe is not drained then you block.  Which requires some wrapping to
do non blocking on the send side or polling on the receive.  In all to all
communication it requires ordering otherwise you can deadlock.  And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication.  And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:44 -07:00
Paul Gortmaker
39a0e33da0 lguest: add export.h to lguest files for THIS_MODULE/EXPORT_SYMBOL
We need this in advance of the module.h cleanup, or we'll
get compile errors like this:

  CC      drivers/lguest/lguest_device.o
drivers/lguest/lguest_device.c: In function ‘lguest_devices_init’:
drivers/lguest/lguest_device.c:490: error: ‘THIS_MODULE’ undeclared (first use in this function)

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:13 -04:00
Paul Gortmaker
783ac47c2a x86: efi_32.c is implicitly getting asm/desc.h via module.h
We want to clean up the chain of includes stumbling through
module.h, and when we do that, we'll see:

  CC      arch/x86/platform/efi/efi_32.o
  efi/efi_32.c: In function ‘efi_call_phys_prelog’:
  efi/efi_32.c:80: error: implicit declaration of function ‘get_cpu_gdt_table’
  efi/efi_32.c:82: error: implicit declaration of function ‘load_gdt’
  make[4]: *** [arch/x86/platform/efi/efi_32.o] Error 1

Include asm/desc.h so that there are no implicit include assumptions.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:36 -04:00
Paul Gortmaker
7c52d55170 x86: fix up files really needing to include module.h
These files aren't just exporting symbols -- they are also defining
a MODULE_LICENSE etc. so give them the full module.h file.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:36 -04:00
Paul Gortmaker
69c60c88ee x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULE
These files were implicitly getting EXPORT_SYMBOL via device.h
which was including module.h, but that will be fixed up shortly.

By fixing these now, we can avoid seeing things like:

arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’

[ with input from Randy Dunlap <rdunlap@xenotime.net> and also
  from Stephen Rothwell <sfr@canb.auug.org.au> ]

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:35 -04:00
Paul Gortmaker
2957402297 x86: fix implicit include of <linux/topology.h> in vsyscall_64
In removing the presence of <linux/module.h> from some of the
more common <linux/something.h> files, this implict include
of <linux/topology.h> was uncovered.

  CC      arch/x86/kernel/vsyscall_64.o
  arch/x86/kernel/vsyscall_64.c: In function ‘vsyscall_set_cpu’:
  arch/x86/kernel/vsyscall_64.c:259: error: implicit declaration of function ‘cpu_to_node’

Explicitly call it out so the cleanup can take place.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:34 -04:00
Paul Bolle
4f4d7a9ba0 x86: drop unused Kconfig symbol
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-10-31 23:39:52 +01:00
Borislav Petkov
f0cb545243 x86, MCE: Use notifier chain only for MCE decoding
Drop the edac_mce custom hook in favor of the generic notifier
mechanism. Also, do not log the error to mcelog if the notified agent
was able to decode it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31 15:10:05 -02:00
Linus Torvalds
0cfdc72439 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits)
  iommu/core: Remove global iommu_ops and register_iommu
  iommu/msm: Use bus_set_iommu instead of register_iommu
  iommu/omap: Use bus_set_iommu instead of register_iommu
  iommu/vt-d: Use bus_set_iommu instead of register_iommu
  iommu/amd: Use bus_set_iommu instead of register_iommu
  iommu/core: Use bus->iommu_ops in the iommu-api
  iommu/core: Convert iommu_found to iommu_present
  iommu/core: Add bus_type parameter to iommu_domain_alloc
  Driver core: Add iommu_ops to bus_type
  iommu/core: Define iommu_ops and register_iommu only with CONFIG_IOMMU_API
  iommu/amd: Fix wrong shift direction
  iommu/omap: always provide iommu debug code
  iommu/core: let drivers know if an iommu fault handler isn't installed
  iommu/core: export iommu_set_fault_handler()
  iommu/omap: Fix build error with !IOMMU_SUPPORT
  iommu/omap: Migrate to the generic fault report mechanism
  iommu/core: Add fault reporting mechanism
  iommu/core: Use PAGE_SIZE instead of hard-coded value
  iommu/core: use the existing IS_ALIGNED macro
  iommu/msm: ->unmap() should return order of unmapped page
  ...

Fixup trivial conflicts in drivers/iommu/Makefile: "move omap iommu to
dedicated iommu folder" vs "Rename the DMAR and INTR_REMAP config
options" just happened to touch lines next to each other.
2011-10-30 15:46:19 -07:00
Linus Torvalds
1bc87b0055 Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (75 commits)
  KVM: SVM: Keep intercepting task switching with NPT enabled
  KVM: s390: implement sigp external call
  KVM: s390: fix register setting
  KVM: s390: fix return value of kvm_arch_init_vm
  KVM: s390: check cpu_id prior to using it
  KVM: emulate lapic tsc deadline timer for guest
  x86: TSC deadline definitions
  KVM: Fix simultaneous NMIs
  KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode
  KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode
  KVM: x86 emulator: streamline decode of segment registers
  KVM: x86 emulator: simplify OpMem64 decode
  KVM: x86 emulator: switch src decode to decode_operand()
  KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack
  KVM: x86 emulator: switch OpImmUByte decode to decode_imm()
  KVM: x86 emulator: free up some flag bits near src, dst
  KVM: x86 emulator: switch src2 to generic decode_operand()
  KVM: x86 emulator: expand decode flags to 64 bits
  KVM: x86 emulator: split dst decode to a generic decode_operand()
  KVM: x86 emulator: move memop, memopp into emulation context
  ...
2011-10-30 15:36:45 -07:00
Jan Kiszka
f1c1da2bde KVM: SVM: Keep intercepting task switching with NPT enabled
AMD processors apparently have a bug in the hardware task switching
support when NPT is enabled. If the task switch triggers a NPF, we can
get wrong EXITINTINFO along with that fault. On resume, spurious
exceptions may then be injected into the guest.

We were able to reproduce this bug when our guest triggered #SS and the
handler were supposed to run over a separate task with not yet touched
stack pages.

Work around the issue by continuing to emulate task switches even in
NPT mode.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-10-30 12:24:10 +02:00
Linus Torvalds
ce949717b5 Merge git://github.com/rustyrussell/linux
* git://github.com/rustyrussell/linux:
  lguest: move process freezing before pending signals check
  lguest: don't allow KVM-detection cpuid.
  lguest: Allow running under paravirt-enabled KVM.
2011-10-29 07:52:16 -07:00
Linus Torvalds
0e59e7e7fe Merge branch 'next-rebase' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci
* 'next-rebase' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci:
  PCI: Clean-up MPS debug output
  pci: Clamp pcie_set_readrq() when using "performance" settings
  PCI: enable MPS "performance" setting to properly handle bridge MPS
  PCI: Workaround for Intel MPS errata
  PCI: Add support for PASID capability
  PCI: Add implementation for PRI capability
  PCI: Export ATS functions to modules
  PCI: Move ATS implementation into own file
  PCI / PM: Remove unnecessary error variable from acpi_dev_run_wake()
  PCI hotplug: acpiphp: Prevent deadlock on PCI-to-PCI bridge remove
  PCI / PM: Extend PME polling to all PCI devices
  PCI quirk: mmc: Always check for lower base frequency quirk for Ricoh 1180:e823
  PCI: Make pci_setup_bridge() non-static for use by arch code
  x86: constify PCI raw ops structures
  PCI: Add quirk for known incorrect MPSS
  PCI: Add Solarflare vendor ID and SFC4000 device IDs
2011-10-28 14:20:44 -07:00