This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value. This patch implements
the handling of O_CLOEXEC for the flag. I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation. I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.
The implementation introduces do_pipe_flags. I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code. To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags. Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_pipe2
# ifdef __x86_64__
# define __NR_pipe2 293
# elif defined __i386__
# define __NR_pipe2 331
# else
# error "need __NR_pipe2"
# endif
#endif
int
main (void)
{
int fd[2];
if (syscall (__NR_pipe2, fd, 0) != 0)
{
puts ("pipe2(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
{
puts ("pipe2(O_CLOEXEC) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:
u64 val = PAGE_ALIGN(size);
always returns a value < 4GB even if size is greater than 4GB.
The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):
#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.
Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.
See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Straight forward extensions for huge pages located in the PUD instead of
PMDs.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The goal of this patchset is to support multiple hugetlb page sizes. This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (eg. huge page
size, number of huge pages currently allocated, etc).
The hstate structure is then passed around the code which requires these
fields, they will do the right thing regardless of the exact hstate they
are operating on.
This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.
Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.
[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a lot of places that define either a single bootmem descriptor or an
array of them. Use only one central array with MAX_NUMNODES items instead.
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
remove CONFIG_KMOD from core kernel code
remove CONFIG_KMOD from lib
remove CONFIG_KMOD from sparc64
rework try_then_request_module to do less in non-modular kernels
remove mention of CONFIG_KMOD from documentation
make CONFIG_KMOD invisible
modules: Take a shortcut for checking if an address is in a module
module: turn longs into ints for module sizes
Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
module: reorder struct module to save space on 64 bit builds
module: generic each_symbol iterator function
module: don't use stop_machine for waiting rmmod
One place is just a comment, the other a conditional, unused
inclusion of linux/kmod.h.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This converts all instances of bus_id in the sparc core kernel to use
either dev_set_name(), or dev_name() depending on the need.
This is done in anticipation of removing the bus_id field from struct
driver.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kobjects do not have a limit in name size since a while, so stop
pretending that they do.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adrian Bunk reported that enabling 4MB page size breaks the build.
The problem is that MAX_ORDER combined with the page shift exceeds the
SECTION_SIZE_BITS we use in asm-sparc64/sparsemem.h
There are several ways I suppose we could work around this. For one
we could define a CONFIG_FORCE_MAX_ZONEORDER to decrease MAX_ORDER in
these higher page size cases.
But I also know that these page size cases are broken wrt. TLB miss
handling especially on pre-hypervisor systems, and there isn't an easy
way to fix that.
These options were meant to be fun experimental hacks anyways, and
only 8K and 64K make any sense to support.
So remove 512K and 4M base page size support. Of course, we still
support these page sizes for huge pages.
Signed-off-by: David S. Miller <davem@davemloft.net>
kernel bugzilla #11059:
sparc64 config menu is missing "Processor type and features",
so add that and move General Setup before Processor menu.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller noticed that the build of vmlinux.lds
failed to use the -m64 specifier.
This caused the build to break with a bi-arch gcc with
unified headers.
Add the -m64 option to CPPFLAGS_vmlinux.lds so we
have the correct defines available when building
vmliux.lds.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
Revert "x86/PCI: ACPI based PCI gap calculation"
PCI: remove unnecessary volatile in PCIe hotplug struct controller
x86/PCI: ACPI based PCI gap calculation
PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
PCI PM: Fix pci_prepare_to_sleep
x86/PCI: Fix PCI config space for domains > 0
Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
PCI: Simplify PCI device PM code
PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
PCI ACPI: Rework PCI handling of wake-up
ACPI: Introduce new device wakeup flag 'prepared'
ACPI: Introduce acpi_device_sleep_wake function
PCI: rework pci_set_power_state function to call platform first
PCI: Introduce platform_pci_power_manageable function
ACPI: Introduce acpi_bus_power_manageable function
PCI: make pci_name use dev_name
PCI: handle pci_name() being const
PCI: add stub for pci_set_consistent_dma_mask()
PCI: remove unused arch pcibios_update_resource() functions
PCI: fix pci_setup_device()'s sprinting into a const buffer
...
Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
* 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (228 commits)
ftrace: build fix for ftraced_suspend
ftrace: separate out the function enabled variable
ftrace: add ftrace_kill_atomic
ftrace: use current CPU for function startup
ftrace: start wakeup tracing after setting function tracer
ftrace: check proper config for preempt type
ftrace: trace schedule
ftrace: define function trace nop
ftrace: move sched_switch enable after markers
ftrace: prevent ftrace modifications while being kprobe'd, v2
fix "ftrace: store mcount address in rec->ip"
mmiotrace broken in linux-next (8-bit writes only)
ftrace: avoid modifying kprobe'd records
ftrace: freeze kprobe'd records
kprobes: enable clean usage of get_kprobe
ftrace: store mcount address in rec->ip
ftrace: build fix with gcc 4.3
namespacecheck: fixes
ftrace: fix "notrace" filtering priority
ftrace: fix printout
...
Today's linux-next build (spac64 allmodconfig) failed like this:
arch/sparc64/kernel/stacktrace.c:50: warning: type defaults to `int' in declaration of `EXPORT_SYMBOL_GPL'
arch/sparc64/kernel/stacktrace.c:50: warning: parameter names (without types) in function declaration
arch/sparc64/kernel/stacktrace.c:50: warning: data definition has no type or storage class
Signed-off-by: Stephen Rothwell <sf@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Also fixes up the sparc code that was assuming this is not a constant.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Andrew Morton reported this against linux-next:
ERROR: ".save_stack_trace" [tests/backtracetest.ko] undefined!
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Alexander Beregalov reported this build failure:
$ make CROSS_COMPILE=sparc64-unknown-linux-gnu- image modules && sudo
make modules_install
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CALL scripts/checksyscalls.sh
CHK include/linux/compile.h
dnsdomainname: Unknown host
CC arch/sparc64/kernel/sparc64_ksyms.o
arch/sparc64/kernel/sparc64_ksyms.c:116: error: '_mcount' undeclared
here (not in a function)
cc1: warnings being treated as errors
arch/sparc64/kernel/sparc64_ksyms.c:116: error: type defaults to 'int'
in declaration of '_mcount'
And bisected it back to:
| commit 395a59d0f8
| Author: Abhishek Sagar <sagar.abhishek@gmail.com>
| Date: Sat Jun 21 23:47:27 2008 +0530
|
| ftrace: store mcount address in rec->ip
the mcount prototype is only available under CONFIG_FTRACE,
extend it to CONFIG_MCOUNT as well.
Reported-and-bisected-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds kernel/smp.c which contains helpers for IPI function calls. In
addition to supporting the existing smp_call_function() in a more efficient
manner, it also adds a more scalable variant called smp_call_function_single()
for calling a given function on a single CPU only.
The core of this is based on the x86-64 patch from Nick Piggin, lots of
changes since then. "Alan D. Brunelle" <Alan.Brunelle@hp.com> has
contributed lots of fixes and suggestions as well. Also thanks to
Paul E. McKenney <paulmck@linux.vnet.ibm.com> for reviewing RCU usage
and getting rid of the data allocation fallback deadlock.
Acked-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.
Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that ftrace is being ported to other architectures, it has become
apparent that DYNAMIC_FTRACE is dependent on whether or not that
architecture implements dynamic ftrace. FTRACE itself may be ported to
an architecture without porting dynamic ftrace.
This patch adds HAVE_DYNAMIC_FTRACE to allow architectures to port ftrace
without having to also port the dynamic aspect as well.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When we fully commit to returning back to kernel mode from
a trap, zero out the regs->magic value to prevent false
positives during stack backtraces.
Signed-off-by: David S. Miller <davem@davemloft.net>
The offset to the pt_regs area was wrong, so we weren't
looking at the right location for the magic cookie.
A trap frame is composed of a "struct sparc_stackf" then
a "struct pt_regs", the code was using "struct reg_window"
instead of "struct sparc_stackf".
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of the silly way I set up the initial stack for
new kernel threads, there is a loop at the top of the
stack.
To fix this, properly add another stack frame that is copied
from the parent and terminate it in the child by setting
the frame pointer in that frame to zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
When a cpu really is stuck in the kernel, it can be often
impossible to figure out which cpu is stuck where. The
worst case is when the stuck cpu has interrupts disabled.
Therefore, implement a global cpu state capture that uses
SMP message interrupts which are not disabled by the
normal IRQ enable/disable APIs of the kernel.
As long as we can get a sysrq 'y' to the kernel, we can
get a dump. Even if the console interrupt cpu is wedged,
we can trigger it from userspace using /proc/sysrq-trigger
The output is made compact so that this facility is more
useful on high cpu count systems, which is where this
facility will likely find itself the most useful :)
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] return to old errno choice in mkdir() et.al.
[Patch] fs/binfmt_elf.c: fix wrong return values
[PATCH] get rid of leak in compat_execve()
[Patch] fs/binfmt_elf.c: fix a wrong free
[PATCH] avoid multiplication overflows and signedness issues for max_fds
[PATCH] dup_fd() part 4 - race fix
[PATCH] dup_fd() - part 3
[PATCH] dup_fd() part 2
[PATCH] dup_fd() fixes, part 1
[PATCH] take init_files to fs/file.c
This fixes the missing ram regression reported by
Mikael Pettersson <mikpe@it.uu.se>, much thanks for
all of this help in diagnosing this.
The second argument to lmb_reserve() is a size,
not an end address bounds.
Tested-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just like mmap, we need to validate address ranges regardless
of MAP_FIXED.
sparc{,64}_mmap_check()'s flag argument is unused, remove.
Based upon a report and preliminary patch by
Jan Lieskovsky <jlieskov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Read all of the OF memory and translation tables, then read
the physical available memory list twice.
When making these requests, OF can allocate more memory to
do it's job, which can remove pages from the available
memory list.
So fetch in all of the tables at once, and fetch the available
list last to make sure we read a stable value.
Signed-off-by: David S. Miller <davem@davemloft.net>
So, forever, we've had this ptrace_signal_deliver implementation
which tries to handle all of the nasties that can occur when the
debugger looks at a process about to take a signal. It's meant
to address all of these issues inside of the kernel so that the
debugger need not be mindful of such things.
Problem is, this doesn't work.
The idea was that we should do the syscall restart business first, so
that the debugger captures that state. Otherwise, if the debugger for
example saves the child's state, makes the child execute something
else, then restores the saved state, we won't handle the syscall
restart properly because we lose the "we're in a syscall" state.
The code here worked for most cases, but if the debugger actually
passes the signal through to the child unaltered, it's possible that
we would do a syscall restart when we shouldn't have.
In particular this breaks the case of debugging a process under a gdb
which is being debugged by yet another gdb. gdb uses sigsuspend
to wait for SIGCHLD of the inferior, but if gdb itself is being
debugged by a top-level gdb we get a ptrace_stop(). The top-level gdb
does a PTRACE_CONT with SIGCHLD to let the inferior gdb see the
signal. But ptrace_signal_deliver() assumed the debugger would cancel
out the signal and therefore did a syscall restart, because the return
error was ERESTARTNOHAND.
Fix this by simply making ptrace_signal_deliver() a nop, and providing
a way for the debugger to control system call restarting properly:
1) Report a "in syscall" software bit in regs->{tstate,psr}.
It is set early on in trap entry to a system call and is fully
visible to the debugger via ptrace() and regsets.
2) Test this bit right before doing a syscall restart. We have
to do a final recheck right after get_signal_to_deliver() in
case the debugger cleared the bit during ptrace_stop().
3) Clear the bit in trap return so we don't accidently try to set
that bit in the real register.
As a result we also get a ptrace_{is,clear}_syscall() for sparc32 just
like sparc64 has.
M68K has this same exact bug, and is now the only other user of the
ptrace_signal_deliver hook. It needs to be fixed in the same exact
way as sparc.
Signed-off-by: David S. Miller <davem@davemloft.net>
Forever we had a PTRACE_SUNOS_DETACH which was unconditionally
recognized, regardless of the personality of the process.
Unfortunately, this value is what ended up in the GLIBC sys/ptrace.h
header file on sparc as PTRACE_DETACH and PT_DETACH.
So continue to recognize this old value. Luckily, it doesn't conflict
with anything we actually care about.
Signed-off-by: David S. Miller <davem@davemloft.net>