linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-10 22:21:40 +00:00

Author	SHA1	Message	Date
Nadia Derbey	983bfb7db3	ipc: call idr_find() without locking in ipc_lock() Call idr_find() locklessly from ipc_lock(), since the idr tree is now RCU protected. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:42 -07:00
Andi Kleen	a551643895	hugetlb: modular state for hugetlb page size The goal of this patchset is to support multiple hugetlb page sizes. This is achieved by introducing a new struct hstate structure, which encapsulates the important hugetlb state and constants (eg. huge page size, number of huge pages currently allocated, etc). The hstate structure is then passed around the code which requires these fields, they will do the right thing regardless of the exact hstate they are operating on. This patch adds the hstate structure, with a single global instance of it (default_hstate), and does the basic work of converting hugetlb to use the hstate. Future patches will add more hstate structures to allow for different hugetlbfs mounts to have different page sizes. [akpm@linux-foundation.org: coding-style fixes] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
David S. Miller	4ae127d1b6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smc911x.c	2008-06-13 20:52:39 -07:00
Paul Menage	6c826818ff	/proc/sysvipc/shm: fix 32-bit truncation of segment sizes sysvipc_shm_proc_show() picks between format strings (based on the expected maximum length of a SHM segment) in a way that prevents gcc from performing format checks on the seq_printf() parameters. This hid two format errors - shp->shm_segsz and shp->shm_nattach are both unsigned long, but were being printed as unsigned int and signed int respectively. This leads to 32-bit truncation of SHM segment sizes reported in /proc/sysvipc/shm. (And for nattach, but that's less of a problem for most users). This patch makes the format string directly visible to gcc's format specifier checker, and fixes the two broken format specifiers. Signed-off-by: Paul Menage <menage@google.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-12 18:05:41 -07:00
Neil Horman	c592713b3e	shm: Remove silly double assignment Found a silly double assignment of err is do_shmat. Silly, but good to clean up the useless code. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-10 07:58:00 -07:00
Nadia Derbey	dfcceb26f8	ipc: only output msgmni value at boot time When posting: [PATCH 1/8] Scaling msgmni to the amount of lowmem (see http://lkml.org/lkml/2008/2/11/171), I have added a KERN_INFO message that is output each time msgmni is recomputed. In http://lkml.org/lkml/2008/4/29/575 Tony Luck complained that this message references an ipc namespace address that is useless. I first thought of using an audit_log instead of a printk, as suggested by Serge Hallyn. But unfortunately, we do not have any other information than the namespace address to provide here too. So I chose to move the message and output it only at boot time, removing the reference to the namespace. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Cc: Manfred Spraul <manfred@colorfullife.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-06 11:29:12 -07:00
Denis V. Lunev	9457afee85	netlink: Remove nonblock parameter from netlink_attachskb Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-05 11:23:39 -07:00
Ulrich Drepper	269f21344b	tiny mq_open optimization A very small cleanup for mq_open. We do not have to call set_close_on_exit if we create the file descriptor right away with the flag set. We have a function for this now. The resulting code is smaller and a tiny bit faster. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-03 13:50:33 -07:00
Denis V. Lunev	6a6375db13	sysvipc: use non-racy method for proc entries creation Use proc_create_data() to make sure that ->proc_fops and ->data be setup before gluing PDE to main tree. Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:20 -07:00
Manfred Spraul	9edff4ab1f	ipc: sysvsem: implement sys_unshare(CLONE_SYSVSEM) sys_unshare(CLONE_NEWIPC) doesn't handle the undo lists properly, this can cause a kernel memory corruption. CLONE_NEWIPC must detach from the existing undo lists. Fix, part 1: add support for sys_unshare(CLONE_SYSVSEM) The original reason to not support it was the potential (inevitable?) confusion due to the fact that sys_unshare(CLONE_SYSVSEM) has the inverse meaning of clone(CLONE_SYSVSEM). Our two most reasonable options then appear to be (1) fully support CLONE_SYSVSEM, or (2) continue to refuse explicit CLONE_SYSVSEM, but always do it anyway on unshare(CLONE_SYSVSEM). This patch does (1). Changelog: Apr 16: SEH: switch to Manfred's alternative patch which removes the unshare_semundo() function which always refused CLONE_SYSVSEM. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:14 -07:00
Zhang, Yanmin	44f564a4bf	ipc: add definitions of USHORT_MAX and others Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub implementation might also use it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: "Pierre Peiffer" <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:14 -07:00
Pierre Peiffer	a5f75e7f25	IPC: consolidate all xxxctl_down() functions semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set of commands for each kind of IPC. They all start to do the same job (they retrieve the ipc and do some permission checks) before handling the commands on their own. This patch proposes to consolidate this by moving these same pieces of code into one common function called ipcctl_pre_down(). It simplifies a little these xxxctl_down() functions and increases a little the maintainability. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:14 -07:00
Pierre Peiffer	8f4a3809c1	IPC: introduce ipc_update_perm() The IPC_SET command performs the same permission setting for all IPCs. This patch introduces a common ipc_update_perm() function to update these permissions and makes use of it for all IPCs. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:13 -07:00
Pierre Peiffer	016d7132f2	IPC: get rid of the use _setbuf structure. All IPCs make use of an intermetiate _setbuf structure to handle the IPC_SET command. This is not really needed and, moreover, it complicates a little bit the code. This patch gets rid of the use of it and uses directly the semid64_ds/ msgid64_ds/shmid64_ds structure. In addition of removing one struture declaration, it also simplifies and improves a little bit the common 64-bits path. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:13 -07:00
Pierre Peiffer	21a4826a7c	IPC/semaphores: remove one unused parameter from semctl_down() semctl_down() takes one unused parameter: semnum. This patch proposes to get rid of it. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:13 -07:00
Pierre Peiffer	522bb2a2b4	IPC/semaphores: move the rwmutex handling inside semctl_down semctl_down is called with the rwmutex (the one which protects the list of ipcs) taken in write mode. This patch moves this rwmutex taken in write-mode inside semctl_down. This has the advantages of reducing a little bit the window during which this rwmutex is taken, clarifying sys_semctl, and finally of having a coherent behaviour with [shm\|msg]ctl_down Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:13 -07:00
Pierre Peiffer	a0d092fc2d	IPC/message queues: introduce msgctl_down Currently, sys_msgctl is not easy to read. This patch tries to improve that by introducing the msgctl_down function to handle all commands requiring the rwmutex to be taken in write mode (ie IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down for message queues. This greatly changes the readability of sys_msgctl and also harmonizes the way these commands are handled among all IPCs. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:13 -07:00
Pierre Peiffer	8d4cc8b5c5	IPC/shared memory: introduce shmctl_down Currently, the way the different commands are handled in sys_shmctl introduces some duplicated code. This patch introduces the shmctl_down function to handle all the commands requiring the rwmutex to be taken in write mode (ie IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down for shared memory. This removes some duplicated code for handling these both commands and harmonizes the way they are handled among all IPCs. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:13 -07:00
Pierre Peiffer	6ff3797218	IPC/semaphores: code factorisation Trivial patch which adds some small locking functions and makes use of them to factorize some part of the code and to make it cleaner. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:13 -07:00
Nadia Derbey	6546bc4279	ipc: re-enable msgmni automatic recomputing msgmni if set to negative The enhancement as asked for by Yasunori: if msgmni is set to a negative value, register it back into the ipcns notifier chain. A new interface has been added to the notification mechanism: notifier_chain_cond_register() registers a notifier block only if not already registered. With that new interface we avoid taking care of the states changes in procfs. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:13 -07:00
Nadia Derbey	91cfb2b4b5	ipc: do not recompute msgmni anymore if explicitly set by user Make msgmni not recomputed anymore upon ipc namespace creation / removal or memory add/remove, as soon as it has been set from userland. As soon as msgmni is explicitly set via procfs or sysctl(), the associated callback routine is unregistered from the ipc namespace notifier chain. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:13 -07:00
Nadia Derbey	e2c284d8a8	ipc: recompute msgmni on ipc namespace creation/removal Introduce a notification mechanism that aims at recomputing msgmni each time an ipc namespace is created or removed. The ipc namespace notifier chain already defined for memory hotplug management is used for that purpose too. Each time a new ipc namespace is allocated or an existing ipc namespace is removed, the ipcns notifier chain is notified. The callback routine for each registered ipc namespace is then activated in order to recompute msgmni for that namespace. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:12 -07:00
Nadia Derbey	424450c1db	ipc: invoke the ipcns notifier chain as a work item Make the memory hotplug chain's mutex held for a shorter time: when memory is offlined or onlined a work item is added to the global workqueue. When the work item is run, it notifies the ipcns notifier chain with the IPCNS_MEMCHANGED event. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:12 -07:00
Nadia Derbey	b6b337ad1c	ipc: recompute msgmni on memory add / remove Introduce the registration of a callback routine that recomputes msg_ctlmni upon memory add / remove. A single notifier block is registered in the hotplug memory chain for all the ipc namespaces. Since the ipc namespaces are not linked together, they have their own notification chain: one notifier_block is defined per ipc namespace. Each time an ipc namespace is created (removed) it registers (unregisters) its notifier block in (from) the ipcns chain. The callback routine registered in the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event. Each callback routine registered in the ipcns namespace, in turn, recomputes msgmni for the owning namespace. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:12 -07:00
Nadia Derbey	4d89dc6ab2	ipc: scale msgmni to the number of ipc namespaces Since all the namespaces see the same amount of memory (the total one) this patch introduces a new variable that counts the ipc namespaces and divides msg_ctlmni by this counter. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:12 -07:00
Nadia Derbey	f7bf3df8be	ipc: scale msgmni to the amount of lowmem On large systems we'd like to allow a larger number of message queues. In some cases up to 32K. However simply setting MSGMNI to a larger value may cause problems for smaller systems. The first patch of this series introduces a default maximum number of message queue ids that scales with the amount of lowmem. Since msgmni is per namespace and there is no amount of memory dedicated to each namespace so far, the second patch of this series scales msgmni to the number of ipc namespaces too. Since msgmni depends on the amount of memory, it becomes necessary to recompute it upon memory add/remove. In the 4th patch, memory hotplug management is added: a notifier block is registered into the memory hotplug notifier chain for the ipc subsystem. Since the ipc namespaces are not linked together, they have their own notification chain: one notifier_block is defined per ipc namespace. Each time an ipc namespace is created (removed) it registers (unregisters) its notifier block in (from) the ipcns chain. The callback routine registered in the memory chain invokes the ipcns notifier chain with the IPCNS_MEMCHANGE event. Each callback routine registered in the ipcns namespace, in turn, recomputes msgmni for the owning namespace. The 5th patch makes it possible to keep the memory hotplug notifier chain's lock for a lesser amount of time: instead of directly notifying the ipcns notifier chain upon memory add/remove, a work item is added to the global workqueue. When activated, this work item is the one who notifies the ipcns notifier chain. Since msgmni depends on the number of ipc namespaces, it becomes necessary to recompute it upon ipc namespace creation / removal. The 6th patch uses the ipc namespace notifier chain for that purpose: that chain is notified each time an ipc namespace is created or removed. This makes it possible to recompute msgmni for all the namespaces each time one of them is created or removed. When msgmni is explicitely set from userspace, we should avoid recomputing it upon memory add/remove or ipcns creation/removal. This is what the 7th patch does: it simply unregisters the ipcns callback routine as soon as msgmni has been changed from procfs or sysctl(). Even if msgmni is set by hand, it should be possible to make it back automatically recomputed upon memory add/remove or ipcns creation/removal. This what is achieved in patch 8: if set to a negative value, msgmni is added back to the ipcns notifier chain, making it automatically recomputed again. This patch: Compute msg_ctlmni to make it scale with the amount of lowmem. msg_ctlmni is now set to make the message queues occupy 1/32 of the available lowmem. Some cleaning has also been done for the MSGPOOL constant: the msgctl man page says it's not used, but it also defines it as a size in bytes (the code expresses it in Kbytes). Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:12 -07:00
Pierre Peiffer	48dea404ed	IPC: use ipc_buildid() directly from ipc_addid() By continuing to consolidate a little the IPC code, each id can be built directly in ipc_addid() instead of having it built from each callers of ipc_addid() And I also remove shm_addid() in order to have, as much as possible, the same code for shm/sem/msg. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:12 -07:00
Lee Schermerhorn	52cd3b0740	mempolicy: rework mempolicy Reference Counting [yet again] After further discussion with Christoph Lameter, it has become clear that my earlier attempts to clean up the mempolicy reference counting were a bit of overkill in some areas, resulting in superflous ref/unref in what are usually fast paths. In other areas, further inspection reveals that I botched the unref for interleave policies. A separate patch, suitable for upstream/stable trees, fixes up the known errors in the previous attempt to fix reference counting. This patch reworks the memory policy referencing counting and, one hopes, simplifies the code. Maybe I'll get it right this time. See the update to the numa_memory_policy.txt document for a discussion of memory policy reference counting that motivates this patch. Summary: Lookup of mempolicy, based on (vma, address) need only add a reference for shared policy, and we need only unref the policy when finished for shared policies. So, this patch backs out all of the unneeded extra reference counting added by my previous attempt. It then unrefs only shared policies when we're finished with them, using the mpol_cond_put() [conditional put] helper function introduced by this patch. Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma containing just the policy. read_swap_cache_async() can call alloc_page_vma() multiple times, so we can't let alloc_page_vma() unref the shared policy in this case. To avoid this, we make a copy of any non-null shared policy and remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading a page [or multiple pages] from swap, so the overhead should not be an issue here. I introduced a new static inline function "mpol_cond_copy()" to copy the shared policy to an on-stack policy and remove the flags that would require a conditional free. The current implementation of mpol_cond_copy() assumes that the struct mempolicy contains no pointers to dynamically allocated structures that must be duplicated or reference counted during copy. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:24 -07:00
Lee Schermerhorn	ae4d8c16aa	mempolicy: fixup Fallback for Default Shmem Policy get_vma_policy() is not handling fallback to task policy correctly when the get_policy() vm_op returns NULL. The NULL overwrites the 'pol' variable that was holding the fallback task mempolicy. So, it was falling back directly to system default policy. Fix get_vma_policy() to use only non-NULL policy returned from the vma get_policy op. shm_get_policy() was falling back to current task's mempolicy if the "backing file system" [tmpfs vs hugetlbfs] does not support the get_policy vm_op and the vma policy is null. This is incorrect for show_numa_maps() which is likely querying the numa_maps of some task other than current. Remove this fallback. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:24 -07:00
Dave Hansen	4a3fd211cc	[PATCH] r/o bind mounts: elevate write count for open()s This is the first really tricky patch in the series. It elevates the writer count on a mount each time a non-special file is opened for write. We used to do this in may_open(), but Miklos pointed out that __dentry_open() is used as well to create filps. This will cover even those cases, while a call in may_open() would not have. There is also an elevated count around the vfs_create() call in open_namei(). See the comments for more details, but we need this to fix a 'create, remount, fail r/w open()' race. Some filesystems forego the use of normal vfs calls to create struct files. Make sure that these users elevate the mnt writer count because they will get __fput(), and we need to make sure they're balanced. Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-19 00:29:25 -04:00
Dave Hansen	0622753b80	[PATCH] r/o bind mounts: elevate write count for rmdir and unlink. Elevate the write count during the vfs_rmdir() and vfs_unlink(). [AV: merged rmdir and unlink parts, added missing pieces in nfsd] Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-19 00:25:33 -04:00
Lee Schermerhorn	69682d852f	mempolicy: fix reference counting bugs Address 3 known bugs in the current memory policy reference counting method. I have a series of patches to rework the reference counting to reduce overhead in the allocation path. However, that series will require testing in -mm once I repost it. 1) alloc_page_vma() does not release the extra reference taken for vma/shared mempolicy when the mode == MPOL_INTERLEAVE. This can result in leaking mempolicy structures. This is probably occurring, but not being noticed. Fix: add the conditional release of the reference. 2) hugezonelist unconditionally releases a reference on the mempolicy when mode == MPOL_INTERLEAVE. This can result in decrementing the reference count for system default policy [should have no ill effect] or premature freeing of task policy. If this occurred, the next allocation using task mempolicy would use the freed structure and probably BUG out. Fix: add the necessary check to the release. 3) The current reference counting method assumes that vma 'get_policy()' methods automatically add an extra reference a non-NULL returned mempolicy. This is true for shmem_get_policy() used by tmpfs mappings, including regular page shm segments. However, SHM_HUGETLB shm's, backed by hugetlbfs, just use the vma policy without the extra reference. This results in freeing of the vma policy on the first allocation, with reuse of the freed mempolicy structure on subsequent allocations. Fix: Rather than add another condition to the conditional reference release, which occur in the allocation path, just add a reference when returning the vma policy in shm_get_policy() to match the assumptions. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Greg KH <greg@kroah.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Cc: <eric.whitney@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-10 18:01:19 -07:00
Pavel Emelyanov	56496c1d83	Pidns: fix badly converted mqueues pid handling When sending the pid namespaces patches I wrongly converted the tsk->tgid into task_pid_vnr(tsk) in mqueue-s (the git id of this patch is `b488893a39`). The proper behavior is to get the task_tgid_vnr(tsk). This seem to be the only mistake of that kind. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Pavel Emelyanov	6c5f3e7b43	Pidns: make full use of xxx_vnr() calls Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were _all_ converted to operate on the current pid namespace. After this each call like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo) one. Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where appropriate. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Pierre Peiffer	01b8b07a5d	IPC: consolidate sem_exit_ns(), msg_exit_ns() and shm_exit_ns() sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an ipc_namespace is released to free all ipcs of each type. But in fact, they do the same thing: they loop around all ipcs to free them individually by calling a specific routine. This patch proposes to consolidate this by introducing a common function, free_ipcs(), that do the job. The specific routine to call on each individual ipcs is passed as parameter. For this, these ipc-specific 'free' routines are reworked to take a generic 'struct ipc_perm' as parameter. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Pierre Peiffer	ed2ddbf88c	IPC: make struct ipc_ids static in ipc_namespace Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids' are dynamically allocated for each icp_namespace as the ipc_namespace itself (for the init namespace, they are initialized with pointers to static variables instead) It is so for historical reason: in fact, before the use of idr to store the ipcs, the ipcs were stored in tables of variable length, depending of the maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed size. As they are allocated in any cases for each new ipc_namespace, there is no gain of memory in having them allocated separately of the struct ipc_namespace. This patch proposes to make this table static in the struct ipc_namespace. Thus, we can allocate all in once and get rid of all the code needed to allocate and free these ipc_ids separately. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Pierre Peiffer	4b9fcb0ec6	IPC/semaphores: consolidate SEM_STAT and IPC_STAT commands These commands (SEM_STAT and IPC_STAT) are rather doing the same things (only the meaning of the id given as input and the return value differ). However, for the semaphores, they are handled in two different places (two different functions). This patch consolidates this for clarification by handling these both commands in the same place in semctl_nolock(). It also removes one unused parameter for this function. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Pavel Emelyanov	b2d75cddc8	ipc: uninline some code from util.h ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be inline. Besides, they give no optimization being inline as they perform calls inside in any case. Moving them into ipc/util.c saves 500 bytes of vmlinux and shortens IPC internal API. $ ./scripts/bloat-o-meter vmlinux-orig vmlinux add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499) function old new delta ipcget - 392 +392 ipc_lock_check_down - 49 +49 ipc_lock_check - 49 +49 sys_semget 119 105 -14 sys_shmget 108 86 -22 sys_msgget 100 78 -22 do_msgsnd 665 631 -34 do_msgrcv 680 644 -36 do_shmat 771 733 -38 sys_msgctl 1302 1229 -73 ipcget_new 80 - -80 sys_semtimedop 1534 1452 -82 sys_semctl 2034 1922 -112 sys_shmctl 1919 1765 -154 ipcget_public 322 - -322 The ipcget() growth is the result of gcc inlining of currently static ipcget_new/_public. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:25 -08:00
Pavel Emelyanov	ae5e1b22f1	namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:23 -08:00
Adrian Bunk	b524b9adb3	make ipc/util.c:sysvipc_find_ipc() static sysvipc_find_ipc() can become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:01 -08:00
Pierre Peiffer	b1ed88b47f	IPC: fix error check in all new xxx_lock() and xxx_exit_ns() functions In the new implementation of the [sem\|shm\|msg]_lock[_check]() routines, we use the return value of ipc_lock() in container_of() without any check. But ipc_lock may return a errcode. The use of this errcode in container_of() may alter this errcode, and we don't want this. And in xxx_exit_ns, the pointer return by idr_find is of type 'struct kern_ipc_per'... Today, the code will work as is because the member used in these container_of() is the first member of its container (offset == 0), the errcode isn't changed then. But in the general case, we can't count on this assumption and this may lead later to a real bug if we don't correct this. Again, the proposed solution is simple and correct. But, as pointed by Nadia, with this solution, the same check will be done several times (in all sub-callers...), what is not very funny/optimal... Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:01 -08:00
Pavel Emelyanov	fd79b77117	ipc: lost unlock and fput in mqueue.c on error path The error path in sys_mq_getsetattr() after the call to audit_mq_getsetattr() is wrong - the info->lock is not unlocked and the struct file *filp is not put. Fix them both. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:52 -08:00
Patrick McHardy	c3d8d1e30c	[NETLINK]: Fix unicast timeouts Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts by moving the schedule_timeout() call to a new function that doesn't propagate the remaining timeout back to the caller. This means on each retry we start with the full timeout again. ipc/mqueue.c seems to actually want to wait indefinitely so this behaviour is retained. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:12 -08:00
Al Viro	5a190ae697	[PATCH] pass dentry to audit_inode()/audit_inode_child() makes caller simpler and allows to scan ancestors Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-10-21 02:37:18 -04:00
Pierre Peiffer	283bb7fada	IPC: fix error case when idr-cache is empty in ipcget() With the use of idr to store the ipc, the case where the idr cache is empty, when idr_get_new is called (this may happen even if we call idr_pre_get() before), is not well handled: it lets semget()/shmget()/msgget() return ENOSPC when this cache is empty, what 1. does not reflect the facts and 2. does not conform to the man(s). This patch fixes this by retrying the whole process of allocation in this case. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:49 -07:00
Kirill Korotaev	3ac88a41ff	virtualization of sysv msg queues is incomplete Virtualization of sysv msg queues is incomplete: msg_hdrs and msg_bytes variables visible from userspace are global. Let's make them per-namespace. Signed-off-by: Alexey Kuznetsov <alexey@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Serge Hallyn <serue@us.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:48 -07:00
Pierre Peiffer	c530c6ac7e	IPC: cleanup some code and wrong comments about semundo list managment Some comments about sem_undo_list seem wrong. About the comment above unlock_semundo: "... If task2 now exits before task1 releases the lock (by calling unlock_semundo()), then task1 will never call spin_unlock(). ..." This is just wrong, I see no reason for which task1 will not call spin_unlock... The rest of this comment is also wrong... Unless I miss something (of course). Finally, (un)lock_semundo functions are useless, so remove them for simplification. (this avoids an useless if statement) Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:48 -07:00
Nadia Derbey	1b531f2136	ipc: remove unneeded parameters Remvoe the unneeded parameters from ipc_checkid() and ipc_buildid() interfaces. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:48 -07:00
Nadia Derbey	3e148c7993	fix idr_find() locking This is a patch that fixes the way idr_find() used to be called in ipc_lock(): in all the paths that don't imply an update of the ipcs idr, it was called without the idr tree being locked. The changes are: . in ipc_ids, the mutex has been changed into a reader/writer semaphore. . ipc_lock() now takes the mutex as a reader during the idr_find(). . a new routine ipc_lock_down() has been defined: it doesn't take the mutex, assuming that it is being held by the caller. This is the routine that is now called in all the update paths. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Acked-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:48 -07:00
Nadia Derbey	f4566f0485	ipc: fix wrong comments This patch fixes the wrong / obsolete comments in the ipc code. Also adds a missing lock around ipc_get_maxid() in shm_get_stat(). Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:47 -07:00
Nadia Derbey	2802831313	ipc: inline ipc_buildid() This is a trivial patch that changes the ipc_buildid() routine into a static inline. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:47 -07:00
Nadia Derbey	ce621f5ba5	ipc: introduce the ipcid_to_idx macro This is a trivial patch that changes all the (id % SEQ_MULTIPLIER) into a call to the ipcid_to_idx(id) macro. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:46 -07:00
Nadia Derbey	03f02c7657	Storing ipcs into IDRs This patch converts casts of struct kern_ipc_perm to . struct msg_queue . struct sem_array . struct shmid_kernel into the equivalent container_of() macro. It improves code maintenance because the code need not change if kern_ipc_perm is no longer at the beginning of the containing struct. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:46 -07:00
Nadia Derbey	023a53557e	ipc: integrate ipc_checkid() into ipc_lock() This patch introduces a new ipc_lock_check() routine interface: . each time ipc_checkid() is called, this is done after calling ipc_lock(). ipc_checkid() is now called from inside ipc_lock_check(). [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix RCU locking] Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:44 -07:00
Nadia Derbey	637c366340	ipc: remove the ipc_get() routine This is a trivial patch that removes the ipc_get() routine: it is replaced by a call to idr_find(). Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:44 -07:00
Nadia Derbey	7748dbfaa0	ipc: unify the syscalls code This patch introduces a change into the sys_msgget(), sys_semget() and sys_shmget() routines: they now share a common code, which is better for maintainability. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:44 -07:00
Nadia Derbey	7ca7e564e0	ipc: store ipcs into IDRs This patch introduces ipcs storage into IDRs. The main changes are: . This ipc_ids structure is changed: the entries array is changed into a root idr structure. . The grow_ary() routine is removed: it is not needed anymore when adding an ipc structure, since we are now using the IDR facility. . The ipc_rmid() routine interface is changed: . there is no need for this routine to return the pointer passed in as argument: it is now declared as a void . since the id is now part of the kern_ipc_perm structure, no need to have it as an argument to the routine Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:44 -07:00
Pavel Emelyanov	b488893a39	pid namespaces: changes to show virtual ids to user This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Eric W. Biederman	97aeacf492	sysctl mqueue: remove the binary sysctl numbers Because of a conflict with FS_INODE_NR none of the binary sysctl numbers use by mqueue, were available to user space. So just remove them. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Dave Hansen	ce8d2cdf3d	r/o bind mounts: filesystem helpers for custom 'struct file's Why do we need r/o bind mounts? This feature allows a read-only view into a read-write filesystem. In the process of doing that, it also provides infrastructure for keeping track of the number of writers to any given mount. This has a number of uses. It allows chroots to have parts of filesystems writable. It will be useful for containers in the future because users may have root inside a container, but should not be allowed to write to somefilesystems. This also replaces patches that vserver has had out of the tree for several years. It allows security enhancement by making sure that parts of your filesystem read-only (such as when you don't trust your FTP server), when you don't want to have entire new filesystems mounted, or when you want atime selectively updated. I've been using the following script to test that the feature is working as desired. It takes a directory and makes a regular bind and a r/o bind mount of it. It then performs some normal filesystem operations on the three directories, including ones that are expected to fail, like creating a file on the r/o mount. This patch: Some filesystems forego the vfs and may_open() and create their own 'struct file's. This patch creates a couple of helper functions which can be used by these filesystems, and will provide a unified place which the r/o bind mount code may patch. Also, rename an existing, static-scope init_file() to a less generic name. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:04 -07:00
Cedric Le Goater	a08b4be74c	ipc namespace: remove config ipc ns fix Finish the work : kill all #ifdef CONFIG_IPC_NS. Thanks Robert ! Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Eric Biederman <ebiederm@xmision.com> Cc: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:57 -07:00
Adrian Bunk	d823e3e754	ipc/shm.c: make 2 functions static This patch makes two needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:50 -07:00
Christoph Lameter	4ba9b9d0ba	Slab API: remove useless ctor parameter and reorder parameters Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void object, struct kmem_cache s, unsigned long flags) to ctor(struct kmem_cache s, void object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:45 -07:00
Denis V. Lunev	7ee015e0fa	[NET]: cleanup 3rd argument in netlink_sendskb netlink_sendskb does not use third argument. Clean it and save a couple of bytes. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:14:03 -07:00
Pavel Emelianov	7be77e20d5	Fix user struct leakage with locked IPC shem segment When user locks an ipc shmem segmant with SHM_LOCK ctl and the segment is already locked the shmem_lock() function returns 0. After this the subsequent code leaks the existing user struct: == ipc/shm.c: sys_shmctl() == ... err = shmem_lock(shp->shm_file, 1, user); if (!err) { shp->shm_perm.mode \|= SHM_LOCKED; shp->mlock_user = user; } ... == Other results of this are: 1. the new shp->mlock_user is not get-ed and will point to freed memory when the task dies. 2. the RLIMIT_MEMLOCK is screwed on both user structs. The exploit looks like this: == id = shmget(...); setresuid(uid, 0, 0); shmctl(id, SHM_LOCK, NULL); setresuid(uid + 1, 0, 0); shmctl(id, SHM_LOCK, NULL); == My solution is to return 0 to the userspace and do not change the segment's user. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:40 -07:00
David Howells	2e92a3baee	NOMMU: Fix SYSV IPC SHM Fix the SYSV IPC SHM to work with the changes applied by the new fault handler patches when CONFIG_MMU=n. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:36 -07:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Nick Piggin	d0217ac04c	mm: fault feedback #1 Change ->fault prototype. We now return an int, which contains VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte. FAULT_RET_ code tells the VM whether a page was found, whether it has been locked, and potentially other things. This is not quite the way he wanted it yet, but that's changed in the next patch (which requires changes to arch code). This means we no longer set VM_CAN_INVALIDATE in the vma in order to say that a page is locked which requires filemap_nopage to go away (because we can no longer remain backward compatible without that flag), but we were going to do that anyway. struct fault_data is renamed to struct vm_fault as Linus asked. address is now a void __user * that we should firmly encourage drivers not to use without really good reason. The page is now returned via a page pointer in the vm_fault struct. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	54cb8821de	mm: merge populate and nopage into fault (fixes nonlinear) Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Jeff Garzik	8e1c091ccc	arch/i386/* fs/* ipc/*: mark variables with uninitialized_var() Mark variables with uninitialized_var() if such a warning appears, and analysis proves that the var is initialized properly on all paths it is used. Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-17 16:23:19 -04:00
Cedric Le Goater	7d69a1f4a7	remove CONFIG_UTS_NS and CONFIG_IPC_NS CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only deactivate the unshare of the uts and ipc namespaces and do not improve performance. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: "Serge E. Hallyn" <serue@us.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Alexander Graf	d57d973101	fix logic error in ipc compat semctl() When calling a semctl(IPC_STAT) without IPC_64 the check if the memory is unevaluated. This patch fixes this. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-06 10:23:43 -07:00
Eric W. Biederman	9d66586f77	shm: fix the filename of hugetlb sysv shared memory Some user space tools need to identify SYSV shared memory when examining /proc/<pid>/maps. To do so they look for a block device with major zero, a dentry named SYSV<sysv key>, and having the minor of the internal sysv shared memory kernel mount. To help these tools and to make it easier for people just browsing /proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the SYSV<key> dentry naming convention. User space tools will still have to be aware that hugetlb sysv shared memory lives on a different internal kernel mount and so has a different block device minor number from the rest of sysv shared memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Albert Cahalan <acahalan@gmail.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-16 13:16:16 -07:00
Adam Litke	22741925d2	hugetlb: fix get_policy for stacked shared memory files Here's another breakage as a result of shared memory stacked files :( The NUMA policy for a VMA is determined by checking the following (in the order given): 1) vma->vm_ops->get_policy() (if defined) 2) vma->vm_policy (if defined) 3) task->mempolicy (if defined) 4) Fall back to default_policy By switching to stacked files for shared memory, get_policy() is now always set to shm_get_policy which is a wrapper function. This causes us to stop at step 1, which yields NULL for hugetlb instead of task->mempolicy which was the previous (and correct) result. This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the wrapped vm_ops. (akpm: the refcounting of mempolicies is busted and this patch does nothing to improve it) Signed-off-by: Adam Litke <agl@us.ibm.com> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: dean gaudet <dean@arctic.org> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-16 13:16:16 -07:00
Badari Pulavarty	30475cc12a	Restore shmid as inode# to fix /proc/pid/maps ABI breakage shmid used to be stored as inode# for shared memory segments. Some of the proc-ps tools use this from /proc/pid/maps. Recent cleanups to newseg() changed it. This patch sets inode number back to shared memory id to fix breakage. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: "Albert Cahalan" <acahalan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-16 13:16:15 -07:00
Christoph Lameter	a35afb830f	Remove SLAB_CTOR_CONSTRUCTOR SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:04 -07:00
Amy Griffis	4fc03b9beb	[PATCH] complete message queue auditing Handle the edge cases for POSIX message queue auditing. Collect inode info when opening an existing mq, and for send/receive operations. Remove audit_inode_update() as it has really evolved into the equivalent of audit_inode(). Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-05-11 05:38:26 -04:00
Randy Dunlap	e63340ae6b	header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Badari Pulavarty	e3222c4ecc	Merge sys_clone()/sys_unshare() nsproxy and namespace handling sys_clone() and sys_unshare() both makes copies of nsproxy and its associated namespaces. But they have different code paths. This patch merges all the nsproxy and its associated namespace copy/clone handling (as much as possible). Posted on container list earlier for feedback. - Create a new nsproxy and its associated namespaces and pass it back to caller to attach it to right process. - Changed all copy__ns() routines to return a new copy of namespace instead of attaching it to task->nsproxy. - Moved the CAP_SYS_ADMIN checks out of copy__ns() routines. - Removed unnessary !ns checks from copy__ns() and added BUG_ON() just incase. - Get rid of all individual unshare__ns() routines and make use of copy_*_ns() instead. [akpm@osdl.org: cleanups, warning fix] [clg@fr.ibm.com: remove dup_namespaces() declaration] [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval] [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <containers@lists.osdl.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Guy Streeter	af7c693f14	Cap shmmax at INT_MAX in compat shminfo The value of shmmax may be larger than will fit in the struct used by the 32bit compat version of sys_shmctl. This change mirrors what the normal sys_shmctl does when called with the old IPC_INFO command. Signed-off-by: Guy Streeter <streeter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Christoph Lameter	50953fe9e0	slab allocators: Remove SLAB_DEBUG_INITIAL flag I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Serge E. Hallyn	a28d193cbf	[PATCH] ipcns: fix !CONFIG_IPC_NS behavior When CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) claims success, but did not actually clone a new IPC namespace. Fix this to return -EINVAL so the caller knows his request was denied. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-27 09:05:16 -07:00
Peter Zijlstra	7a434814c7	[PATCH] mqueue: nested locking annotation Fix http://bugzilla.kernel.org/show_bug.cgi?id=8130 Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-06 09:30:25 -08:00
Adam Litke	516dffdcd8	[PATCH] Fix get_unmapped_area and fsync for hugetlb shm segments This patch provides the following hugetlb-related fixes to the recent stacked shm files changes: - Update is_file_hugepages() so it will reconize hugetlb shm segments. - get_unmapped_area must be called with the nested file struct to handle the sfd->file->f_ops->get_unmapped_area == NULL case. - The fsync f_op must be wrapped since it is specified in the hugetlbfs f_ops. This is based on proposed fixes from Eric Biederman that were debugged and tested by me. Without it, attempting to use hugetlb shared memory segments on powerpc (and likely ia64) will kill your box. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: William Irwin <bill.irwin@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-01 17:18:39 -08:00
Adrian Bunk	de01bad2f5	[PATCH] make ipc/shm.c:shm_nopage() static shm_nopage() can become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-01 14:53:35 -08:00
Eric W. Biederman	bc56bba8f3	[PATCH] shm: make sysv ipc shared memory use stacked files The current ipc shared memory code runs into several problems because it does not quite use files like the rest of the kernel. With the option of backing ipc shared memory with either hugetlbfs or ordinary shared memory the problems got worse. With the added support for ipc namespaces things behaved so unexpected that we now have several bad namespace reference counting bugs when using what appears at first glance to be a reasonable idiom. So to attack these problems and hopefully make the code more maintainable this patch simply uses the files provided by other parts of the kernel and builds it's own files out of them. The shm files are allocated in do_shmat and freed when their reference count drops to zero with their last unmap. The file and vm operations that we don't want to implement or we don't implement completely we just delegate to the operations of our backing file. This means that we now get an accurate shm_nattch count for we have a hugetlbfs inode for backing store, and the shm accounting of last attach and last detach time work as well. This means that getting a reference to the ipc namespace when we create the file and dropping the referenece in the release method is now safe and correct. This means we no longer need a special case for clearing VM_MAYWRITE as our file descriptor now only has write permissions when we have requested write access when calling shmat. Although VM_SHARED is now cleared as well which I believe is harmless and is mostly likely a minor bug fix. By using the same set of operations for both the hugetlb case and regular shared memory case shmdt is not simplified and made slightly more correct as now the test "vma->vm_ops == &shm_vm_ops" is 100% accurate in spotting all shared memory regions generated from sysvipc shared memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:13 -08:00
Eric W. Biederman	0b4d414714	[PATCH] sysctl: remove insert_at_head from register_sysctl The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	a5494dcd8b	[PATCH] sysctl: move SYSV IPC sysctls to their own file This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded with special cases, and by keeping all of the ipc logic to together it makes the code a little more readable. [gcoady.lk@gmail.com: build fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Grant Coady <gcoady.lk@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Arjan van de Ven	92e1d5be91	[PATCH] mark struct inode_operations const 2 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Arjan van de Ven	9a32144e9d	[PATCH] mark struct file_operations const 7 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Eric W. Biederman	bc1fc6d88c	[PATCH] ipc: save the ipc namespace while reading proc files The problem we were assuming that current->nsproxy->ipc_ns would never change while someone has our file in /proc/sysvipc/ file open. Given that this can change with both unshare and by passing the file descriptor to another process that assumption is occasionally wrong. Therefore this patch causes /proc/sysvipc/* to cache the namespace and increment it's count when we open the file and to decrement the count when we close the file, ensuring consistent operation with no surprises. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:29 -08:00
Robert P. J. Day	72fd4a35a8	[PATCH] Numerous fixes to kernel-doc info in source files. A variety of (mostly) innocuous fixes to the embedded kernel-doc content in source files, including: * make multi-line initial descriptions single line * denote some function names, constants and structs as such * change erroneous opening '/' to '/' in a few places reword some text for clarity Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:32 -08:00
Guy Streeter	f66d45e99e	[PATCH] correct sys_shmget allocation check As written, sys_shmget will return ENOSPC when one page is still available for allocation. This patch corrects the test. Signed-off-by: Guy Streeter <guy.streeter+lkml@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --	2007-01-23 11:18:50 -08:00
Robert P. J. Day	5cbded585d	[PATCH] getting rid of all casts of k[cmz]alloc() calls Run this: #!/bin/sh for f in $(grep -Erl "$[^$]\) k[cmz]alloc" ) ; do echo "De-casting $f..." perl -pi -e "s/ ?= ?$[^$]\) (k[cmz]alloc) \(/ = \1\(/" $f done And then go through and reinstate those cases where code is casting pointers to non-pointers. And then drop a few hunks which conflicted with outstanding work. Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Greg KH <greg@kroah.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Karsten Keil <kkeil@suse.de> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Ian Kent <raven@themaw.net> Cc: Steven French <sfrench@us.ibm.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:58 -08:00
Josef Sipek	6d63079add	[PATCH] struct path: convert ipc Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:46 -08:00
Burman Yan	4668edc334	[PATCH] kernel core: replace kmalloc+memset with kzalloc Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:41 -08:00
suzuki	651971cb72	[PATCH] Fix the size limit of compat space msgsize Currently we allocate 64k space on the user stack and use it the msgbuf for sys_{msgrcv,msgsnd} for compat and the results are later copied in user [ by copy_in_user]. This patch introduces helper routines for sys_{msgrcv,msgsnd} as below: do_msgsnd() : Accepts the mtype and user space ptr to the buffer along with the msqid and msgflg. do_msgrcv() : Accepts a kernel space ptr to mtype and a userspace ptr to the buffer. The mtype has to be copied back the user space msgbuf by the caller. These changes avoid the need to allocate the msgsize on the userspace ( thus removing the size limt ) and the overhead of an extra copy_in_user(). Signed-off-by: Suzuki K P <suzuki@in.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:38 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name ".c" -o -name ".h"\|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Christoph Lameter	e94b176609	[PATCH] slab: remove SLAB_KERNEL SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
David Howells	65f27f3844	WorkStruct: Pass the work_struct pointer instead of context data Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is not cleared before jumping to the work function, then the work function may access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:55:48 +00:00

1 2 3 4 5

215 Commits