linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-18 10:01:43 +00:00

Author	SHA1	Message	Date
Frank Mayhar	f06febc96b	timers: fix itimer/many thread hang Overview This patch reworks the handling of POSIX CPU timers, including the ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together with the help of Roland McGrath, the owner and original writer of this code. The problem we ran into, and the reason for this rework, has to do with using a profiling timer in a process with a large number of threads. It appears that the performance of the old implementation of run_posix_cpu_timers() was at least O(n3) (where "n" is the number of threads in a process) or worse. Everything is fine with an increasing number of threads until the time taken for that routine to run becomes the same as or greater than the tick time, at which point things degrade rather quickly. This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF." Code Changes This rework corrects the implementation of run_posix_cpu_timers() to make it run in constant time for a particular machine. (Performance may vary between one machine and another depending upon whether the kernel is built as single- or multiprocessor and, in the latter case, depending upon the number of running processors.) To do this, at each tick we now update fields in signal_struct as well as task_struct. The run_posix_cpu_timers() function uses those fields to make its decisions. We define a new structure, "task_cputime," to contain user, system and scheduler times and use these in appropriate places: struct task_cputime { cputime_t utime; cputime_t stime; unsigned long long sum_exec_runtime; }; This is included in the structure "thread_group_cputime," which is a new substructure of signal_struct and which varies for uniprocessor versus multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as a simple substructure, while for multiprocessor kernels it is a pointer: struct thread_group_cputime { struct task_cputime totals; }; struct thread_group_cputime { struct task_cputime totals; }; We also add a new task_cputime substructure directly to signal_struct, to cache the earliest expiration of process-wide timers, and task_cputime also replaces the it_*_expires fields of task_struct (used for earliest expiration of thread timers). The "thread_group_cputime" structure contains process-wide timers that are updated via account_user_time() and friends. In the non-SMP case the structure is a simple aggregator; unfortunately in the SMP case that simplicity was not achievable due to cache-line contention between CPUs (in one measured case performance was actually _worse_ on a 16-cpu system than the same test on a 4-cpu system, due to this contention). For SMP, the thread_group_cputime counters are maintained as a per-cpu structure allocated using alloc_percpu(). The timer functions update only the timer field in the structure corresponding to the running CPU, obtained using per_cpu_ptr(). We define a set of inline functions in sched.h that we use to maintain the thread_group_cputime structure and hide the differences between UP and SMP implementations from the rest of the kernel. The thread_group_cputime_init() function initializes the thread_group_cputime structure for the given task. The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the out-of-line function thread_group_cputime_alloc_smp() to allocate and fill in the per-cpu structures and fields. The thread_group_cputime_free() function, also a no-op for UP, in SMP frees the per-cpu structures. The thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls thread_group_cputime_alloc() if the per-cpu structures haven't yet been allocated. The thread_group_cputime() function fills the task_cputime structure it is passed with the contents of the thread_group_cputime fields; in UP it's that simple but in SMP it must also safely check that tsk->signal is non-NULL (if it is it just uses the appropriate fields of task_struct) and, if so, sums the per-cpu values for each online CPU. Finally, the three functions account_group_user_time(), account_group_system_time() and account_group_exec_runtime() are used by timer functions to update the respective fields of the thread_group_cputime structure. Non-SMP operation is trivial and will not be mentioned further. The per-cpu structure is always allocated when a task creates its first new thread, via a call to thread_group_cputime_clone_thread() from copy_signal(). It is freed at process exit via a call to thread_group_cputime_free() from cleanup_signal(). All functions that formerly summed utime/stime/sum_sched_runtime values from from all threads in the thread group now use thread_group_cputime() to snapshot the values in the thread_group_cputime structure or the values in the task structure itself if the per-cpu structure hasn't been allocated. Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit. The run_posix_cpu_timers() function has been split into a fast path and a slow path; the former safely checks whether there are any expired thread timers and, if not, just returns, while the slow path does the heavy lifting. With the dedicated thread group fields, timers are no longer "rebalanced" and the process_timer_rebalance() function and related code has gone away. All summing loops are gone and all code that used them now uses the thread_group_cputime() inline. When process-wide timers are set, the new task_cputime structure in signal_struct is used to cache the earliest expiration; this is checked in the fast path. Performance The fix appears not to add significant overhead to existing operations. It generally performs the same as the current code except in two cases, one in which it performs slightly worse (Case 5 below) and one in which it performs very significantly better (Case 2 below). Overall it's a wash except in those two cases. I've since done somewhat more involved testing on a dual-core Opteron system. Case 1: With no itimer running, for a test with 100,000 threads, the fixed kernel took 1428.5 seconds, 513 seconds more than the unfixed system, all of which was spent in the system. There were twice as many voluntary context switches with the fix as without it. Case 2: With an itimer running at .01 second ticks and 4000 threads (the most an unmodified kernel can handle), the fixed kernel ran the test in eight percent of the time (5.8 seconds as opposed to 70 seconds) and had better tick accuracy (.012 seconds per tick as opposed to .023 seconds per tick). Case 3: A 4000-thread test with an initial timer tick of .01 second and an interval of 10,000 seconds (i.e. a timer that ticks only once) had very nearly the same performance in both cases: 6.3 seconds elapsed for the fixed kernel versus 5.5 seconds for the unfixed kernel. With fewer threads (eight in these tests), the Case 1 test ran in essentially the same time on both the modified and unmodified kernels (5.2 seconds versus 5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds versus 5.4 seconds but again with much better tick accuracy, .013 seconds per tick versus .025 seconds per tick for the unmodified kernel. Since the fix affected the rlimit code, I also tested soft and hard CPU limits. Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer running), the modified kernel was very slightly favored in that while it killed the process in 19.997 seconds of CPU time (5.002 seconds of wall time), only .003 seconds of that was system time, the rest was user time. The unmodified kernel killed the process in 20.001 seconds of CPU (5.014 seconds of wall time) of which .016 seconds was system time. Really, though, the results were too close to call. The results were essentially the same with no itimer running. Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds (where the hard limit would never be reached) and an itimer running, the modified kernel exhibited worse tick accuracy than the unmodified kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise, performance was almost indistinguishable. With no itimer running this test exhibited virtually identical behavior and times in both cases. In times past I did some limited performance testing. those results are below. On a four-cpu Opteron system without this fix, a sixteen-thread test executed in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On the same system with the fix, user and elapsed time were about the same, but system time dropped to 0.007 seconds. Performance with eight, four and one thread were comparable. Interestingly, the timer ticks with the fix seemed more accurate: The sixteen-thread test with the fix received 149543 ticks for 0.024 seconds per tick, while the same test without the fix received 58720 for 0.061 seconds per tick. Both cases were configured for an interval of 0.01 seconds. Again, the other tests were comparable. Each thread in this test computed the primes up to 25,000,000. I also did a test with a large number of threads, 100,000 threads, which is impossible without the fix. In this case each thread computed the primes only up to 10,000 (to make the runtime manageable). System time dominated, at 1546.968 seconds out of a total 2176.906 seconds (giving a user time of 629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite accurate. There is obviously no comparable test without the fix. Signed-off-by: Frank Mayhar <fmayhar@google.com> Cc: Roland McGrath <roland@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 16:25:35 +02:00
Eric Paris	8e531af90f	SELinux: memory leak in security_context_to_sid_core Fix a bug and a philosophical decision about who handles errors. security_context_to_sid_core() was leaking a context in the common case. This was causing problems on fedora systems which recently have started making extensive use of this function. In discussion it was decided that if string_to_context_struct() had an error it was its own responsibility to clean up any mess it created along the way. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-09-04 08:35:13 +10:00
David Howells	5cd9c58fbe	security: Fix setting of PF_SUPERPRIV by __capable() Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags the target process if that is not the current process and it is trying to change its own flags in a different way at the same time. __capable() is using neither atomic ops nor locking to protect t->flags. This patch removes __capable() and introduces has_capability() that doesn't set PF_SUPERPRIV on the process being queried. This patch further splits security_ptrace() in two: (1) security_ptrace_may_access(). This passes judgement on whether one process may access another only (PTRACE_MODE_ATTACH for ptrace() and PTRACE_MODE_READ for /proc), and takes a pointer to the child process. current is the parent. (2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only, and takes only a pointer to the parent process. current is the child. In Smack and commoncap, this uses has_capability() to determine whether the parent will be permitted to use PTRACE_ATTACH if normal checks fail. This does not set PF_SUPERPRIV. Two of the instances of __capable() actually only act on current, and so have been changed to calls to capable(). Of the places that were using __capable(): (1) The OOM killer calls __capable() thrice when weighing the killability of a process. All of these now use has_capability(). (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see whether the parent was allowed to trace any process. As mentioned above, these have been split. For PTRACE_ATTACH and /proc, capable() is now used, and for PTRACE_TRACEME, has_capability() is used. (3) cap_safe_nice() only ever saw current, so now uses capable(). (4) smack_setprocattr() rejected accesses to tasks other than current just after calling __capable(), so the order of these two tests have been switched and capable() is used instead. (5) In smack_file_send_sigiotask(), we need to allow privileged processes to receive SIGIO on files they're manipulating. (6) In smack_task_wait(), we let a process wait for a privileged process, whether or not the process doing the waiting is privileged. I've tested this with the LTP SELinux and syscalls testscripts. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-14 22:59:43 +10:00
Eric Paris	383795c206	SELinux: /proc/mounts should show what it can Given a hosed SELinux config in which a system never loads policy or disables SELinux we currently just return -EINVAL for anyone trying to read /proc/mounts. This is a configuration problem but we can certainly be more graceful. This patch just ignores -EINVAL when displaying LSM options and causes /proc/mounts display everything else it can. If policy isn't loaded the obviously there are no options, so we aren't really loosing any information here. This is safe as the only other return of EINVAL comes from security_sid_to_context_core() in the case of an invalid sid. Even if a FS was mounted with a now invalidated context that sid should have been remapped to unlabeled and so we won't hit the EINVAL and will work like we should. (yes, I tested to make sure it worked like I thought) Signed-off-by: Eric Paris <eparis@redhat.com> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-30 08:31:28 +10:00
Linus Torvalds	4836e30078	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits) [PATCH] fix RLIM_NOFILE handling [PATCH] get rid of corner case in dup3() entirely [PATCH] remove remaining namei_{32,64}.h crap [PATCH] get rid of indirect users of namei.h [PATCH] get rid of __user_path_lookup_open [PATCH] f_count may wrap around [PATCH] dup3 fix [PATCH] don't pass nameidata to __ncp_lookup_validate() [PATCH] don't pass nameidata to gfs2_lookupi() [PATCH] new (local) helper: user_path_parent() [PATCH] sanitize __user_walk_fd() et.al. [PATCH] preparation to __user_walk_fd cleanup [PATCH] kill nameidata passing to permission(), rename to inode_permission() [PATCH] take noexec checks to very few callers that care Re: [PATCH 3/6] vfs: open_exec cleanup [patch 4/4] vfs: immutable inode checking cleanup [patch 3/4] fat: dont call notify_change [patch 2/4] vfs: utimes cleanup [patch 1/4] vfs: utimes: move owner check into inode_change_ok() [PATCH] vfs: use kstrdup() and check failing allocation ...	2008-07-26 20:23:44 -07:00
Linus Torvalds	2284284281	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netns: fix ip_rt_frag_needed rt_is_expired netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences netfilter: fix double-free and use-after free netfilter: arptables in netns for real netfilter: ip{,6}tables_security: fix future section mismatch selinux: use nf_register_hooks() netfilter: ebtables: use nf_register_hooks() Revert "pkt_sched: sch_sfq: dump a real number of flows" qeth: use dev->ml_priv instead of dev->priv syncookies: Make sure ECN is disabled net: drop unused BUG_TRAP() net: convert BUG_TRAP to generic WARN_ON drivers/net: convert BUG_TRAP to generic WARN_ON	2008-07-26 20:17:56 -07:00
Al Viro	b77b0646ef	[PATCH] pass MAY_OPEN to vfs_permission() explicitly ... and get rid of the last "let's deduce mask from nameidata->flags" bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:22 -04:00
Alexey Dobriyan	6c5a9d2e15	selinux: use nf_register_hooks() Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:48:15 -07:00
Roland McGrath	0d094efeb1	tracehook: tracehook_tracer_task This adds the tracehook_tracer_task() hook to consolidate all forms of "Who is using ptrace on me?" logic. This is used for "TracerPid:" in /proc and for permission checks. We also clean up the selinux code the called an identical accessor. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
James Morris	089be43e40	Revert "SELinux: allow fstype unknown to policy to use xattrs if present" This reverts commit `811f379927`. From Eric Paris: "Please drop this patch for now. It deadlocks on ntfs-3g. I need to rework it to handle fuse filesystems better. (casey was right)"	2008-07-15 18:32:49 +10:00
James Morris	6f0f0fd496	security: remove register_security hook The register security hook is no longer required, as the capability module is always registered. LSMs wishing to stack capability as a secondary module should do so explicitly. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-14 15:04:06 +10:00
Miklos Szeredi	b478a9f988	security: remove unused sb_get_mnt_opts hook The sb_get_mnt_opts() hook is unused, and is superseded by the sb_show_options() hook. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:05 +10:00
Eric Paris	2069f45784	LSM/SELinux: show LSM mount options in /proc/mounts This patch causes SELinux mount options to show up in /proc/mounts. As with other code in the area seq_put errors are ignored. Other LSM's will not have their mount options displayed until they fill in their own security_sb_show_options() function. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:05 +10:00
Eric Paris	811f379927	SELinux: allow fstype unknown to policy to use xattrs if present Currently if a FS is mounted for which SELinux policy does not define an fs_use_* that FS will either be genfs labeled or not labeled at all. This decision is based on the existence of a genfscon rule in policy and is irrespective of the capabilities of the filesystem itself. This patch allows the kernel to check if the filesystem supports security xattrs and if so will use those if there is no fs_use_* rule in policy. An fstype with a no fs_use_* rule but with a genfs rule will use xattrs if available and will follow the genfs rule. This can be particularly interesting for things like ecryptfs which actually overlays a real underlying FS. If we define excryptfs in policy to use xattrs we will likely get this wrong at times, so with this path we just don't need to define it! Overlay ecryptfs on top of NFS with no xattr support: SELinux: initialized (dev ecryptfs, type ecryptfs), uses genfs_contexts Overlay ecryptfs on top of ext4 with xattr support: SELinux: initialized (dev ecryptfs, type ecryptfs), uses xattr It is also useful as the kernel adds new FS we don't need to add them in policy if they support xattrs and that is how we want to handle them. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:04 +10:00
James Morris	2baf06df85	SELinux: use do_each_thread as a proper do/while block Use do_each_thread as a proper do/while block. Sparse complained. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2008-07-14 15:02:02 +10:00
James Morris	e399f98224	SELinux: remove unused and shadowed addrlen variable Remove unused and shadowed addrlen variable. Picked up by sparse. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Paul Moore <paul.moore@hp.com>	2008-07-14 15:02:01 +10:00
Eric Paris	6cbe27061a	SELinux: more user friendly unknown handling printk I've gotten complaints and reports about people not understanding the meaning of the current unknown class/perm handling the kernel emits on every policy load. Hopefully this will make make it clear to everyone the meaning of the message and won't waste a printk the user won't care about anyway on systems where the kernel and the policy agree on everything. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:00 +10:00
Stephen Smalley	22df4adb04	selinux: change handling of invalid classes (Was: Re: 2.6.26-rc5-mm1 selinux whine) On Mon, 2008-06-09 at 01:24 -0700, Andrew Morton wrote: > Getting a few of these with FC5: > > SELinux: context_struct_compute_av: unrecognized class 69 > SELinux: context_struct_compute_av: unrecognized class 69 > > one came out when I logged in. > > No other symptoms, yet. Change handling of invalid classes by SELinux, reporting class values unknown to the kernel as errors (w/ ratelimit applied) and handling class values unknown to policy as normal denials. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:59 +10:00
Eric Paris	89abd0acf0	SELinux: drop load_mutex in security_load_policy We used to protect against races of policy load in security_load_policy by using the load_mutex. Since then we have added a new mutex, sel_mutex, in sel_write_load() which is always held across all calls to security_load_policy we are covered and can safely just drop this one. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:58 +10:00
Eric Paris	cea78dc4ca	SELinux: fix off by 1 reference of class_to_string in context_struct_compute_av The class_to_string array is referenced by tclass. My code mistakenly was using tclass - 1. If the proceeding class is a userspace class rather than kernel class this may cause a denial/EINVAL even if unknown handling is set to allow. The bug shouldn't be allowing excess privileges since those are given based on the contents of another array which should be correctly referenced. At this point in time its pretty unlikely this is going to cause problems. The most recently added kernel classes which could be affected are association, dccp_socket, and peer. Its pretty unlikely any policy with handle_unknown=allow doesn't have association and dccp_socket undefined (they've been around longer than unknown handling) and peer is conditionalized on a policy cap which should only be defined if that class exists in policy. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:58 +10:00
James Morris	bdd581c143	SELinux: open code sidtab lock Open code sidtab lock to make Andrew Morton happy. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2008-07-14 15:01:57 +10:00
James Morris	972ccac2b2	SELinux: open code load_mutex Open code load_mutex as suggested by Andrew Morton. Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:56 +10:00
James Morris	0804d1133c	SELinux: open code policy_rwlock Open code policy_rwlock, as suggested by Andrew Morton. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2008-07-14 15:01:55 +10:00
Stephen Smalley	59dbd1ba98	selinux: fix endianness bug in network node address handling Fix an endianness bug in the handling of network node addresses by SELinux. This yields no change on little endian hardware but fixes the incorrect handling on big endian hardware. The network node addresses are stored in network order in memory by checkpolicy, not in cpu/host order, and thus should not have cpu_to_le32/le32_to_cpu conversions applied upon policy write/read unlike other data in the policy. Bug reported by John Weeks of Sun, who noticed that binary policy files built from the same policy source on x86 and sparc differed and tracked it down to the ipv4 address handling in checkpolicy. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:54 +10:00
Stephen Smalley	242631c49d	selinux: simplify ioctl checking Simplify and improve the robustness of the SELinux ioctl checking by using the "access mode" bits of the ioctl command to determine the permission check rather than dealing with individual command values. This removes any knowledge of specific ioctl commands from SELinux and follows the same guidance we gave to Smack earlier. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:53 +10:00
Stephen Smalley	abc69bb633	SELinux: enable processes with mac_admin to get the raw inode contexts Enable processes with CAP_MAC_ADMIN + mac_admin permission in policy to get undefined contexts on inodes. This extends the support for deferred mapping of security contexts in order to permit restorecon and similar programs to see the raw file contexts unknown to the system policy in order to check them. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:52 +10:00
Stephen Smalley	006ebb40d3	Security: split proc ptrace checking into read vs. attach Enable security modules to distinguish reading of process state via proc from full ptrace access by renaming ptrace_may_attach to ptrace_may_access and adding a mode argument indicating whether only read access or full attach access is requested. This allows security modules to permit access to reading process state without granting full ptrace access. The base DAC/capability checking remains unchanged. Read access to /proc/pid/mem continues to apply a full ptrace attach check since check_mem_permission() already requires the current task to already be ptracing the target. The other ptrace checks within proc for elements like environ, maps, and fds are changed to pass the read mode instead of attach. In the SELinux case, we model such reading of process state as a reading of a proc file labeled with the target process' label. This enables SELinux policy to permit such reading of process state without permitting control or manipulation of the target process, as there are a number of cases where programs probe for such information via proc but do not need to be able to control the target (e.g. procps, lsof, PolicyKit, ConsoleKit). At present we have to choose between allowing full ptrace in policy (more permissive than required/desired) or breaking functionality (or in some cases just silencing the denials via dontaudit rules but this can hide genuine attacks). This version of the patch incorporates comments from Casey Schaufler (change/replace existing ptrace_may_attach interface, pass access mode), and Chris Wright (provide greater consistency in the checking). Note that like their predecessors __ptrace_may_attach and ptrace_may_attach, the __ptrace_may_access and ptrace_may_access interfaces use different return value conventions from each other (0 or -errno vs. 1 or 0). I retained this difference to avoid any changes to the caller logic but made the difference clearer by changing the latter interface to return a bool rather than an int and by adding a comment about it to ptrace.h for any future callers. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:47 +10:00
James Morris	feb2a5b82d	SELinux: remove inherit field from inode_security_struct Remove inherit field from inode_security_struct, per Stephen Smalley: "Let's just drop inherit altogether - dead field." Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:38 +10:00
Richard Kennedy	fdeb05184b	SELinux: reorder inode_security_struct to increase objs/slab on 64bit reorder inode_security_struct to remove padding on 64 bit builds size reduced from 72 to 64 bytes increasing objects per slab to 64. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:37 +10:00
Eric Paris	f526971078	SELinux: keep the code clean formating and syntax Formatting and syntax changes whitespace, tabs to spaces, trailing space put open { on same line as struct def remove unneeded {} after if statements change printk("Lu") to printk("llu") convert asm/uaccess.h to linux/uaacess.h includes remove unnecessary asm/bug.h includes convert all users of simple_strtol to strict_strtol Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:36 +10:00
Stephen Smalley	9a59daa03d	SELinux: fix sleeping allocation in security_context_to_sid Fix a sleeping function called from invalid context bug by moving allocation to the callers prior to taking the policy rdlock. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:35 +10:00
Stephen Smalley	12b29f3455	selinux: support deferred mapping of contexts Introduce SELinux support for deferred mapping of security contexts in the SID table upon policy reload, and use this support for inode security contexts when the context is not yet valid under the current policy. Only processes with CAP_MAC_ADMIN + mac_admin permission in policy can set undefined security contexts on inodes. Inodes with such undefined contexts are treated as having the unlabeled context until the context becomes valid upon a policy reload that defines the context. Context invalidation upon policy reload also uses this support to save the context information in the SID table and later recover it upon a subsequent policy reload that defines the context again. This support is to enable package managers and similar programs to set down file contexts unknown to the system policy at the time the file is created in order to better support placing loadable policy modules in packages and to support build systems that need to create images of different distro releases with different policies w/o requiring all of the contexts to be defined or legal in the build host policy. With this patch applied, the following sequence is possible, although in practice it is recommended that this permission only be allowed to specific program domains such as the package manager. # rmdir baz # rm bar # touch bar # chcon -t foo_exec_t bar # foo_exec_t is not yet defined chcon: failed to change context of `bar' to `system_u:object_r:foo_exec_t': Invalid argument # mkdir -Z system_u:object_r:foo_exec_t baz mkdir: failed to set default file creation context to `system_u:object_r:foo_exec_t': Invalid argument # cat setundefined.te policy_module(setundefined, 1.0) require { type unconfined_t; type unlabeled_t; } files_type(unlabeled_t) allow unconfined_t self:capability2 mac_admin; # make -f /usr/share/selinux/devel/Makefile setundefined.pp # semodule -i setundefined.pp # chcon -t foo_exec_t bar # foo_exec_t is not yet defined # mkdir -Z system_u:object_r:foo_exec_t baz # ls -Zd bar baz -rw-r--r-- root root system_u:object_r:unlabeled_t bar drwxr-xr-x root root system_u:object_r:unlabeled_t baz # cat foo.te policy_module(foo, 1.0) type foo_exec_t; files_type(foo_exec_t) # make -f /usr/share/selinux/devel/Makefile foo.pp # semodule -i foo.pp # defines foo_exec_t # ls -Zd bar baz -rw-r--r-- root root user_u:object_r:foo_exec_t bar drwxr-xr-x root root system_u:object_r:foo_exec_t baz # semodule -r foo # ls -Zd bar baz -rw-r--r-- root root system_u:object_r:unlabeled_t bar drwxr-xr-x root root system_u:object_r:unlabeled_t baz # semodule -i foo.pp # ls -Zd bar baz -rw-r--r-- root root user_u:object_r:foo_exec_t bar drwxr-xr-x root root system_u:object_r:foo_exec_t baz # semodule -r setundefined foo # chcon -t foo_exec_t bar # no longer defined and not allowed chcon: failed to change context of `bar' to `system_u:object_r:foo_exec_t': Invalid argument # rmdir baz # mkdir -Z system_u:object_r:foo_exec_t baz mkdir: failed to set default file creation context to `system_u:object_r:foo_exec_t': Invalid argument Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:34 +10:00
Al Viro	9f3acc3140	[PATCH] split linux/file.h Initial splitoff of the low-level stuff; taken to fdtable.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-05-01 13:08:16 -04:00
Oleg Nesterov	3b5e9e53c6	signals: cleanup security_task_kill() usage/implementation Every implementation of ->task_kill() does nothing when the signal comes from the kernel. This is correct, but means that check_kill_permission() should call security_task_kill() only for SI_FROMUSER() case, and we can remove the same check from ->task_kill() implementations. (sadly, check_kill_permission() is the last user of signal->session/__session but we can't s/task_session_nr/task_session/ here). NOTE: Eric W. Biederman pointed out cap_task_kill() should die, and I think he is very right. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Roland McGrath <roland@redhat.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: David Quigley <dpquigl@tycho.nsa.gov> Cc: Eric Paris <eparis@redhat.com> Cc: Harald Welte <laforge@gnumonks.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:34 -07:00
David Howells	7bf570dc8d	Security: Make secctx_to_secid() take const secdata Make secctx_to_secid() take constant secdata. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 13:22:56 -07:00
Linus Torvalds	9781db7b34	Merge branch 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] new predicate - AUDIT_FILETYPE [patch 2/2] Use find_task_by_vpid in audit code [patch 1/2] audit: let userspace fully control TTY input auditing [PATCH 2/2] audit: fix sparse shadowed variable warnings [PATCH 1/2] audit: move extern declarations to audit.h Audit: MAINTAINERS update Audit: increase the maximum length of the key field Audit: standardize string audit interfaces Audit: stop deadlock from signals under load Audit: save audit_backlog_limit audit messages in case auditd comes back Audit: collect sessionid in netlink messages Audit: end printk with newline	2008-04-29 11:41:22 -07:00
David Howells	69664cf16a	keys: don't generate user and user session keyrings unless they're accessed Don't generate the per-UID user and user session keyrings unless they're explicitly accessed. This solves a problem during a login process whereby set*uid() is called before the SELinux PAM module, resulting in the per-UID keyrings having the wrong security labels. This also cures the problem of multiple per-UID keyrings sometimes appearing due to PAM modules (including pam_keyinit) setuiding and causing user_structs to come into and go out of existence whilst the session keyring pins the user keyring. This is achieved by first searching for extant per-UID keyrings before inventing new ones. The serial bound argument is also dropped from find_keyring_by_name() as it's not currently made use of (setting it to 0 disables the feature). Signed-off-by: David Howells <dhowells@redhat.com> Cc: <kwc@citi.umich.edu> Cc: <arunsr@cse.iitk.ac.in> Cc: <dwalsh@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:17 -07:00
David Howells	70a5bb72b5	keys: add keyctl function to get a security label Add a keyctl() function to get the security label of a key. The following is added to Documentation/keys.txt: () Get the LSM security context attached to a key. long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char buffer, size_t buflen) This function returns a string that represents the LSM security context attached to a key in the buffer provided. Unless there's an error, it always returns the amount of data it could produce, even if that's too big for the buffer, but it won't copy more than requested to userspace. If the buffer pointer is NULL then no copy will take place. A NUL character is included at the end of the string if the buffer is sufficiently big. This is included in the returned count. If no LSM is in force then an empty string will be returned. A process must have view permission on the key for this function to be successful. [akpm@linux-foundation.org: declare keyctl_get_security()] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:16 -07:00
David Howells	8f0cfa52a1	xattr: add missing consts to function arguments Add missing consts to xattr function arguments. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:06 -07:00
Linus Torvalds	cfd299dffe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: SELinux: Fix a RCU free problem with the netport cache SELinux: Made netnode cache adds faster SELinux: include/security.h whitespace, syntax, and other cleanups SELinux: policydb.h whitespace, syntax, and other cleanups SELinux: mls_types.h whitespace, syntax, and other cleanups SELinux: mls.h whitespace, syntax, and other cleanups SELinux: hashtab.h whitespace, syntax, and other cleanups SELinux: context.h whitespace, syntax, and other cleanups SELinux: ss/conditional.h whitespace, syntax, and other cleanups SELinux: selinux/include/security.h whitespace, syntax, and other cleanups SELinux: objsec.h whitespace, syntax, and other cleanups SELinux: netlabel.h whitespace, syntax, and other cleanups SELinux: avc_ss.h whitespace, syntax, and other cleanups Fixed up conflict in include/linux/security.h manually	2008-04-28 10:08:49 -07:00
Andrew G. Morgan	3898b1b4eb	capabilities: implement per-process securebits Filesystem capability support makes it possible to do away with (set)uid-0 based privilege and use capabilities instead. That is, with filesystem support for capabilities but without this present patch, it is (conceptually) possible to manage a system with capabilities alone and never need to obtain privilege via (set)uid-0. Of course, conceptually isn't quite the same as currently possible since few user applications, certainly not enough to run a viable system, are currently prepared to leverage capabilities to exercise privilege. Further, many applications exist that may never get upgraded in this way, and the kernel will continue to want to support their setuid-0 base privilege needs. Where pure-capability applications evolve and replace setuid-0 binaries, it is desirable that there be a mechanisms by which they can contain their privilege. In addition to leveraging the per-process bounding and inheritable sets, this should include suppressing the privilege of the uid-0 superuser from the process' tree of children. The feature added by this patch can be leveraged to suppress the privilege associated with (set)uid-0. This suppression requires CAP_SETPCAP to initiate, and only immediately affects the 'current' process (it is inherited through fork()/exec()). This reimplementation differs significantly from the historical support for securebits which was system-wide, unwieldy and which has ultimately withered to a dead relic in the source of the modern kernel. With this patch applied a process, that is capable(CAP_SETPCAP), can now drop all legacy privilege (through uid=0) for itself and all subsequently fork()'d/exec()'d children with: prctl(PR_SET_SECUREBITS, 0x2f); This patch represents a no-op unless CONFIG_SECURITY_FILE_CAPABILITIES is enabled at configure time. [akpm@linux-foundation.org: fix uninitialised var warning] [serue@us.ibm.com: capabilities: use cap_task_prctl when !CONFIG_SECURITY] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Paul Moore <paul.moore@hp.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:26 -07:00
Eric Paris	b556f8ad58	Audit: standardize string audit interfaces This patch standardized the string auditing interfaces. No userspace changes will be visible and this is all just cleanup and consistancy work. We have the following string audit interfaces to use: void audit_log_n_hex(struct audit_buffer ab, const unsigned char buf, size_t len); void audit_log_n_string(struct audit_buffer ab, const char buf, size_t n); void audit_log_string(struct audit_buffer ab, const char buf); void audit_log_n_untrustedstring(struct audit_buffer ab, const char string, size_t n); void audit_log_untrustedstring(struct audit_buffer ab, const char string); This may be the first step to possibly fixing some of the issues that people have with the string output from the kernel audit system. But we still don't have an agreed upon solution to that problem. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-28 06:19:22 -04:00
Paul Moore	c9b7b97937	SELinux: Fix a RCU free problem with the netport cache The netport cache doesn't free resources in a manner which is safe or orderly. This patch fixes this by adding in a missing call to rcu_dereference() in sel_netport_insert() as well as some general cleanup throughout the file. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-28 09:36:27 +10:00
Paul Moore	a639e7ca8e	SELinux: Made netnode cache adds faster When adding new entries to the network node cache we would walk the entire hash bucket to make sure we didn't cross a threshold (done to bound the cache size). This isn't a very quick or elegant solution for something which is supposed to be quick-ish so add a counter to each hash bucket to track the size of the bucket and eliminate the need to walk the entire bucket list on each add. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-28 09:36:23 +10:00
Eric Paris	489a5fd719	SELinux: policydb.h whitespace, syntax, and other cleanups This patch changes policydb.h to fix whitespace and syntax issues. Things that are fixed may include (does not not have to include) spaces followed by tabs spaces used instead of tabs location of * in pointer declarations Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-28 09:29:07 +10:00
Eric Paris	8bf1f3a6c0	SELinux: mls_types.h whitespace, syntax, and other cleanups This patch changes mls_types.h to fix whitespace and syntax issues. Things that are fixed may include (does not not have to include) spaces used instead of tabs Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-28 09:29:06 +10:00
Eric Paris	d497fc87c0	SELinux: mls.h whitespace, syntax, and other cleanups This patch changes mls.h to fix whitespace and syntax issues. Things that are fixed may include (does not not have to include) spaces used instead of tabs Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-28 09:29:05 +10:00
Eric Paris	faff786ce2	SELinux: hashtab.h whitespace, syntax, and other cleanups This patch changes hashtab.h to fix whitespace and syntax issues. Things that are fixed may include (does not not have to include) spaces used instead of tabs Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-28 09:29:04 +10:00
Eric Paris	81fa42df78	SELinux: context.h whitespace, syntax, and other cleanups This patch changes context.h to fix whitespace and syntax issues. Things that are fixed may include (does not not have to include) include spaces around , in function calls Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-28 09:29:03 +10:00
Eric Paris	ccb3cbeb4f	SELinux: ss/conditional.h whitespace, syntax, and other cleanups This patch changes ss/conditional.h to fix whitespace and syntax issues. Things that are fixed may include (does not not have to include) location of * in pointer declarations Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-28 09:29:02 +10:00

1 2 3 4 5 ...

412 Commits