linux

Author	SHA1	Message	Date
etienne	113a0e4590	smack: fixes for unlabeled host support The following patch (against 2.6.29rc5) fixes a few issues in the smack/netlabel "unlabeled host support" functionnality that was added in 2.6.29rc. It should go in before -final. 1) smack_host_label disregard a "0.0.0.0/0 @" rule (or other label), preventing 'tagged' tasks to access Internet (many systems drop packets with IP options) 2) netmasks were not handled correctly, they were stored in a way _not equivalent_ to conversion to be32 (it was equivalent for /0, /8, /16, /24, /32 masks but not other masks) 3) smack_netlbladdr prefixes (IP/mask) were not consistent (mask&IP was not done), so there could have been different list entries for the same IP prefix; if those entries had different labels, well ... 4) they were not sorted 1) 2) 3) are bugs, 4) is a more cosmetic issue. The patch : -creates a new helper smk_netlbladdr_insert to insert a smk_netlbladdr, -sorted by netmask length -use the new sorted nature of smack_netlbladdrs list to simplify smack_host_label : the first match _will_ be the more specific -corrects endianness issues in smk_write_netlbladdr & netlbladdr_seq_show Signed-off-by: <etienne.basset@numericable.fr> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-03-05 08:30:01 +11:00
Paul Moore	d7f59dc464	selinux: Fix a panic in selinux_netlbl_inode_permission() Rick McNeal from LSI identified a panic in selinux_netlbl_inode_permission() caused by a certain sequence of SUNRPC operations. The problem appears to be due to the lack of NULL pointer checking in the function; this patch adds the pointer checks so the function will exit safely in the cases where the socket is not completely initialized. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-03-02 09:30:04 +11:00
Serge E. Hallyn	454804ab03	keys: make procfiles per-user-namespace Restrict the /proc/keys and /proc/key-users output to keys belonging to the same user namespace as the reading task. We may want to make this more complicated - so that any keys in a user-namespace which is belongs to the reading task are also shown. But let's see if anyone wants that first. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-27 12:35:15 +11:00
Serge E. Hallyn	2ea190d0a0	keys: skip keys from another user namespace When listing keys, do not return keys belonging to the same uid in another user namespace. Otherwise uid 500 in another user namespace will return keyrings called uid.500 for another user namespace. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-27 12:35:12 +11:00
Serge E. Hallyn	8ff3bc3138	keys: consider user namespace in key_permission If a key is owned by another user namespace, then treat the key as though it is owned by both another uid and gid. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-27 12:35:09 +11:00
Serge E. Hallyn	1d1e97562e	keys: distinguish per-uid keys in different namespaces per-uid keys were looked by uid only. Use the user namespace to distinguish the same uid in different namespaces. This does not address key_permission. So a task can for instance try to join a keyring owned by the same uid in another namespace. That will be handled by a separate patch. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-27 12:35:06 +11:00
Paul Moore	09c50b4a52	selinux: Fix the NetLabel glue code for setsockopt() At some point we (okay, I) managed to break the ability for users to use the setsockopt() syscall to set IPv4 options when NetLabel was not active on the socket in question. The problem was noticed by someone trying to use the "-R" (record route) option of ping: # ping -R 10.0.0.1 ping: record route: No message of desired type The solution is relatively simple, we catch the unlabeled socket case and clear the error code, allowing the operation to succeed. Please note that we still deny users the ability to override IPv4 options on socket's which have NetLabel labeling active; this is done to ensure the labeling remains intact. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-23 10:05:55 +11:00
Mimi Zohar	be38e0fd5f	integrity: ima iint radix_tree_lookup locking fix Based on Andrew Morton's comments: - add missing locks around radix_tree_lookup in ima_iint_insert() Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-23 09:54:53 +11:00
Tetsuo Handa	1581e7ddbd	TOMOYO: Do not call tomoyo_realpath_init unless registered. tomoyo_realpath_init() is unconditionally called by security_initcall(). But nobody will use realpath related functions if TOMOYO is not registered. So, let tomoyo_init() call tomoyo_realpath_init(). This patch saves 4KB of memory allocation if TOMOYO is not registered. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-23 09:45:05 +11:00
Mimi Zohar	0da0a420bb	integrity: ima scatterlist bug fix Based on Alexander Beregalov's post http://lkml.org/lkml/2009/2/19/198 - replaced sg_set_buf() with sg_init_one() kernel BUG at include/linux/scatterlist.h:65! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: CPU 2 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.29-rc5-next-20090219 #5 PowerEdge 1950 RIP: 0010:[<ffffffff8045ec70>] [<ffffffff8045ec70>] ima_calc_hash+0xc0/0x160 RSP: 0018:ffff88007f46bc40 EFLAGS: 00010286 RAX: ffffe200032c45e8 RBX: 00000000fffffff4 RCX: 0000000087654321 RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff88007cf71048 RBP: ffff88007f46bcd0 R08: 0000000000000000 R09: 0000000000000163 R10: ffff88007f4707a8 R11: 0000000000000000 R12: ffff88007cf71048 R13: 0000000000001000 R14: 0000000000000000 R15: 0000000000009d98 FS: 0000000000000000(0000) GS:ffff8800051ac000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Tested-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-21 00:29:59 +11:00
Randy Dunlap	251a2a958b	smack: fix lots of kernel-doc notation Fix/add kernel-doc notation and fix typos in security/smack/. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-19 15:51:10 +11:00
Tetsuo Handa	e5a3b95f58	TOMOYO: Don't create securityfs entries unless registered. TOMOYO should not create /sys/kernel/security/tomoyo/ interface unless TOMOYO is registered. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-16 09:01:48 +11:00
Tetsuo Handa	33043cbb9f	TOMOYO: Fix exception policy read failure. Due to wrong initialization, "cat /sys/kernel/security/tomoyo/exception_policy" returned nothing. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 12:33:30 +11:00
Eric Paris	26036651c5	SELinux: convert the avc cache hash list to an hlist We do not need O(1) access to the tail of the avc cache lists and so we are wasting lots of space using struct list_head instead of struct hlist_head. This patch converts the avc cache to use hlists in which there is a single pointer from the head which saves us about 4k of global memory. Resulted in about a 1.5% decrease in time spent in avc_has_perm_noaudit based on oprofile sampling of tbench. Although likely within the noise.... Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 09:23:48 +11:00
Eric Paris	edf3d1aecd	SELinux: code readability with avc_cache The code making use of struct avc_cache was not easy to read thanks to liberal use of &avc_cache.{slots_lock,slots}[hvalue] throughout. This patch simply creates local pointers and uses those instead of the long global names. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 09:23:45 +11:00
Eric Paris	f1c6381a6e	SELinux: remove unused av.decided field It appears there was an intention to have the security server only decide certain permissions and leave other for later as some sort of a portential performance win. We are currently always deciding all 32 bits of permissions and this is a useless couple of branches and wasted space. This patch completely drops the av.decided concept. This in a 17% reduction in the time spent in avc_has_perm_noaudit based on oprofile sampling of a tbench benchmark. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 09:23:08 +11:00
Eric Paris	21193dcd1f	SELinux: more careful use of avd in avc_has_perm_noaudit we are often needlessly jumping through hoops when it comes to avd entries in avc_has_perm_noaudit and we have extra initialization and memcpy which are just wasting performance. Try to clean the function up a bit. This patch resulted in a 13% drop in time spent in avc_has_perm_noaudit in my oprofile sampling of a tbench benchmark. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 09:23:04 +11:00
Eric Paris	906d27d9d2	SELinux: remove the unused ae.used Currently SELinux code has an atomic which was intended to track how many times an avc entry was used and to evict entries when they haven't been used recently. Instead we never let this atomic get above 1 and evict when it is first checked for eviction since it hits zero. This is a total waste of time so I'm completely dropping ae.used. This change resulted in about a 3% faster avc_has_perm_noaudit when running oprofile against a tbench benchmark. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed by: Paul Moore <paul.moore@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 09:22:37 +11:00
Eric Paris	a5dda68332	SELinux: check seqno when updating an avc_node The avc update node callbacks do not check the seqno of the caller with the seqno of the node found. It is possible that a policy change could happen (although almost impossibly unlikely) in which a permissive or permissive_domain decision is not valid for the entry found. Simply pass and check that the seqno of the caller and the seqno of the node found match. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 09:22:34 +11:00
Eric Paris	4cb912f1d1	SELinux: NULL terminate al contexts from disk When a context is pulled in from disk we don't know that it is null terminated. This patch forecebly null terminates contexts when we pull them from disk. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 09:22:30 +11:00
Eric Paris	4ba0a8ad63	SELinux: better printk when file with invalid label found Currently when an inode is read into the kernel with an invalid label string (can often happen with removable media) we output a string like: SELinux: inode_doinit_with_dentry: context_to_sid([SOME INVALID LABEL]) returned -22 dor dev=[blah] ino=[blah] Which is all but incomprehensible to all but a couple of us. Instead, on EINVAL only, I plan to output a much more user friendly string and I plan to ratelimit the printk since many of these could be generated very rapidly. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 09:22:27 +11:00
Eric Paris	200ac532a4	SELinux: call capabilities code directory For cleanliness and efficiency remove all calls to secondary-> and instead call capabilities code directly. capabilities are the only module that selinux stacks with and so the code should not indicate that other stacking might be possible. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-14 09:22:24 +11:00
Randy Dunlap	b53fab9d48	ima: fix build error IMA_LSM_RULES requires AUDIT. This is automatic if SECURITY_SELINUX=y but not when SECURITY_SMACK=y (and SECURITY_SELINUX=n), so make the dependency explicit. This fixes the following build error: security/integrity/ima/ima_policy.c:111:error: implicit declaration of function 'security_audit_rule_match' security/integrity/ima/ima_policy.c:230:error: implicit declaration of function 'security_audit_rule_init' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-13 09:27:56 +11:00
Tetsuo Handa	35d50e60e8	tomoyo: fix sparse warning Fix sparse warning. $ make C=2 SUBDIRS=security/tomoyo CF="-D__cold__=" CHECK security/tomoyo/common.c CHECK security/tomoyo/realpath.c CHECK security/tomoyo/tomoyo.c security/tomoyo/tomoyo.c:110:8: warning: symbol 'buf' shadows an earlier one security/tomoyo/tomoyo.c💯7: originally declared here Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 20:21:10 +11:00
James Morris	42d5aaa2d8	security: change link order of LSMs so security=tomoyo works LSMs need to be linked before root_plug to ensure the security= boot parameter works with them. Do this for Tomoyo. (root_plug probably needs to be taken out and shot at some point, too). Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 16:29:04 +11:00
Kentaro Takeda	00d7d6f840	Kconfig and Makefile TOMOYO uses LSM hooks for pathname based access control and securityfs support. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:19:00 +11:00
Kentaro Takeda	f743324377	LSM adapter functions. DAC's permissions and TOMOYO's permissions are not one-to-one mapping. Regarding DAC, there are "read", "write", "execute" permissions. Regarding TOMOYO, there are "allow_read", "allow_write", "allow_read/write", "allow_execute", "allow_create", "allow_unlink", "allow_mkdir", "allow_rmdir", "allow_mkfifo", "allow_mksock", "allow_mkblock", "allow_mkchar", "allow_truncate", "allow_symlink", "allow_rewrite", "allow_link", "allow_rename" permissions. +----------------------------------+----------------------------------+ \| requested operation \| required TOMOYO's permission \| +----------------------------------+----------------------------------+ \| sys_open(O_RDONLY) \| allow_read \| +----------------------------------+----------------------------------+ \| sys_open(O_WRONLY) \| allow_write \| +----------------------------------+----------------------------------+ \| sys_open(O_RDWR) \| allow_read/write \| +----------------------------------+----------------------------------+ \| open_exec() from do_execve() \| allow_execute \| +----------------------------------+----------------------------------+ \| open_exec() from !do_execve() \| allow_read \| +----------------------------------+----------------------------------+ \| sys_read() \| (none) \| +----------------------------------+----------------------------------+ \| sys_write() \| (none) \| +----------------------------------+----------------------------------+ \| sys_mmap() \| (none) \| +----------------------------------+----------------------------------+ \| sys_uselib() \| allow_read \| +----------------------------------+----------------------------------+ \| sys_open(O_CREAT) \| allow_create \| +----------------------------------+----------------------------------+ \| sys_open(O_TRUNC) \| allow_truncate \| +----------------------------------+----------------------------------+ \| sys_truncate() \| allow_truncate \| +----------------------------------+----------------------------------+ \| sys_ftruncate() \| allow_truncate \| +----------------------------------+----------------------------------+ \| sys_open() without O_APPEND \| allow_rewrite \| +----------------------------------+----------------------------------+ \| setfl() without O_APPEND \| allow_rewrite \| +----------------------------------+----------------------------------+ \| sys_sysctl() for writing \| allow_write \| +----------------------------------+----------------------------------+ \| sys_sysctl() for reading \| allow_read \| +----------------------------------+----------------------------------+ \| sys_unlink() \| allow_unlink \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFREG) \| allow_create \| +----------------------------------+----------------------------------+ \| sys_mknod(0) \| allow_create \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFIFO) \| allow_mkfifo \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFSOCK) \| allow_mksock \| +----------------------------------+----------------------------------+ \| sys_bind(AF_UNIX) \| allow_mksock \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFBLK) \| allow_mkblock \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFCHR) \| allow_mkchar \| +----------------------------------+----------------------------------+ \| sys_symlink() \| allow_symlink \| +----------------------------------+----------------------------------+ \| sys_mkdir() \| allow_mkdir \| +----------------------------------+----------------------------------+ \| sys_rmdir() \| allow_rmdir \| +----------------------------------+----------------------------------+ \| sys_link() \| allow_link \| +----------------------------------+----------------------------------+ \| sys_rename() \| allow_rename \| +----------------------------------+----------------------------------+ TOMOYO requires "allow_execute" permission of a pathname passed to do_execve() but does not require "allow_read" permission of that pathname. Let's consider 3 patterns (statically linked, dynamically linked, shell script). This description is to some degree simplified. $ cat hello.c #include <stdio.h> int main() { printf("Hello\n"); return 0; } $ cat hello.sh #! /bin/sh echo "Hello" $ gcc -static -o hello-static hello.c $ gcc -o hello-dynamic hello.c $ chmod 755 hello.sh Case 1 -- Executing hello-static from bash. (1) The bash process calls fork() and the child process requests do_execve("hello-static"). (2) The kernel checks "allow_execute hello-static" from "bash" domain. (3) The kernel calculates "bash hello-static" as the domain to transit to. (4) The kernel overwrites the child process by "hello-static". (5) The child process transits to "bash hello-static" domain. (6) The "hello-static" starts and finishes. Case 2 -- Executing hello-dynamic from bash. (1) The bash process calls fork() and the child process requests do_execve("hello-dynamic"). (2) The kernel checks "allow_execute hello-dynamic" from "bash" domain. (3) The kernel calculates "bash hello-dynamic" as the domain to transit to. (4) The kernel checks "allow_read ld-linux.so" from "bash hello-dynamic" domain. I think permission to access ld-linux.so should be charged hello-dynamic program, for "hello-dynamic needs ld-linux.so" is not a fault of bash program. (5) The kernel overwrites the child process by "hello-dynamic". (6) The child process transits to "bash hello-dynamic" domain. (7) The "hello-dynamic" starts and finishes. Case 3 -- Executing hello.sh from bash. (1) The bash process calls fork() and the child process requests do_execve("hello.sh"). (2) The kernel checks "allow_execute hello.sh" from "bash" domain. (3) The kernel calculates "bash hello.sh" as the domain to transit to. (4) The kernel checks "allow_read /bin/sh" from "bash hello.sh" domain. I think permission to access /bin/sh should be charged hello.sh program, for "hello.sh needs /bin/sh" is not a fault of bash program. (5) The kernel overwrites the child process by "/bin/sh". (6) The child process transits to "bash hello.sh" domain. (7) The "/bin/sh" requests open("hello.sh"). (8) The kernel checks "allow_read hello.sh" from "bash hello.sh" domain. (9) The "/bin/sh" starts and finishes. Whether a file is interpreted as a program or not depends on an application. The kernel cannot know whether the file is interpreted as a program or not. Thus, TOMOYO treats "hello-static" "hello-dynamic" "ld-linux.so" "hello.sh" "/bin/sh" equally as merely files; no distinction between executable and non-executable. Therefore, TOMOYO doesn't check DAC's execute permission. TOMOYO checks "allow_read" permission instead. Calling do_execve() is a bold gesture that an old program's instance (i.e. current process) is ready to be overwritten by a new program and is ready to transfer control to the new program. To split purview of programs, TOMOYO requires "allow_execute" permission of the new program against the old program's instance and performs domain transition. If do_execve() succeeds, the old program is no longer responsible against the consequence of the new program's behavior. Only the new program is responsible for all consequences. But TOMOYO doesn't require "allow_read" permission of the new program. If TOMOYO requires "allow_read" permission of the new program, TOMOYO will allow an attacker (who hijacked the old program's instance) to open the new program and steal data from the new program. Requiring "allow_read" permission will widen purview of the old program. Not requiring "allow_read" permission of the new program against the old program's instance is my design for reducing purview of the old program. To be able to know whether the current process is in do_execve() or not, I want to add in_execve flag to "task_struct". Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:05 +11:00
Kentaro Takeda	26a2a1c9eb	Domain transition handler. This file controls domain creation/deletion/transition. Every process belongs to a domain in TOMOYO Linux. Domain transition occurs when execve(2) is called and the domain is expressed as 'process invocation history', such as '<kernel> /sbin/init /etc/init.d/rc'. Domain information is stored in current->cred->security field. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:05 +11:00
Kentaro Takeda	b69a54ee58	File operation restriction part. This file controls file related operations of TOMOYO Linux. tomoyo/tomoyo.c calls the following six functions in this file. Each function handles the following access types. * tomoyo_check_file_perm sysctl()'s "read" and "write". * tomoyo_check_exec_perm "execute". * tomoyo_check_open_permission open(2) for "read" and "write". * tomoyo_check_1path_perm "create", "unlink", "mkdir", "rmdir", "mkfifo", "mksock", "mkblock", "mkchar", "truncate" and "symlink". * tomoyo_check_2path_perm "rename" and "unlink". * tomoyo_check_rewrite_permission "rewrite". ("rewrite" are operations which may lose already recorded data of a file, i.e. open(!O_APPEND) \|\| open(O_TRUNC) \|\| truncate() \|\| ftruncate()) The functions which actually checks ACLs are the following three functions. Each function handles the following access types. ACL directive is expressed by "allow_<access type>". * tomoyo_check_file_acl Open() operation and execve() operation. ("read", "write", "read/write" and "execute") * tomoyo_check_single_write_acl Directory modification operations with 1 pathname. ("create", "unlink", "mkdir", "rmdir", "mkfifo", "mksock", "mkblock", "mkchar", "truncate", "symlink" and "rewrite") * tomoyo_check_double_write_acl Directory modification operations with 2 pathname. ("link" and "rename") Also, this file contains handlers of some utility directives for file related operations. * "allow_read": specifies globally (for all domains) readable files. * "path_group": specifies pathname macro. * "deny_rewrite": restricts rewrite operation. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:05 +11:00
Kentaro Takeda	9590837b89	Common functions for TOMOYO Linux. This file contains common functions (e.g. policy I/O, pattern matching). -------------------- About pattern matching -------------------- Since TOMOYO Linux is a name based access control, TOMOYO Linux seriously considers "safe" string representation. TOMOYO Linux's string manipulation functions make reviewers feel crazy, but there are reasons why TOMOYO Linux needs its own string manipulation functions. ----- Part 1 : preconditions ----- People definitely want to use wild card. To support pattern matching, we have to support wild card characters. In a typical Linux system, filenames are likely consists of only alphabets, numbers, and some characters (e.g. + - ~ . / ). But theoretically, the Linux kernel accepts all characters but NUL character (which is used as a terminator of a string). Some Linux systems can have filenames which contain * ? ** etc. Therefore, we have to somehow modify string so that we can distinguish wild card characters and normal characters. It might be possible for some application's configuration files to restrict acceptable characters. It is impossible for kernel to restrict acceptable characters. We can't accept approaches which will cause troubles for applications. ----- Part 2 : commonly used approaches ----- Text formatted strings separated by space character (0x20) and new line character (0x0A) is more preferable for users over array of NUL-terminated string. Thus, people use text formatted configuration files separated by space character and new line. We sometimes need to handle non-printable characters. Thus, people use \ character (0x5C) as escape character and represent non-printable characters using octal or hexadecimal format. At this point, we remind (at least) 3 approaches. (1) Shell glob style expression (2) POSIX regular expression (UNIX style regular expression) (3) Maverick wild card expression On the surface, (1) and (2) sound good choices. But they have a big pitfall. All meta-characters in (1) and (2) are legal characters for representing a pathname, and users easily write incorrect expression. What is worse, users unlikely notice incorrect expressions because characters used for regular pathnames unlikely contain meta-characters. This incorrect use of meta-characters in pathname representation reveals vulnerability (e.g. unexpected results) only when irregular pathname is specified. The authors of TOMOYO Linux think that approaches which adds some character for interpreting meta-characters as normal characters (i.e. (1) and (2)) are not suitable for security use. Therefore, the authors of TOMOYO Linux propose (3). ----- Part 3: consideration points ----- We need to solve encoding problem. A single character can be represented in several ways using encodings. For Japanese language, there are "ShiftJIS", "ISO-2022-JP", "EUC-JP", "UTF-8" and more. Some languages (e.g. Japanese language) supports multi-byte characters (where a single character is represented using several bytes). Some multi-byte characters may match the escape character. For Japanese language, some characters in "ShiftJIS" encoding match \ character, and bothering Web's CGI developers. It is important that the kernel string is not bothered by encoding problem. Linus said, "I really would expect that kernel strings don't have an encoding. They're just C strings: a NUL-terminated stream of bytes." http://lkml.org/lkml/2007/11/6/142 Yes. The kernel strings are just C strings. We are talking about how to store and carry "kernel strings" safely. If we store "kernel string" into policy file as-is, the "kernel string" will be interpreted differently depending on application's encoding settings. One application may interpret "kernel string" as "UTF-8", another application may interpret "kernel string" as "ShiftJIS". Therefore, we propose to represent strings using ASCII encoding. In this way, we are no longer bothered by encoding problems. We need to avoid information loss caused by display. It is difficult to input and display non-printable characters, but we have to be able to handle such characters because the kernel string is a C string. If we use only ASCII printable characters (from 0x21 to 0x7E) and space character (0x20) and new line character (0x0A), it is easy to input from keyboard and display on all terminals which is running Linux. Therefore, we propose to represent strings using only characters which value is one of "from 0x21 to 0x7E", "0x20", "0x0A". We need to consider ease of splitting strings from a line. If we use an approach which uses "\ " for representing a space character within a string, we have to count the string from the beginning to check whether this space character is accompanied with \ character or not. As a result, we cannot monotonically split a line using space character. If we use an approach which uses "\040" for representing a space character within a string, we can monotonically split a line using space character. If we use an approach which uses NUL character as a delimiter, we cannot use string manipulation functions for splitting strings from a line. Therefore, we propose that we represent space character as "\040". We need to avoid wrong designations (incorrect use of special characters). Not all users can understand and utilize POSIX's regular expressions correctly and perfectly. If a character acts as a wild card by default, the user will get unexpected result if that user didn't know the meaning of that character. Therefore, we propose that all characters but \ character act as a normal character and let the user add \ character to make a character act as a wild card. In this way, users needn't to know all wild card characters beforehand. They can learn when they encountered an unseen wild card character for their first time. ----- Part 4: supported wild card expressions ----- At this point, we have wild card expressions listed below. +-----------+--------------------------------------------------------------+ \| Wild card \| Meaning and example \| +-----------+--------------------------------------------------------------+ \| \* \| More than or equals to 0 character other than '/'. \| \| \| /var/log/samba/\* \| +-----------+--------------------------------------------------------------+ \| \@ \| More than or equals to 0 character other than '/' or '.'. \| \| \| /var/www/html/\@.html \| +-----------+--------------------------------------------------------------+ \| \? \| 1 byte character other than '/'. \| \| \| /tmp/mail.\?\?\?\?\?\? \| +-----------+--------------------------------------------------------------+ \| \$ \| More than or equals to 1 decimal digit. \| \| \| /proc/\$/cmdline \| +-----------+--------------------------------------------------------------+ \| \+ \| 1 decimal digit. \| \| \| /var/tmp/my_work.\+ \| +-----------+--------------------------------------------------------------+ \| \X \| More than or equals to 1 hexadecimal digit. \| \| \| /var/tmp/my-work.\X \| +-----------+--------------------------------------------------------------+ \| \x \| 1 hexadecimal digit. \| \| \| /tmp/my-work.\x \| +-----------+--------------------------------------------------------------+ \| \A \| More than or equals to 1 alphabet character. \| \| \| /var/log/my-work/\$-\A-\$.log \| +-----------+--------------------------------------------------------------+ \| \a \| 1 alphabet character. \| \| \| /home/users/\a/\/public_html/\.html \| +-----------+--------------------------------------------------------------+ \| \- \| Pathname subtraction operator. \| \| \| +---------------------+------------------------------------+ \| \| \| \| Example \| Meaning \| \| \| \| +---------------------+------------------------------------+ \| \| \| \| /etc/\* \| All files in /etc/ directory. \| \| \| \| +---------------------+------------------------------------+ \| \| \| \| /etc/\\-\shadow\* \| /etc/\* other than /etc/\shadow\ \| \| \| \| +---------------------+------------------------------------+ \| \| \| \| /\\-proc\-sys/ \| /\/ other than /proc/ /sys/ \| \| \| \| +---------------------+------------------------------------+ \| +-----------+--------------------------------------------------------------+ +----------------+---------------------------------------------------------+ \| Representation \| Meaning and example \| +----------------+---------------------------------------------------------+ \| \\ \| backslash character itself. \| +----------------+---------------------------------------------------------+ \| \ooo \| 1 byte character. \| \| \| ooo is 001 <= ooo <= 040 \|\| 177 <= ooo <= 377. \| \| \| \| \| \| \040 for space character. \| \| \| \177 for del character. \| \| \| \| +----------------+---------------------------------------------------------+ ----- Part 5: Advantages ----- We can obtain extensibility. Since our proposed approach adds \ to a character to interpret as a wild card, we can introduce new wild card in future while maintaining backward compatibility. We can process monotonically. Since our proposed approach separates strings using a space character, we can split strings using existing string manipulation functions. We can reliably analyze access logs. It is guaranteed that a string doesn't contain space character (0x20) and new line character (0x0A). It is guaranteed that a string won't be converted by FTP and won't be damaged by a terminal's settings. It is guaranteed that a string won't be affected by encoding converters (except encodings which insert NUL character (e.g. UTF-16)). ----- Part 6: conclusion ----- TOMOYO Linux is using its own encoding with reasons described above. There is a disadvantage that we need to introduce a series of new string manipulation functions. But TOMOYO Linux's encoding is useful for all users (including audit and AppArmor) who want to perform pattern matching and safely exchange string information between the kernel and the userspace. -------------------- About policy interface -------------------- TOMOYO Linux creates the following files on securityfs (normally mounted on /sys/kernel/security) as interfaces between kernel and userspace. These files are for TOMOYO Linux management tools only, not for general programs. * profile * exception_policy * domain_policy * manager * meminfo * self_domain * version * .domain_status * .process_status /sys/kernel/security/tomoyo/profile This file is used to read or write profiles. "profile" means a running mode of process. A profile lists up functions and their modes in "$number-$variable=$value" format. The $number is profile number between 0 and 255. Each domain is assigned one profile. To assign profile to domains, use "ccs-setprofile" or "ccs-editpolicy" or "ccs-loadpolicy" commands. (Example) [root@tomoyo]# cat /sys/kernel/security/tomoyo/profile 0-COMMENT=-----Disabled Mode----- 0-MAC_FOR_FILE=disabled 0-MAX_ACCEPT_ENTRY=2048 0-TOMOYO_VERBOSE=disabled 1-COMMENT=-----Learning Mode----- 1-MAC_FOR_FILE=learning 1-MAX_ACCEPT_ENTRY=2048 1-TOMOYO_VERBOSE=disabled 2-COMMENT=-----Permissive Mode----- 2-MAC_FOR_FILE=permissive 2-MAX_ACCEPT_ENTRY=2048 2-TOMOYO_VERBOSE=enabled 3-COMMENT=-----Enforcing Mode----- 3-MAC_FOR_FILE=enforcing 3-MAX_ACCEPT_ENTRY=2048 3-TOMOYO_VERBOSE=enabled - MAC_FOR_FILE: Specifies access control level regarding file access requests. - MAX_ACCEPT_ENTRY: Limits the max number of ACL entries that are automatically appended during learning mode. Default is 2048. - TOMOYO_VERBOSE: Specifies whether to print domain policy violation messages or not. /sys/kernel/security/tomoyo/manager This file is used to read or append the list of programs or domains that can write to /sys/kernel/security/tomoyo interface. By default, only processes with both UID = 0 and EUID = 0 can modify policy via /sys/kernel/security/tomoyo interface. You can use keyword "manage_by_non_root" to allow policy modification by non root user. (Example) [root@tomoyo]# cat /sys/kernel/security/tomoyo/manager /usr/lib/ccs/loadpolicy /usr/lib/ccs/editpolicy /usr/lib/ccs/setlevel /usr/lib/ccs/setprofile /usr/lib/ccs/ld-watch /usr/lib/ccs/ccs-queryd /sys/kernel/security/tomoyo/exception_policy This file is used to read and write system global settings. Each line has a directive and operand pair. Directives are listed below. - initialize_domain: To initialize domain transition when specific program is executed, use initialize_domain directive. * initialize_domain "program" from "domain" * initialize_domain "program" from "the last program part of domain" * initialize_domain "program" If the part "from" and after is not given, the entry is applied to all domain. If the "domain" doesn't start with "<kernel>", the entry is applied to all domain whose domainname ends with "the last program part of domain". This directive is intended to aggregate domain transitions for daemon program and program that are invoked by the kernel on demand, by transiting to different domain. - keep_domain To prevent domain transition when program is executed from specific domain, use keep_domain directive. * keep_domain "program" from "domain" * keep_domain "program" from "the last program part of domain" * keep_domain "domain" * keep_domain "the last program part of domain" If the part "from" and before is not given, this entry is applied to all program. If the "domain" doesn't start with "<kernel>", the entry is applied to all domain whose domainname ends with "the last program part of domain". This directive is intended to reduce total number of domains and memory usage by suppressing unneeded domain transitions. To declare domain keepers, use keep_domain directive followed by domain definition. Any process that belongs to any domain declared with this directive, the process stays at the same domain unless any program registered with initialize_domain directive is executed. In order to control domain transition in detail, you can use no_keep_domain/no_initialize_domain keywrods. - alias: To allow executing programs using the name of symbolic links, use alias keyword followed by dereferenced pathname and reference pathname. For example, /sbin/pidof is a symbolic link to /sbin/killall5 . In normal case, if /sbin/pidof is executed, the domain is defined as if /sbin/killall5 is executed. By specifying "alias /sbin/killall5 /sbin/pidof", you can run /sbin/pidof in the domain for /sbin/pidof . (Example) alias /sbin/killall5 /sbin/pidof - allow_read: To grant unconditionally readable permissions, use allow_read keyword followed by canonicalized file. This keyword is intended to reduce size of domain policy by granting read access to library files such as GLIBC and locale files. Exception is, if ignore_global_allow_read keyword is given to a domain, entries specified by this keyword are ignored. (Example) allow_read /lib/libc-2.5.so - file_pattern: To declare pathname pattern, use file_pattern keyword followed by pathname pattern. The pathname pattern must be a canonicalized Pathname. This keyword is not applicable to neither granting execute permissions nor domain definitions. For example, canonicalized pathname that contains a process ID (i.e. /proc/PID/ files) needs to be grouped in order to make access control work well. (Example) file_pattern /proc/\$/cmdline - path_group To declare pathname group, use path_group keyword followed by name of the group and pathname pattern. For example, if you want to group all files under home directory, you can define path_group HOME-DIR-FILE /home/\/\ path_group HOME-DIR-FILE /home/\/\/\* path_group HOME-DIR-FILE /home/\/\/\/\ in the exception policy and use like allow_read @HOME-DIR-FILE to grant file access permission. - deny_rewrite: To deny overwriting already written contents of file (such as log files) by default, use deny_rewrite keyword followed by pathname pattern. Files whose pathname match the patterns are not permitted to open for writing without append mode or truncate unless the pathnames are explicitly granted using allow_rewrite keyword in domain policy. (Example) deny_rewrite /var/log/\* - aggregator To deal multiple programs as a single program, use aggregator keyword followed by name of original program and aggregated program. This keyword is intended to aggregate similar programs. For example, /usr/bin/tac and /bin/cat are similar. By specifying "aggregator /usr/bin/tac /bin/cat", you can run /usr/bin/tac in the domain for /bin/cat . For example, /usr/sbin/logrotate for Fedora Core 3 generates programs like /tmp/logrotate.\?\?\?\?\?\? and run them, but TOMOYO Linux doesn't allow using patterns for granting execute permission and defining domains. By specifying "aggregator /tmp/logrotate.\?\?\?\?\?\? /tmp/logrotate.tmp", you can run /tmp/logrotate.\?\?\?\?\?\? as if /tmp/logrotate.tmp is running. /sys/kernel/security/tomoyo/domain_policy This file contains definition of all domains and permissions that are granted to each domain. Lines from the next line to a domain definition ( any lines starting with "<kernel>") to the previous line to the next domain definitions are interpreted as access permissions for that domain. /sys/kernel/security/tomoyo/meminfo This file is to show the total RAM used to keep policy in the kernel by TOMOYO Linux in bytes. (Example) [root@tomoyo]# cat /sys/kernel/security/tomoyo/meminfo Shared: 61440 Private: 69632 Dynamic: 768 Total: 131840 You can set memory quota by writing to this file. (Example) [root@tomoyo]# echo Shared: 2097152 > /sys/kernel/security/tomoyo/meminfo [root@tomoyo]# echo Private: 2097152 > /sys/kernel/security/tomoyo/meminfo /sys/kernel/security/tomoyo/self_domain This file is to show the name of domain the caller process belongs to. (Example) [root@etch]# cat /sys/kernel/security/tomoyo/self_domain <kernel> /usr/sbin/sshd /bin/zsh /bin/cat /sys/kernel/security/tomoyo/version This file is used for getting TOMOYO Linux's version. (Example) [root@etch]# cat /sys/kernel/security/tomoyo/version 2.2.0-pre /sys/kernel/security/tomoyo/.domain_status This is a view (of a DBMS) that contains only profile number and domainnames of domain so that "ccs-setprofile" command can do line-oriented processing easily. /sys/kernel/security/tomoyo/.process_status This file is used by "ccs-ccstree" command to show "list of processes currently running" and "domains which each process belongs to" and "profile number which the domain is currently assigned" like "pstree" command. This file is writable by programs that aren't registered as policy manager. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:04 +11:00
Kentaro Takeda	c73bd6d473	Memory and pathname management functions. TOMOYO Linux performs pathname based access control. To remove factors that make pathname based access control difficult (e.g. symbolic links, "..", "//" etc.), TOMOYO Linux derives realpath of requested pathname from "struct dentry" and "struct vfsmount". The maximum length of string data is limited to 4000 including trailing '\0'. Since TOMOYO Linux uses '\ooo' style representation for non ASCII printable characters, maybe TOMOYO Linux should be able to support 16336 (which means (NAME_MAX * (PATH_MAX / (NAME_MAX + 1)) * 4 + (PATH_MAX / (NAME_MAX + 1))) including trailing '\0'), but I think 4000 is enough for practical use. TOMOYO uses only 0x21 - 0x7E (as printable characters) and 0x20 (as word delimiter) and 0x0A (as line delimiter). 0x01 - 0x20 and 0x80 - 0xFF is handled in \ooo style representation. The reason to use \ooo is to guarantee that "%s" won't damage logs. Userland program can request open("/tmp/file granted.\nAccess /tmp/file ", O_WRONLY \| O_CREAT, 0600) and logging such crazy pathname using "Access %s denied.\n" format will cause "fabrication of logs" like Access /tmp/file granted. Access /tmp/file denied. TOMOYO converts such characters to \ooo so that the logs will become Access /tmp/file\040granted.\012Access\040/tmp/file denied. and the administrator can read the logs safely using /bin/cat . Likewise, a crazy request like open("/tmp/\x01\x02\x03\x04\x05\x06\x07\x08\x09", O_WRONLY \| O_CREAT, 0600) will be processed safely by converting to Access /tmp/\001\002\003\004\005\006\007\010\011 denied. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:04 +11:00
Mimi Zohar	523979adfa	integrity: audit update Based on discussions on linux-audit, as per Steve Grubb's request http://lkml.org/lkml/2009/2/6/269, the following changes were made: - forced audit result to be either 0 or 1. - made template names const - Added new stand-alone message type: AUDIT_INTEGRITY_RULE Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 09:40:14 +11:00
James Morris	cb5629b10d	Merge branch 'master' into next Conflicts: fs/namei.c Manually merged per: diff --cc fs/namei.c index 734f2b5,bbc15c2..0000000 --- a/fs/namei.c +++ b/fs/namei.c @@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char nd->flags \|= LOOKUP_CONTINUE; err = exec_permission_lite(inode); if (err == -EAGAIN) - err = vfs_permission(nd, MAY_EXEC); + err = inode_permission(nd->path.dentry->d_inode, + MAY_EXEC); + if (!err) + err = ima_path_check(&nd->path, MAY_EXEC); if (err) break; @@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path path, int acc flag &= ~O_TRUNC; } - error = vfs_permission(nd, acc_mode); + error = inode_permission(inode, acc_mode); if (error) return error; + - error = ima_path_check(&nd->path, ++ error = ima_path_check(path, + acc_mode & (MAY_READ \| MAY_WRITE \| MAY_EXEC)); + if (error) + return error; / * An append-only file must be opened in append mode for writing. */ Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 11:01:45 +11:00
James Morris	64c61d80a6	IMA: fix ima_delete_rules() definition Fix ima_delete_rules() definition so sparse doesn't complain. Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 09:05:34 +11:00
Mimi Zohar	1df9f0a731	Integrity: IMA file free imbalance The number of calls to ima_path_check()/ima_file_free() should be balanced. An extra call to fput(), indicates the file could have been accessed without first being measured. Although f_count is incremented/decremented in places other than fget/fput, like fget_light/fput_light and get_file, the current task must already hold a file refcnt. The call to __fput() is delayed until the refcnt becomes 0, resulting in ima_file_free() flagging any changes. - add hook to increment opencount for IPC shared memory(SYSV), shmat files, and /dev/zero - moved NULL iint test in opencount_get() Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 09:05:33 +11:00
Mimi Zohar	f4bd857bc8	integrity: IMA policy open Sequentialize access to the policy file - permit multiple attempts to replace default policy with a valid policy Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 09:05:32 +11:00
Mimi Zohar	4af4662fa4	integrity: IMA policy Support for a user loadable policy through securityfs with support for LSM specific policy data. - free invalid rule in ima_parse_add_rule() Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 09:05:31 +11:00
Mimi Zohar	bab7393787	integrity: IMA display Make the measurement lists available through securityfs. - removed test for NULL return code from securityfs_create_file/dir Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 09:05:31 +11:00
Mimi Zohar	3323eec921	integrity: IMA as an integrity service provider IMA provides hardware (TPM) based measurement and attestation for file measurements. As the Trusted Computing (TPM) model requires, IMA measures all files before they are accessed in any way (on the integrity_bprm_check, integrity_path_check and integrity_file_mmap hooks), and commits the measurements to the TPM. Once added to the TPM, measurements can not be removed. In addition, IMA maintains a list of these file measurements, which can be used to validate the aggregate value stored in the TPM. The TPM can sign these measurements, and thus the system can prove, to itself and to a third party, the system's integrity in a way that cannot be circumvented by malicious or compromised software. - alloc ima_template_entry before calling ima_store_template() - log ima_add_boot_aggregate() failure - removed unused IMA_TEMPLATE_NAME_LEN - replaced hard coded string length with #define name Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 09:05:30 +11:00
Serge E. Hallyn	faa3aad75a	securityfs: fix long-broken securityfs_create_file comment If there is an error creating a file through securityfs_create_file, NULL is not returned, rather the error is propagated. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-03 11:02:51 +11:00
James Morris	5626d3e861	selinux: remove hooks which simply defer to capabilities Remove SELinux hooks which do nothing except defer to the capabilites hooks (or in one case, replicates the function). Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2009-02-02 09:20:34 +11:00
James Morris	95c14904b6	selinux: remove secondary ops call to shm_shmat Remove secondary ops call to shm_shmat, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:16 +11:00
James Morris	5c4054ccfa	selinux: remove secondary ops call to unix_stream_connect Remove secondary ops call to unix_stream_connect, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:15 +11:00
James Morris	2cbbd19812	selinux: remove secondary ops call to task_kill Remove secondary ops call to task_kill, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:14 +11:00
James Morris	ef76e748fa	selinux: remove secondary ops call to task_setrlimit Remove secondary ops call to task_setrlimit, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:13 +11:00
James Morris	ca5143d3ff	selinux: remove unused cred_commit hook Remove unused cred_commit hook from SELinux. This currently calls into the capabilities hook, which is a noop. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:12 +11:00
James Morris	af294e41d0	selinux: remove secondary ops call to task_create Remove secondary ops call to task_create, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:11 +11:00
James Morris	d541bbee69	selinux: remove secondary ops call to file_mprotect Remove secondary ops call to file_mprotect, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:11 +11:00
James Morris	438add6b32	selinux: remove secondary ops call to inode_setattr Remove secondary ops call to inode_setattr, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:10 +11:00
James Morris	188fbcca9d	selinux: remove secondary ops call to inode_permission Remove secondary ops call to inode_permission, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:09 +11:00
James Morris	f51115b9ab	selinux: remove secondary ops call to inode_follow_link Remove secondary ops call to inode_follow_link, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:08 +11:00
James Morris	dd4907a6d4	selinux: remove secondary ops call to inode_mknod Remove secondary ops call to inode_mknod, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:07 +11:00
James Morris	e4737250b7	selinux: remove secondary ops call to inode_unlink Remove secondary ops call to inode_unlink, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:06 +11:00
James Morris	efdfac4376	selinux: remove secondary ops call to inode_link Remove secondary ops call to inode_link, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:06 +11:00
James Morris	97422ab9ef	selinux: remove secondary ops call to sb_umount Remove secondary ops call to sb_umount, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:05 +11:00
James Morris	ef935b9136	selinux: remove secondary ops call to sb_mount Remove secondary ops call to sb_mount, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:04 +11:00
James Morris	5565b0b865	selinux: remove secondary ops call to bprm_committed_creds Remove secondary ops call to bprm_committed_creds, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:03 +11:00
James Morris	2ec5dbe23d	selinux: remove secondary ops call to bprm_committing_creds Remove secondary ops call to bprm_committing_creds, which is a noop in capabilities. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:02 +11:00
James Morris	bc05595845	selinux: remove unused bprm_check_security hook Remove unused bprm_check_security hook from SELinux. This currently calls into the capabilities hook, which is a noop. Acked-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-30 08:55:01 +11:00
Casey Schaufler	152a649b64	smackfs load append mode fix Given just how hard it is to find the code that uses MAY_APPEND it's probably not a big surprise that this went unnoticed for so long. The Smack rules loading code is incorrectly setting the MAY_READ bit when MAY_APPEND is requested. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-27 20:13:32 -08:00
David P. Quigley	cd89596f0c	SELinux: Unify context mount and genfs behavior Context mounts and genfs labeled file systems behave differently with respect to setting file system labels. This patch brings genfs labeled file systems in line with context mounts in that setxattr calls to them should return EOPNOTSUPP and fscreate calls will be ignored. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@macbook.localdomain>	2009-01-19 09:47:14 +11:00
David P. Quigley	11689d47f0	SELinux: Add new security mount option to indicate security label support. There is no easy way to tell if a file system supports SELinux security labeling. Because of this a new flag is being added to the super block security structure to indicate that the particular super block supports labeling. This flag is set for file systems using the xattr, task, and transition labeling methods unless that behavior is overridden by context mounts. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@macbook.localdomain>	2009-01-19 09:47:06 +11:00
David P. Quigley	0d90a7ec48	SELinux: Condense super block security structure flags and cleanup necessary code. The super block security structure currently has three fields for what are essentially flags. The flags field is used for mount options while two other char fields are used for initialization and proc flags. These latter two fields are essentially bit fields since the only used values are 0 and 1. These fields have been collapsed into the flags field and new bit masks have been added for them. The code is also fixed to work with these new flags. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@macbook.localdomain>	2009-01-19 09:46:40 +11:00
Vegard Nossum	0d54ee1c78	security: introduce missing kfree Plug this leak. Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: <stable@kernel.org> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-17 14:24:46 -08:00
Heiko Carstens	938bb9f5e8	[CVE-2009-0029] System call wrappers part 28 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:30 +01:00
Heiko Carstens	1e7bfb2134	[CVE-2009-0029] System call wrappers part 27 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:29 +01:00
Fernando Carrijo	c19a28e119	remove lots of double-semicolons Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Mark Fasheh <mfasheh@suse.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-08 08:31:14 -08:00
Serge E. Hallyn	0b82ac37b8	devices cgroup: allow mkfifo The devcgroup_inode_permission() hook in the devices whitelist cgroup has always bypassed access checks on fifos. But the mknod hook did not. The devices whitelist is only about block and char devices, and fifos can't even be added to the whitelist, so fifos can't be created at all except by tasks which have 'a' in their whitelist (meaning they have access to all devices). Fix the behavior by bypassing access checks to mkfifo. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: James Morris <jmorris@namei.org> Reported-by: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-08 08:31:03 -08:00
Lai Jiangshan	116e057512	devcgroup: use list_for_each_entry_rcu() We should use list_for_each_entry_rcu in RCU read site. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-08 08:31:03 -08:00
James Morris	ac8cc0fa53	Merge branch 'next' into for-linus	2009-01-07 09:58:22 +11:00
David Howells	3699c53c48	CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #3 ] Fix a regression in cap_capable() due to: commit `3b11a1dece` Author: David Howells <dhowells@redhat.com> Date: Fri Nov 14 10:39:26 2008 +1100 CRED: Differentiate objective and effective subjective credentials on a task The problem is that the above patch allows a process to have two sets of credentials, and for the most part uses the subjective credentials when accessing current's creds. There is, however, one exception: cap_capable(), and thus capable(), uses the real/objective credentials of the target task, whether or not it is the current task. Ordinarily this doesn't matter, since usually the two cred pointers in current point to the same set of creds. However, sys_faccessat() makes use of this facility to override the credentials of the calling process to make its test, without affecting the creds as seen from other processes. One of the things sys_faccessat() does is to make an adjustment to the effective capabilities mask, which cap_capable(), as it stands, then ignores. The affected capability check is in generic_permission(): if (!(mask & MAY_EXEC) \|\| execute_ok(inode)) if (capable(CAP_DAC_OVERRIDE)) return 0; This change passes the set of credentials to be tested down into the commoncap and SELinux code. The security functions called by capable() and has_capability() select the appropriate set of credentials from the process being checked. This can be tested by compiling the following program from the XFS testsuite: /* * t_access_root.c - trivial test program to show permission bug. * * Written by Michael Kerrisk - copyright ownership not pursued. * Sourced from: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/6030.html / #include <limits.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/stat.h> #define UID 500 #define GID 100 #define PERM 0 #define TESTPATH "/tmp/t_access" static void errExit(char msg) { perror(msg); exit(EXIT_FAILURE); } /* errExit / static void accessTest(char file, int mask, char mstr) { printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask)); } / accessTest / int main(int argc, char argv[]) { int fd, perm, uid, gid; char testpath; char cmd[PATH_MAX + 20]; testpath = (argc > 1) ? argv[1] : TESTPATH; perm = (argc > 2) ? strtoul(argv[2], NULL, 8) : PERM; uid = (argc > 3) ? atoi(argv[3]) : UID; gid = (argc > 4) ? atoi(argv[4]) : GID; unlink(testpath); fd = open(testpath, O_RDWR \| O_CREAT, 0); if (fd == -1) errExit("open"); if (fchown(fd, uid, gid) == -1) errExit("fchown"); if (fchmod(fd, perm) == -1) errExit("fchmod"); close(fd); snprintf(cmd, sizeof(cmd), "ls -l %s", testpath); system(cmd); if (seteuid(uid) == -1) errExit("seteuid"); accessTest(testpath, 0, "0"); accessTest(testpath, R_OK, "R_OK"); accessTest(testpath, W_OK, "W_OK"); accessTest(testpath, X_OK, "X_OK"); accessTest(testpath, R_OK \| W_OK, "R_OK \| W_OK"); accessTest(testpath, R_OK \| X_OK, "R_OK \| X_OK"); accessTest(testpath, W_OK \| X_OK, "W_OK \| X_OK"); accessTest(testpath, R_OK \| W_OK \| X_OK, "R_OK \| W_OK \| X_OK"); exit(EXIT_SUCCESS); } / main */ This can be run against an Ext3 filesystem as well as against an XFS filesystem. If successful, it will show: [root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043 ---------- 1 dhowells dhowells 0 2008-12-31 03:00 /tmp/xxx access(/tmp/xxx, 0) returns 0 access(/tmp/xxx, R_OK) returns 0 access(/tmp/xxx, W_OK) returns 0 access(/tmp/xxx, X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK) returns 0 access(/tmp/xxx, R_OK \| X_OK) returns -1 access(/tmp/xxx, W_OK \| X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK \| X_OK) returns -1 If unsuccessful, it will show: [root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043 ---------- 1 dhowells dhowells 0 2008-12-31 02:56 /tmp/xxx access(/tmp/xxx, 0) returns 0 access(/tmp/xxx, R_OK) returns -1 access(/tmp/xxx, W_OK) returns -1 access(/tmp/xxx, X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK) returns -1 access(/tmp/xxx, R_OK \| X_OK) returns -1 access(/tmp/xxx, W_OK \| X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK \| X_OK) returns -1 I've also tested the fix with the SELinux and syscalls LTP testsuites. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-07 09:38:48 +11:00
James Morris	29881c4502	Revert "CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2 ]" This reverts commit `14eaddc967`. David has a better version to come.	2009-01-07 09:21:54 +11:00
Linus Torvalds	520c853466	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: inotify: fix type errors in interfaces fix breakage in reiserfs_new_inode() fix the treatment of jfs special inodes vfs: remove duplicate code in get_fs_type() add a vfs_fsync helper sys_execve and sys_uselib do not call into fsnotify zero i_uid/i_gid on inode allocation inode->i_op is never NULL ntfs: don't NULL i_op isofs check for NULL ->i_op in root directory is dead code affs: do not zero ->i_op kill suid bit only for regular files vfs: lseek(fd, 0, SEEK_CUR) race condition	2009-01-05 18:32:06 -08:00
Al Viro	56ff5efad9	zero i_uid/i_gid on inode allocation ... and don't bother in callers. Don't bother with zeroing i_blocks, while we are at it - it's already been zeroed. i_mode is not worth the effort; it has no common default value. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:28 -05:00
Al Viro	acfa4380ef	inode->i_op is never NULL We used to have rather schizophrenic set of checks for NULL ->i_op even though it had been eliminated years ago. You'd need to go out of your way to set it to NULL explicitly _and_ a bunch of code would die on such inodes anyway. After killing two remaining places that still did that bogosity, all that crap can go away. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:28 -05:00
Eric Paris	76f7ba35d4	SELinux: shrink sizeof av_inhert selinux_class_perm and context I started playing with pahole today and decided to put it against the selinux structures. Found we could save a little bit of space on x86_64 (and no harm on i686) just reorganizing some structs. Object size changes: av_inherit: 24 -> 16 selinux_class_perm: 48 -> 40 context: 80 -> 72 Admittedly there aren't many of av_inherit or selinux_class_perm's in the kernel (33 and 1 respectively) But the change to the size of struct context reverberate out a bit. I can get some hard number if they are needed, but I don't see why they would be. We do change which cacheline context->len and context->str would be on, but I don't see that as a problem since we are clearly going to have to load both if the context is to be of any value. I've run with the patch and don't seem to be having any problems. An example of what's going on using struct av_inherit would be: form: to: struct av_inherit { struct av_inherit { u16 tclass; const char common_pts; const char common_pts; u32 common_base; u32 common_base; u16 tclass; }; (notice all I did was move u16 tclass to the end of the struct instead of the beginning) Memory layout before the change: struct av_inherit { u16 tclass; /* 2 / / 6 bytes hole / const char* common_pts; /* 8 / u32 common_base; / 4 / / 4 byes padding / / size: 24, cachelines: 1 / / sum members: 14, holes: 1, sum holes: 6 / / padding: 4 / }; Memory layout after the change: struct av_inherit { const char * common_pts; /* 8 / u32 common_base; / 4 / u16 tclass; / 2 / / 2 bytes padding / / size: 16, cachelines: 1 / / sum members: 14, holes: 0, sum holes: 0 / / padding: 2 */ }; Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-05 19:19:55 +11:00
David Howells	14eaddc967	CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2 ] Fix a regression in cap_capable() due to: commit 5ff7711e635b32f0a1e558227d030c7e45b4a465 Author: David Howells <dhowells@redhat.com> Date: Wed Dec 31 02:52:28 2008 +0000 CRED: Differentiate objective and effective subjective credentials on a task The problem is that the above patch allows a process to have two sets of credentials, and for the most part uses the subjective credentials when accessing current's creds. There is, however, one exception: cap_capable(), and thus capable(), uses the real/objective credentials of the target task, whether or not it is the current task. Ordinarily this doesn't matter, since usually the two cred pointers in current point to the same set of creds. However, sys_faccessat() makes use of this facility to override the credentials of the calling process to make its test, without affecting the creds as seen from other processes. One of the things sys_faccessat() does is to make an adjustment to the effective capabilities mask, which cap_capable(), as it stands, then ignores. The affected capability check is in generic_permission(): if (!(mask & MAY_EXEC) \|\| execute_ok(inode)) if (capable(CAP_DAC_OVERRIDE)) return 0; This change splits capable() from has_capability() down into the commoncap and SELinux code. The capable() security op now only deals with the current process, and uses the current process's subjective creds. A new security op - task_capable() - is introduced that can check any task's objective creds. strictly the capable() security op is superfluous with the presence of the task_capable() op, however it should be faster to call the capable() op since two fewer arguments need be passed down through the various layers. This can be tested by compiling the following program from the XFS testsuite: /* * t_access_root.c - trivial test program to show permission bug. * * Written by Michael Kerrisk - copyright ownership not pursued. * Sourced from: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/6030.html / #include <limits.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/stat.h> #define UID 500 #define GID 100 #define PERM 0 #define TESTPATH "/tmp/t_access" static void errExit(char msg) { perror(msg); exit(EXIT_FAILURE); } /* errExit / static void accessTest(char file, int mask, char mstr) { printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask)); } / accessTest / int main(int argc, char argv[]) { int fd, perm, uid, gid; char testpath; char cmd[PATH_MAX + 20]; testpath = (argc > 1) ? argv[1] : TESTPATH; perm = (argc > 2) ? strtoul(argv[2], NULL, 8) : PERM; uid = (argc > 3) ? atoi(argv[3]) : UID; gid = (argc > 4) ? atoi(argv[4]) : GID; unlink(testpath); fd = open(testpath, O_RDWR \| O_CREAT, 0); if (fd == -1) errExit("open"); if (fchown(fd, uid, gid) == -1) errExit("fchown"); if (fchmod(fd, perm) == -1) errExit("fchmod"); close(fd); snprintf(cmd, sizeof(cmd), "ls -l %s", testpath); system(cmd); if (seteuid(uid) == -1) errExit("seteuid"); accessTest(testpath, 0, "0"); accessTest(testpath, R_OK, "R_OK"); accessTest(testpath, W_OK, "W_OK"); accessTest(testpath, X_OK, "X_OK"); accessTest(testpath, R_OK \| W_OK, "R_OK \| W_OK"); accessTest(testpath, R_OK \| X_OK, "R_OK \| X_OK"); accessTest(testpath, W_OK \| X_OK, "W_OK \| X_OK"); accessTest(testpath, R_OK \| W_OK \| X_OK, "R_OK \| W_OK \| X_OK"); exit(EXIT_SUCCESS); } / main */ This can be run against an Ext3 filesystem as well as against an XFS filesystem. If successful, it will show: [root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043 ---------- 1 dhowells dhowells 0 2008-12-31 03:00 /tmp/xxx access(/tmp/xxx, 0) returns 0 access(/tmp/xxx, R_OK) returns 0 access(/tmp/xxx, W_OK) returns 0 access(/tmp/xxx, X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK) returns 0 access(/tmp/xxx, R_OK \| X_OK) returns -1 access(/tmp/xxx, W_OK \| X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK \| X_OK) returns -1 If unsuccessful, it will show: [root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043 ---------- 1 dhowells dhowells 0 2008-12-31 02:56 /tmp/xxx access(/tmp/xxx, 0) returns 0 access(/tmp/xxx, R_OK) returns -1 access(/tmp/xxx, W_OK) returns -1 access(/tmp/xxx, X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK) returns -1 access(/tmp/xxx, R_OK \| X_OK) returns -1 access(/tmp/xxx, W_OK \| X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK \| X_OK) returns -1 I've also tested the fix with the SELinux and syscalls LTP testsuites. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-01-05 11:17:04 +11:00
James Morris	5c8c40be4b	Merge branch 'master' of git://git.infradead.org/users/pcmoore/lblnet-2.6_next into next	2009-01-05 08:56:01 +11:00
Al Viro	5af75d8d58	audit: validate comparison operations, store them in sane form Don't store the field->op in the messy (and very inconvenient for e.g. audit_comparator()) form; translate to dense set of values and do full validation of userland-submitted value while we are at it. ->audit_init_rule() and ->audit_match_rule() get new values now; in-tree instances updated. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-04 15:14:42 -05:00
Linus Torvalds	7d3b56ba37	Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits) x86: setup_per_cpu_areas() cleanup cpumask: fix compile error when CONFIG_NR_CPUS is not defined cpumask: use alloc_cpumask_var_node where appropriate cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t x86: use cpumask_var_t in acpi/boot.c x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids sched: put back some stack hog changes that were undone in kernel/sched.c x86: enable cpus display of kernel_max and offlined cpus ia64: cpumask fix for is_affinity_mask_valid() cpumask: convert RCU implementations, fix xtensa: define __fls mn10300: define __fls m32r: define __fls h8300: define __fls frv: define __fls cris: define __fls cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS cpumask: zero extra bits in alloc_cpumask_var_node cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/ cpumask: convert mm/ ...	2009-01-03 12:04:39 -08:00
Rusty Russell	4f4b6c1a94	cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: core Impact: cleanup In future, all cpumask ops will only be valid (in general) for bit numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators and other comparisons. This is always safe: no cpu number can be >= nr_cpu_ids, and nr_cpu_ids is initialized to NR_CPUS at boot. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: James Morris <jmorris@namei.org> Cc: Eric Biederman <ebiederm@xmission.com>	2009-01-01 10:12:15 +10:30
James Morris	90bd49ab66	keys: fix sparse warning by adding __user annotation to cast Fix the following sparse warning: CC security/keys/key.o security/keys/keyctl.c:1297:10: warning: incorrect type in argument 2 (different address spaces) security/keys/keyctl.c:1297:10: expected char [noderef] <asn:1>buffer security/keys/keyctl.c:1297:10: got char <noident> which appears to be caused by lack of __user annotation to the cast of a syscall argument. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: David Howells <dhowells@redhat.com>	2009-01-01 10:32:44 +11:00
Kentaro Takeda	be6d3e56a6	introduce new LSM hooks where vfsmount is available. Add new LSM hooks for path-based checks. Call them on directory-modifying operations at the points where we still know the vfsmount involved. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-31 18:07:37 -05:00
Casey Schaufler	6d3dc07cbb	smack: Add support for unlabeled network hosts and networks Add support for unlabeled network hosts and networks. Relies heavily on Paul Moore's netlabel support. Creates a new entry in /smack called netlabel. Writes to /smack/netlabel take the form: A.B.C.D LABEL or A.B.C.D/N LABEL where A.B.C.D is a network address, N is an integer between 0-32, and LABEL is the Smack label to be used. If /N is omitted /32 is assumed. N designates the netmask for the address. Entries are matched by the most specific address/mask pair. 0.0.0.0/0 will match everything, while 192.168.1.117/32 will match exactly one host. A new system label "@", pronounced "web", is defined. Processes can not be assigned the web label. An address assigned the web label can be written to by any process, and packets coming from a web address can be written to any socket. Use of the web label is a violation of any strict MAC policy, but the web label has been requested many times. The nltype entry has been removed from /smack. It did not work right and the netlabel interface can be used to specify that all hosts be treated as unlabeled. CIPSO labels on incoming packets will be honored, even from designated single label hosts. Single label hosts can only be written to by processes with labels that can write to the label of the host. Packets sent to single label hosts will always be unlabeled. Once added a single label designation cannot be removed, however the label may be changed. The behavior of the ambient label remains unchanged. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul.moore@hp.com>	2008-12-31 12:54:12 -05:00
Paul Moore	277d342fc4	selinux: Deprecate and schedule the removal of the the compat_net functionality This patch is the first step towards removing the old "compat_net" code from the kernel. Secmark, the "compat_net" replacement was first introduced in 2.6.18 (September 2006) and the major Linux distributions with SELinux support have transitioned to Secmark so it is time to start deprecating the "compat_net" mechanism. Testing a patched version of 2.6.28-rc6 with the initial release of Fedora Core 5 did not show any problems when running in enforcing mode. This patch adds an entry to the feature-removal-schedule.txt file and removes the SECURITY_SELINUX_ENABLE_SECMARK_DEFAULT configuration option, forcing Secmark on by default although it can still be disabled at runtime. The patch also makes the Secmark permission checks "dynamic" in the sense that they are only executed when Secmark is configured; this should help prevent problems with older distributions that have not yet migrated to Secmark. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-12-31 12:54:11 -05:00
Paul Moore	6c2e8ac095	netlabel: Update kernel configuration API Update the NetLabel kernel API to expose the new features added in kernel releases 2.6.25 and 2.6.28: the static/fallback label functionality and network address based selectors. Signed-off-by: Paul Moore <paul.moore@hp.com>	2008-12-31 12:54:11 -05:00
David Howells	eca1bf5b4f	KEYS: Fix variable uninitialisation warnings Fix variable uninitialisation warnings introduced in: commit `8bbf4976b5` Author: David Howells <dhowells@redhat.com> Date: Fri Nov 14 10:39:14 2008 +1100 KEYS: Alter use of key instantiation link-to-keyring argument As: security/keys/keyctl.c: In function 'keyctl_negate_key': security/keys/keyctl.c:976: warning: 'dest_keyring' may be used uninitialized in this function security/keys/keyctl.c: In function 'keyctl_instantiate_key': security/keys/keyctl.c:898: warning: 'dest_keyring' may be used uninitialized in this function Some versions of gcc notice that get_instantiation_key() doesn't always set _dest_keyring, but fail to observe that if this happens then _dest_keyring will not be read by the caller. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-12-29 14:24:43 +11:00
James Morris	54d2f649a6	Merge branch 'next' into for-linus	2008-12-29 09:57:38 +11:00
Linus Torvalds	0191b625ca	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits) net: Allow dependancies of FDDI & Tokenring to be modular. igb: Fix build warning when DCA is disabled. net: Fix warning fallout from recent NAPI interface changes. gro: Fix potential use after free sfc: If AN is enabled, always read speed/duplex from the AN advertising bits sfc: When disabling the NIC, close the device rather than unregistering it sfc: SFT9001: Add cable diagnostics sfc: Add support for multiple PHY self-tests sfc: Merge top-level functions for self-tests sfc: Clean up PHY mode management in loopback self-test sfc: Fix unreliable link detection in some loopback modes sfc: Generate unique names for per-NIC workqueues 802.3ad: use standard ethhdr instead of ad_header 802.3ad: generalize out mac address initializer 802.3ad: initialize ports LACPDU from const initializer 802.3ad: remove typedef around ad_system 802.3ad: turn ports is_individual into a bool 802.3ad: turn ports is_enabled into a bool 802.3ad: make ntt bool ixgbe: Fix set_ringparam in ixgbe to use the same memory pools. ... Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due to the conversion to %pI (in this networking merge) and the addition of doing IPv6 addresses (from the earlier merge of CIFS).	2008-12-28 12:49:40 -08:00
Sergio Luis	81ea714bf1	smackfs: check for allocation failures in smk_set_access() smackfs: check for allocation failures in smk_set_access() While adding a new subject/object pair to smack_list, smk_set_access() didn't check the return of kzalloc(). This patch changes smk_set_access() to return 0 or -ENOMEM, based on kzalloc()'s return. It also updates its caller, smk_write_load(), to check for smk_set_access()'s return, given it is no longer a void return function. Signed-off-by: Sergio Luis <sergio@larces.uece.br> To: Casey Schaufler <casey@schaufler-ca.com> Cc: Ahmed S. Darwish <darwish.07@gmail.com> Cc: LSM <linux-security-module@vger.kernel.org> Cc: LKLM <linux-kernel@vger.kernel.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com>	2008-12-25 12:14:55 +11:00
James Morris	7419224691	SELinux: don't check permissions for kernel mounts Don't bother checking permissions when the kernel performs an internal mount, as this should always be allowed. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2008-12-20 09:03:39 +11:00
James Morris	12204e24b1	security: pass mount flags to security_sb_kern_mount() Pass mount flags to security_sb_kern_mount(), so security modules can determine if a mount operation is being performed by the kernel. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2008-12-20 09:02:39 +11:00
Stephen Smalley	459c19f524	SELinux: correctly detect proc filesystems of the form "proc/foo" Map all of these proc/ filesystem types to "proc" for the policy lookup at filesystem mount time. Signed-off-by: James Morris <jmorris@namei.org>	2008-12-20 09:01:03 +11:00
Hannes Eder	200036ca9b	CRED: fix sparse warnings Impact: fix sparse warnings Fix the following sparse warnings: security/security.c:228:2: warning: returning void-valued expression security/security.c:233:2: warning: returning void-valued expression security/security.c:616:2: warning: returning void-valued expression Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-25 06:33:17 +05:30
Eric Paris	e50a906e02	capabilities: define get_vfs_caps_from_disk when file caps are not enabled When CONFIG_SECURITY_FILE_CAPABILITIES is not set the audit system may try to call into the capabilities function vfs_cap_from_file. This patch defines that function so kernels can build and work. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-15 08:50:52 +11:00
David Howells	3a3b7ce933	CRED: Allow kernel services to override LSM settings for task actions Allow kernel services to override LSM settings appropriate to the actions performed by a task by duplicating a set of credentials, modifying it and then using task_struct::cred to point to it when performing operations on behalf of a task. This is used, for example, by CacheFiles which has to transparently access the cache on behalf of a process that thinks it is doing, say, NFS accesses with a potentially inappropriate (with respect to accessing the cache) set of credentials. This patch provides two LSM hooks for modifying a task security record: () security_kernel_act_as() which allows modification of the security datum with which a task acts on other objects (most notably files). () security_kernel_create_files_as() which allows modification of the security datum that is used to initialise the security data on a file that a task creates. The patch also provides four new credentials handling functions, which wrap the LSM functions: (1) prepare_kernel_cred() Prepare a set of credentials for a kernel service to use, based either on a daemon's credentials or on init_cred. All the keyrings are cleared. (2) set_security_override() Set the LSM security ID in a set of credentials to a specific security context, assuming permission from the LSM policy. (3) set_security_override_from_ctx() As (2), but takes the security context as a string. (4) set_create_files_as() Set the file creation LSM security ID in a set of credentials to be the same as that on a particular inode. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [Smack changes] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:28 +11:00
David Howells	1bfdc75ae0	CRED: Add a kernel_service object class to SELinux Add a 'kernel_service' object class to SELinux and give this object class two access vectors: 'use_as_override' and 'create_files_as'. The first vector is used to grant a process the right to nominate an alternate process security ID for the kernel to use as an override for the SELinux subjective security when accessing stuff on behalf of another process. For example, CacheFiles when accessing the cache on behalf on a process accessing an NFS file needs to use a subjective security ID appropriate to the cache rather then the one the calling process is using. The cachefilesd daemon will nominate the security ID to be used. The second vector is used to grant a process the right to nominate a file creation label for a kernel service to use. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:27 +11:00
David Howells	3b11a1dece	CRED: Differentiate objective and effective subjective credentials on a task Differentiate the objective and real subjective credentials from the effective subjective credentials on a task by introducing a second credentials pointer into the task_struct. task_struct::real_cred then refers to the objective and apparent real subjective credentials of a task, as perceived by the other tasks in the system. task_struct::cred then refers to the effective subjective credentials of a task, as used by that task when it's actually running. These are not visible to the other tasks in the system. __task_cred(task) then refers to the objective/real credentials of the task in question. current_cred() refers to the effective subjective credentials of the current task. prepare_creds() uses the objective creds as a base and commit_creds() changes both pointers in the task_struct (indeed commit_creds() requires them to be the same). override_creds() and revert_creds() change the subjective creds pointer only, and the former returns the old subjective creds. These are used by NFSD, faccessat() and do_coredump(), and will by used by CacheFiles. In SELinux, current_has_perm() is provided as an alternative to task_has_perm(). This uses the effective subjective context of current, whereas task_has_perm() uses the objective/real context of the subject. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:26 +11:00
David Howells	1d045980e1	CRED: Prettify commoncap.c Prettify commoncap.c. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:24 +11:00
David Howells	a6f76f23d2	CRED: Make execve() take advantage of copy-on-write credentials Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: () security_bprm_alloc(), ->bprm_alloc_security() () security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. () security_bprm_apply_creds(), ->bprm_apply_creds() () security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). () security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). () security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. () security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:24 +11:00
David Howells	d84f4f992c	CRED: Inaugurate COW credentials Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: () security_capset_check(), ->capset_check() () security_capset_set(), ->capset_set() Removed in favour of security_capset(). () security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. () security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. () security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). () security_cred_free(), ->cred_free() New. Free security data attached to cred->security. () security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. () security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). () security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). () security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). () security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. () security_key_alloc(), ->key_alloc() () security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:23 +11:00
David Howells	745ca2475a	CRED: Pass credentials through dentry_open() Pass credentials through dentry_open() so that the COW creds patch can have SELinux's flush_unauthorized_files() pass the appropriate creds back to itself when it opens its null chardev. The security_dentry_open() call also now takes a creds pointer, as does the dentry_open hook in struct security_operations. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:22 +11:00
David Howells	88e67f3b88	CRED: Make inode_has_perm() and file_has_perm() take a cred pointer Make inode_has_perm() and file_has_perm() take a cred pointer rather than a task pointer. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:21 +11:00
David Howells	bb952bb98a	CRED: Separate per-task-group keyrings from signal_struct Separate per-task-group keyrings from signal_struct and dangle their anchor from the cred struct rather than the signal_struct. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:20 +11:00
David Howells	275bb41e9d	CRED: Wrap access to SELinux's task SID Wrap access to SELinux's task SID, using task_sid() and current_sid() as appropriate. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:19 +11:00
David Howells	c69e8d9c01	CRED: Use RCU to access another task's creds and to release a task's own creds Use RCU to access another task's creds and to release a task's own creds. This means that it will be possible for the credentials of a task to be replaced without another task (a) requiring a full lock to read them, and (b) seeing deallocated memory. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:19 +11:00
David Howells	86a264abe5	CRED: Wrap current->cred and a few other accessors Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:18 +11:00
David Howells	f1752eec61	CRED: Detach the credentials from task_struct Detach the credentials from task_struct, duplicating them in copy_process() and releasing them in __put_task_struct(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:17 +11:00
David Howells	b6dff3ec5e	CRED: Separate task security context from task_struct Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:16 +11:00
David Howells	15a2460ed0	CRED: Constify the kernel_cap_t arguments to the capset LSM hooks Constify the kernel_cap_t arguments to the capset LSM hooks. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:15 +11:00
David Howells	1cdcbec1a3	CRED: Neuter sys_capset() Take away the ability for sys_capset() to affect processes other than current. This means that current will not need to lock its own credentials when reading them against interference by other processes. This has effectively been the case for a while anyway, since: (1) Without LSM enabled, sys_capset() is disallowed. (2) With file-based capabilities, sys_capset() is neutered. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:14 +11:00
David Howells	8bbf4976b5	KEYS: Alter use of key instantiation link-to-keyring argument Alter the use of the key instantiation and negation functions' link-to-keyring arguments. Currently this specifies a keyring in the target process to link the key into, creating the keyring if it doesn't exist. This, however, can be a problem for copy-on-write credentials as it means that the instantiating process can alter the credentials of the requesting process. This patch alters the behaviour such that: (1) If keyctl_instantiate_key() or keyctl_negate_key() are given a specific keyring by ID (ringid >= 0), then that keyring will be used. (2) If keyctl_instantiate_key() or keyctl_negate_key() are given one of the special constants that refer to the requesting process's keyrings (KEY_SPEC_*_KEYRING, all <= 0), then: (a) If sys_request_key() was given a keyring to use (destringid) then the key will be attached to that keyring. (b) If sys_request_key() was given a NULL keyring, then the key being instantiated will be attached to the default keyring as set by keyctl_set_reqkey_keyring(). (3) No extra link will be made. Decision point (1) follows current behaviour, and allows those instantiators who've searched for a specifically named keyring in the requestor's keyring so as to partition the keys by type to still have their named keyrings. Decision point (2) allows the requestor to make sure that the key or keys that get produced by request_key() go where they want, whilst allowing the instantiator to request that the key is retained. This is mainly useful for situations where the instantiator makes a secondary request, the key for which should be retained by the initial requestor: +-----------+ +--------------+ +--------------+ \| \| \| \| \| \| \| Requestor \|------->\| Instantiator \|------->\| Instantiator \| \| \| \| \| \| \| +-----------+ +--------------+ +--------------+ request_key() request_key() This might be useful, for example, in Kerberos, where the requestor requests a ticket, and then the ticket instantiator requests the TGT, which someone else then has to go and fetch. The TGT, however, should be retained in the keyrings of the requestor, not the first instantiator. To make this explict an extra special keyring constant is also added. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:14 +11:00
David Howells	e9e349b051	KEYS: Disperse linux/key_ui.h Disperse the bits of linux/key_ui.h as the reason they were put here (keyfs) didn't get in. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:13 +11:00
David Howells	b103c59883	CRED: Wrap task credential accesses in the capabilities code Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:11 +11:00
David Howells	47d804bfa1	CRED: Wrap task credential accesses in the key management code Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:11 +11:00
David S. Miller	7e452baf6b	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/message/fusion/mptlan.c drivers/net/sfc/ethtool.c net/mac80211/debugfs_sta.c	2008-11-11 15:43:02 -08:00
Eric Paris	066746796b	Currently SELinux jumps through some ugly hoops to not audit a capbility check when determining if a process has additional powers to override memory limits or when trying to read/write illegal file labels. Use the new noaudit call instead. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-11 22:02:57 +11:00
Eric Paris	06112163f5	Add a new capable interface that will be used by systems that use audit to make an A or B type decision instead of a security decision. Currently this is the case at least for filesystems when deciding if a process can use the reserved 'root' blocks and for the case of things like the oom algorithm determining if processes are root processes and should be less likely to be killed. These types of security system requests should not be audited or logged since they are not really security decisions. It would be possible to solve this problem like the vm_enough_memory security check did by creating a new LSM interface and moving all of the policy into that interface but proves the needlessly bloat the LSM and provide complex indirection. This merely allows those decisions to be made where they belong and to not flood logs or printk with denials for thing that are not security decisions. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-11 22:02:50 +11:00
Eric Paris	3fc689e96c	Any time fcaps or a setuid app under SECURE_NOROOT is used to result in a non-zero pE we will crate a new audit record which contains the entire set of known information about the executable in question, fP, fI, fE, fversion and includes the process's pE, pI, pP. Before and after the bprm capability are applied. This record type will only be emitted from execve syscalls. an example of making ping use fcaps instead of setuid: setcap "cat_net_raw+pe" /bin/ping type=SYSCALL msg=audit(1225742021.015:236): arch=c000003e syscall=59 success=yes exit=0 a0=1457f30 a1=14606b0 a2=1463940 a3=321b770a70 items=2 ppid=2929 pid=2963 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1321] msg=audit(1225742021.015:236): fver=2 fp=0000000000002000 fi=0000000000000000 fe=1 old_pp=0000000000000000 old_pi=0000000000000000 old_pe=0000000000000000 new_pp=0000000000002000 new_pi=0000000000000000 new_pe=0000000000002000 type=EXECVE msg=audit(1225742021.015:236): argc=2 a0="ping" a1="127.0.0.1" type=CWD msg=audit(1225742021.015:236): cwd="/home/test" type=PATH msg=audit(1225742021.015:236): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fe=1 cap_fver=2 type=PATH msg=audit(1225742021.015:236): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0 Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-11 21:48:18 +11:00
Eric Paris	c0b004413a	This patch add a generic cpu endian caps structure and externally available functions which retrieve fcaps information from disk. This information is necessary so fcaps information can be collected and recorded by the audit system. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-11 21:48:10 +11:00
David Howells	1f8f5cf6e4	KEYS: Make request key instantiate the per-user keyrings Make request_key() instantiate the per-user keyrings so that it doesn't oops if it needs to get hold of the user session keyring because there isn't a session keyring in place. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Steve French <smfrench@gmail.com> Tested-by: Rutger Nijlunsing <rutger.nijlunsing@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-10 13:20:57 -08:00
Eric Paris	39c9aede2b	SELinux: Use unknown perm handling to handle unknown netlink msg types Currently when SELinux has not been updated to handle a netlink message type the operation is denied with EINVAL. This patch will leave the audit/warning message so things get fixed but if policy chose to allow unknowns this will allow the netlink operation. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-09 07:33:18 +08:00
David S. Miller	9eeda9abd1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath5k/base.c net/8021q/vlan_core.c	2008-11-06 22:43:03 -08:00
Serge E. Hallyn	1f29fae297	file capabilities: add no_file_caps switch (v4) Add a no_file_caps boot option when file capabilities are compiled into the kernel (CONFIG_SECURITY_FILE_CAPABILITIES=y). This allows distributions to ship a kernel with file capabilities compiled in, without forcing users to use (and understand and trust) them. When no_file_caps is specified at boot, then when a process executes a file, any file capabilities stored with that file will not be used in the calculation of the process' new capability sets. This means that booting with the no_file_caps boot option will not be the same as booting a kernel with file capabilities compiled out - in particular a task with CAP_SETPCAP will not have any chance of passing capabilities to another task (which isn't "really" possible anyway, and which may soon by killed altogether by David Howells in any case), and it will instead be able to put new capabilities in its pI. However since fI will always be empty and pI is masked with fI, it gains the task nothing. We also support the extra prctl options, setting securebits and dropping capabilities from the per-process bounding set. The other remaining difference is that killpriv, task_setscheduler, setioprio, and setnice will continue to be hooked. That will be noticable in the case where a root task changed its uid while keeping some caps, and another task owned by the new uid tries to change settings for the more privileged task. Changelog: Nov 05 2008: (v4) trivial port on top of always-start-\ with-clear-caps patch Sep 23 2008: nixed file_caps_enabled when file caps are not compiled in as it isn't used. Document no_file_caps in kernel-parameters.txt. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-06 07:14:51 +08:00
James Morris	e21e696edb	Merge branch 'master' into next	2008-11-06 07:12:34 +08:00
Michal Schmidt	2f99db28af	selinux: recognize netlink messages for 'ip addrlabel' In enforcing mode '/sbin/ip addrlabel' results in a SELinux error: type=SELINUX_ERR msg=audit(1225698822.073:42): SELinux: unrecognized netlink message type=74 for sclass=43 The problem is missing RTM_*ADDRLABEL entries in SELinux's netlink message types table. Reported in https://bugzilla.redhat.com/show_bug.cgi?id=469423 Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-06 07:08:36 +08:00
Eric Paris	41d9f9c524	SELinux: hold tasklist_lock and siglock while waking wait_chldexit SELinux has long been calling wake_up_interruptible() on current->parent->signal->wait_chldexit without holding any locks. It appears that this operation should hold the tasklist_lock to dereference current->parent and we should hold the siglock when waking up the signal->wait_chldexit. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-05 08:44:11 +11:00
Linus Torvalds	0a6d2fac61	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: SELinux: properly handle empty tty_files list	2008-11-01 09:50:38 -07:00
Serge Hallyn	3318a386e4	file caps: always start with clear bprm->caps_* While Linux doesn't honor setuid on scripts. However, it mistakenly behaves differently for file capabilities. This patch fixes that behavior by making sure that get_file_caps() begins with empty bprm->caps_. That way when a script is loaded, its bprm->caps_ may be filled when binfmt_misc calls prepare_binprm(), but they will be cleared again when binfmt_elf calls prepare_binprm() next to read the interpreter's file capabilities. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-01 09:49:45 -07:00
Eric Paris	37dd0bd04a	SELinux: properly handle empty tty_files list SELinux has wrongly (since 2004) had an incorrect test for an empty tty->tty_files list. With an empty list selinux would be pointing to part of the tty struct itself and would then proceed to dereference that value and again dereference that result. An F10 change to plymouth on a ppc64 system is actually currently triggering this bug. This patch uses list_empty() to handle empty lists rather than looking at a meaningless location. [note, this fixes the oops reported in https://bugzilla.redhat.com/show_bug.cgi?id=469079] Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-01 09:38:48 +11:00
Harvey Harrison	3685f25de1	misc: replace NIPQUAD() Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:56:49 -07:00
David S. Miller	a1744d3bee	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/p54/p54common.c	2008-10-31 00:17:34 -07:00
Alan Cox	731572d39f	nfsd: fix vm overcommit crash Junjiro R. Okajima reported a problem where knfsd crashes if you are using it to export shmemfs objects and run strict overcommit. In this situation the current->mm based modifier to the overcommit goes through a NULL pointer. We could simply check for NULL and skip the modifier but we've caught other real bugs in the past from mm being NULL here - cases where we did need a valid mm set up (eg the exec bug about a year ago). To preserve the checks and get the logic we want shuffle the checking around and add a new helper to the vm_ security wrappers Also fix a current->mm reference in nommu that should use the passed mm [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix build] Reported-by: Junjiro R. Okajima <hooanon05@yahoo.co.jp> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-30 11:38:47 -07:00
Eric Paris	8b6a5a37f8	SELinux: check open perms in dentry_open not inode_permission Some operations, like searching a directory path or connecting a unix domain socket, make explicit calls into inode_permission. Our choices are to either try to come up with a signature for all of the explicit calls to inode_permission and do not check open on those, or to move the open checks to dentry_open where we know this is always an open operation. This patch moves the checks to dentry_open. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-10-31 02:00:52 +11:00
Harvey Harrison	5b095d9892	net: replace %p6 with %pI6 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-29 12:52:50 -07:00
Harvey Harrison	1afa67f5e7	misc: replace NIP6_FMT with %p6 format specifier The iscsi_ibft.c changes are almost certainly a bugfix as the pointer 'ip' is a u8 *, so they never print the last 8 bytes of the IPv6 address, and the eight bytes they do print have a zero byte with them in each 16-bit word. Other than that, this should cause no difference in functionality. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-28 16:06:44 -07:00
Alexey Dobriyan	def8b4faff	net: reduce structures when XFRM=n ifdef out * struct sk_buff::sp (pointer) * struct dst_entry::xfrm (pointer) * struct sock::sk_policy (2 pointers) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-28 13:24:06 -07:00
Linus Torvalds	99ebcf8285	Merge branch 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits) fix documentation of sysrq-q really Fix documentation of sysrq-q timer_list: add base address to clock base timer_list: print cpu number of clockevents device timer_list: print real timer address NOHZ: restart tick device from irq_enter() NOHZ: split tick_nohz_restart_sched_tick() NOHZ: unify the nohz function calls in irq_enter() timers: fix itimer/many thread hang, fix timers: fix itimer/many thread hang, v3 ntp: improve adjtimex frequency rounding timekeeping: fix rounding problem during clock update ntp: let update_persistent_clock() sleep hrtimer: reorder struct hrtimer to save 8 bytes on 64bit builds posix-timers: lock_timer: make it readable posix-timers: lock_timer: kill the bogus ->it_id check posix-timers: kill ->it_sigev_signo and ->it_sigev_value posix-timers: sys_timer_create: cleanup the error handling posix-timers: move the initialization of timer->sigq from send to create path posix-timers: sys_timer_create: simplify and s/tasklist/rcu/ ... Fix trivial conflicts due to sysrq-q description clahes in Documentation/sysrq.txt and drivers/char/sysrq.c	2008-10-20 13:19:56 -07:00
Lai Jiangshan	47c59803be	devcgroup: remove spin_lock() Since we introduced rcu for read side, spin_lock is used only for update. But we always hold cgroup_lock() when update, so spin_lock() is not need. Additional cleanup: 1) include linux/rcupdate.h explicitly 2) remove unused variable cur_devcgroup in devcgroup_update_access() Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: "Serge E. Hallyn" <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Li Zefan	c012a54ae0	devcgroup: remove unused variable Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Li Zefan	2cdc7241a2	devcgroup: use kmemdup() This saves 40 bytes on my x86_32 box. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Thomas Gleixner	c465a76af6	Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus	2008-10-20 13:14:06 +02:00
Steven Whitehouse	a447c09324	vfs: Use const for kernel parser table This is a much better version of a previous patch to make the parser tables constant. Rather than changing the typedef, we put the "const" in all the various places where its required, allowing the __initconst exception for nfsroot which was the cause of the previous trouble. This was posted for review some time ago and I believe its been in -mm since then. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Alexander Viro <aviro@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 10:10:37 -07:00
Linus Torvalds	8d71ff0bef	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (24 commits) integrity: special fs magic As pointed out by Jonathan Corbet, the timer must be deleted before ERROR: code indent should use tabs where possible The tpm_dev_release function is only called for platform devices, not pnp Protect tpm_chip_list when transversing it. Renames num_open to is_open, as only one process can open the file at a time. Remove the BKL calls from the TPM driver, which were added in the overall netlabel: Add configuration support for local labeling cipso: Add support for native local labeling and fixup mapping names netlabel: Changes to the NetLabel security attributes to allow LSMs to pass full contexts selinux: Cache NetLabel secattrs in the socket's security struct selinux: Set socket NetLabel based on connection endpoint netlabel: Add functionality to set the security attributes of a packet netlabel: Add network address selectors to the NetLabel/LSM domain mapping netlabel: Add a generic way to create ordered linked lists of network addrs netlabel: Replace protocol/NetLabel linking with refrerence counts smack: Fix missing calls to netlbl_skbuff_err() selinux: Fix missing calls to netlbl_skbuff_err() selinux: Fix a problem in security_netlbl_sid_to_secattr() selinux: Better local/forward check in selinux_ip_postroute() ...	2008-10-13 10:00:44 -07:00
Alan Cox	934e6ebf96	tty: Redo current tty locking Currently it is sometimes locked by the tty mutex and sometimes by the sighand lock. The latter is in fact correct and now we can hand back referenced objects we can fix this up without problems around sleeping functions. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 09:51:41 -07:00
Alan Cox	452a00d2ee	tty: Make get_current_tty use a kref We now return a kref covered tty reference. That ensures the tty structure doesn't go away when you have a return from get_current_tty. This is not enough to protect you from most of the resources being freed behind your back - yet. [Updated to include fixes for SELinux problems found by Andrew Morton and an s390 leak found while debugging the former] Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 09:51:41 -07:00
Mimi Zohar	9256292782	integrity: special fs magic Discussion on the mailing list questioned the use of these magic values in userspace, concluding these values are already exported to userspace via statfs and their correct/incorrect usage is left up to the userspace application. - Move special fs magic number definitions to magic.h - Add magic.h include Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-10-13 09:47:43 +11:00
James Morris	0da939b005	Merge branch 'master' of git://git.infradead.org/users/pcmoore/lblnet-2.6_next into next	2008-10-11 09:26:14 +11:00
Paul Moore	8d75899d03	netlabel: Changes to the NetLabel security attributes to allow LSMs to pass full contexts This patch provides support for including the LSM's secid in addition to the LSM's MLS information in the NetLabel security attributes structure. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:33 -04:00
Paul Moore	6c5b3fc014	selinux: Cache NetLabel secattrs in the socket's security struct Previous work enabled the use of address based NetLabel selectors, which while highly useful, brought the potential for additional per-packet overhead when used. This patch attempts to mitigate some of that overhead by caching the NetLabel security attribute struct within the SELinux socket security structure. This should help eliminate the need to recreate the NetLabel secattr structure for each packet resulting in less overhead. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:33 -04:00
Paul Moore	014ab19a69	selinux: Set socket NetLabel based on connection endpoint Previous work enabled the use of address based NetLabel selectors, which while highly useful, brought the potential for additional per-packet overhead when used. This patch attempts to solve that by applying NetLabel socket labels when sockets are connect()'d. This should alleviate the per-packet NetLabel labeling for all connected sockets (yes, it even works for connected DGRAM sockets). Signed-off-by: Paul Moore <paul.moore@hp.com> Reviewed-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:33 -04:00
Paul Moore	948bf85c1b	netlabel: Add functionality to set the security attributes of a packet This patch builds upon the new NetLabel address selector functionality by providing the NetLabel KAPI and CIPSO engine support needed to enable the new packet-based labeling. The only new addition to the NetLabel KAPI at this point is shown below: * int netlbl_skbuff_setattr(skb, family, secattr) ... and is designed to be called from a Netfilter hook after the packet's IP header has been populated such as in the FORWARD or LOCAL_OUT hooks. This patch also provides the necessary SELinux hooks to support this new functionality. Smack support is not currently included due to uncertainty regarding the permissions needed to expand the Smack network access controls. Signed-off-by: Paul Moore <paul.moore@hp.com> Reviewed-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:32 -04:00
Paul Moore	b1edeb1023	netlabel: Replace protocol/NetLabel linking with refrerence counts NetLabel has always had a list of backpointers in the CIPSO DOI definition structure which pointed to the NetLabel LSM domain mapping structures which referenced the CIPSO DOI struct. The rationale for this was that when an administrator removed a CIPSO DOI from the system all of the associated NetLabel LSM domain mappings should be removed as well; a list of backpointers made this a simple operation. Unfortunately, while the backpointers did make the removal easier they were a bit of a mess from an implementation point of view which was making further development difficult. Since the removal of a CIPSO DOI is a realtively rare event it seems to make sense to remove this backpointer list as the optimization was hurting us more then it was helping. However, we still need to be able to track when a CIPSO DOI definition is being used so replace the backpointer list with a reference count. In order to preserve the current functionality of removing the associated LSM domain mappings when a CIPSO DOI is removed we walk the LSM domain mapping table, removing the relevant entries. Signed-off-by: Paul Moore <paul.moore@hp.com> Reviewed-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:31 -04:00
Paul Moore	a8134296ba	smack: Fix missing calls to netlbl_skbuff_err() Smack needs to call netlbl_skbuff_err() to let NetLabel do the necessary protocol specific error handling. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com>	2008-10-10 10:16:31 -04:00
Paul Moore	dfaebe9825	selinux: Fix missing calls to netlbl_skbuff_err() At some point I think I messed up and dropped the calls to netlbl_skbuff_err() which are necessary for CIPSO to send error notifications to remote systems. This patch re-introduces the error handling calls into the SELinux code. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:31 -04:00
Paul Moore	99d854d231	selinux: Fix a problem in security_netlbl_sid_to_secattr() Currently when SELinux fails to allocate memory in security_netlbl_sid_to_secattr() the NetLabel LSM domain field is set to NULL which triggers the default NetLabel LSM domain mapping which may not always be the desired mapping. This patch fixes this by returning an error when the kernel is unable to allocate memory. This could result in more failures on a system with heavy memory pressure but it is the "correct" thing to do. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:30 -04:00
Paul Moore	d8395c876b	selinux: Better local/forward check in selinux_ip_postroute() It turns out that checking to see if skb->sk is NULL is not a very good indicator of a forwarded packet as some locally generated packets also have skb->sk set to NULL. Fix this by not only checking the skb->sk field but also the IP[6]CB(skb)->flags field for the IP[6]SKB_FORWARDED flag. While we are at it, we are calling selinux_parse_skb() much earlier than we really should resulting in potentially wasted cycles parsing packets for information we might no use; so shuffle the code around a bit to fix this. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:30 -04:00
Paul Moore	aa86290089	selinux: Correctly handle IPv4 packets on IPv6 sockets in all cases We did the right thing in a few cases but there were several areas where we determined a packet's address family based on the socket's address family which is not the right thing to do since we can get IPv4 packets on IPv6 sockets. This patch fixes these problems by either taking the address family directly from the packet. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:29 -04:00
Paul Moore	accc609322	selinux: Cleanup the NetLabel glue code We were doing a lot of extra work in selinux_netlbl_sock_graft() what wasn't necessary so this patch removes that code. It also removes the redundant second argument to selinux_netlbl_sock_setsid() which allows us to simplify a few other functions. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:29 -04:00
Paul Moore	3040a6d5a2	selinux: Fix an uninitialized variable BUG/panic in selinux_secattr_to_sid() At some point during the 2.6.27 development cycle two new fields were added to the SELinux context structure, a string pointer and a length field. The code in selinux_secattr_to_sid() was not modified and as a result these two fields were left uninitialized which could result in erratic behavior, including kernel panics, when NetLabel is used. This patch fixes the problem by fully initializing the context in selinux_secattr_to_sid() before use and reducing the level of direct context manipulation done to help prevent future problems. Please apply this to the 2.6.27-rcX release stream. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-10-04 08:25:18 +10:00
Paul Moore	81990fbdd1	selinux: Fix an uninitialized variable BUG/panic in selinux_secattr_to_sid() At some point during the 2.6.27 development cycle two new fields were added to the SELinux context structure, a string pointer and a length field. The code in selinux_secattr_to_sid() was not modified and as a result these two fields were left uninitialized which could result in erratic behavior, including kernel panics, when NetLabel is used. This patch fixes the problem by fully initializing the context in selinux_secattr_to_sid() before use and reducing the level of direct context manipulation done to help prevent future problems. Please apply this to the 2.6.27-rcX release stream. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-10-04 08:18:18 +10:00
Stephen Smalley	ea6b184f7d	selinux: use default proc sid on symlinks As we are not concerned with fine-grained control over reading of symlinks in proc, always use the default proc SID for all proc symlinks. This should help avoid permission issues upon changes to the proc tree as in the /proc/net -> /proc/self/net example. This does not alter labeling of symlinks within /proc/pid directories. ls -Zd /proc/net output before and after the patch should show the difference. Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-09-30 00:26:53 +10:00
Serge E. Hallyn	de45e806a8	file capabilities: uninline cap_safe_nice This reduces the kernel size by 289 bytes. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-09-27 15:07:56 +10:00
James Morris	ab2b49518e	Merge branch 'master' into next Conflicts: MAINTAINERS Thanks for breaking my tree :-) Signed-off-by: James Morris <jmorris@namei.org>	2008-09-21 17:41:56 -07:00
Frank Mayhar	f06febc96b	timers: fix itimer/many thread hang Overview This patch reworks the handling of POSIX CPU timers, including the ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together with the help of Roland McGrath, the owner and original writer of this code. The problem we ran into, and the reason for this rework, has to do with using a profiling timer in a process with a large number of threads. It appears that the performance of the old implementation of run_posix_cpu_timers() was at least O(n3) (where "n" is the number of threads in a process) or worse. Everything is fine with an increasing number of threads until the time taken for that routine to run becomes the same as or greater than the tick time, at which point things degrade rather quickly. This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF." Code Changes This rework corrects the implementation of run_posix_cpu_timers() to make it run in constant time for a particular machine. (Performance may vary between one machine and another depending upon whether the kernel is built as single- or multiprocessor and, in the latter case, depending upon the number of running processors.) To do this, at each tick we now update fields in signal_struct as well as task_struct. The run_posix_cpu_timers() function uses those fields to make its decisions. We define a new structure, "task_cputime," to contain user, system and scheduler times and use these in appropriate places: struct task_cputime { cputime_t utime; cputime_t stime; unsigned long long sum_exec_runtime; }; This is included in the structure "thread_group_cputime," which is a new substructure of signal_struct and which varies for uniprocessor versus multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as a simple substructure, while for multiprocessor kernels it is a pointer: struct thread_group_cputime { struct task_cputime totals; }; struct thread_group_cputime { struct task_cputime totals; }; We also add a new task_cputime substructure directly to signal_struct, to cache the earliest expiration of process-wide timers, and task_cputime also replaces the it_*_expires fields of task_struct (used for earliest expiration of thread timers). The "thread_group_cputime" structure contains process-wide timers that are updated via account_user_time() and friends. In the non-SMP case the structure is a simple aggregator; unfortunately in the SMP case that simplicity was not achievable due to cache-line contention between CPUs (in one measured case performance was actually _worse_ on a 16-cpu system than the same test on a 4-cpu system, due to this contention). For SMP, the thread_group_cputime counters are maintained as a per-cpu structure allocated using alloc_percpu(). The timer functions update only the timer field in the structure corresponding to the running CPU, obtained using per_cpu_ptr(). We define a set of inline functions in sched.h that we use to maintain the thread_group_cputime structure and hide the differences between UP and SMP implementations from the rest of the kernel. The thread_group_cputime_init() function initializes the thread_group_cputime structure for the given task. The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the out-of-line function thread_group_cputime_alloc_smp() to allocate and fill in the per-cpu structures and fields. The thread_group_cputime_free() function, also a no-op for UP, in SMP frees the per-cpu structures. The thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls thread_group_cputime_alloc() if the per-cpu structures haven't yet been allocated. The thread_group_cputime() function fills the task_cputime structure it is passed with the contents of the thread_group_cputime fields; in UP it's that simple but in SMP it must also safely check that tsk->signal is non-NULL (if it is it just uses the appropriate fields of task_struct) and, if so, sums the per-cpu values for each online CPU. Finally, the three functions account_group_user_time(), account_group_system_time() and account_group_exec_runtime() are used by timer functions to update the respective fields of the thread_group_cputime structure. Non-SMP operation is trivial and will not be mentioned further. The per-cpu structure is always allocated when a task creates its first new thread, via a call to thread_group_cputime_clone_thread() from copy_signal(). It is freed at process exit via a call to thread_group_cputime_free() from cleanup_signal(). All functions that formerly summed utime/stime/sum_sched_runtime values from from all threads in the thread group now use thread_group_cputime() to snapshot the values in the thread_group_cputime structure or the values in the task structure itself if the per-cpu structure hasn't been allocated. Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit. The run_posix_cpu_timers() function has been split into a fast path and a slow path; the former safely checks whether there are any expired thread timers and, if not, just returns, while the slow path does the heavy lifting. With the dedicated thread group fields, timers are no longer "rebalanced" and the process_timer_rebalance() function and related code has gone away. All summing loops are gone and all code that used them now uses the thread_group_cputime() inline. When process-wide timers are set, the new task_cputime structure in signal_struct is used to cache the earliest expiration; this is checked in the fast path. Performance The fix appears not to add significant overhead to existing operations. It generally performs the same as the current code except in two cases, one in which it performs slightly worse (Case 5 below) and one in which it performs very significantly better (Case 2 below). Overall it's a wash except in those two cases. I've since done somewhat more involved testing on a dual-core Opteron system. Case 1: With no itimer running, for a test with 100,000 threads, the fixed kernel took 1428.5 seconds, 513 seconds more than the unfixed system, all of which was spent in the system. There were twice as many voluntary context switches with the fix as without it. Case 2: With an itimer running at .01 second ticks and 4000 threads (the most an unmodified kernel can handle), the fixed kernel ran the test in eight percent of the time (5.8 seconds as opposed to 70 seconds) and had better tick accuracy (.012 seconds per tick as opposed to .023 seconds per tick). Case 3: A 4000-thread test with an initial timer tick of .01 second and an interval of 10,000 seconds (i.e. a timer that ticks only once) had very nearly the same performance in both cases: 6.3 seconds elapsed for the fixed kernel versus 5.5 seconds for the unfixed kernel. With fewer threads (eight in these tests), the Case 1 test ran in essentially the same time on both the modified and unmodified kernels (5.2 seconds versus 5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds versus 5.4 seconds but again with much better tick accuracy, .013 seconds per tick versus .025 seconds per tick for the unmodified kernel. Since the fix affected the rlimit code, I also tested soft and hard CPU limits. Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer running), the modified kernel was very slightly favored in that while it killed the process in 19.997 seconds of CPU time (5.002 seconds of wall time), only .003 seconds of that was system time, the rest was user time. The unmodified kernel killed the process in 20.001 seconds of CPU (5.014 seconds of wall time) of which .016 seconds was system time. Really, though, the results were too close to call. The results were essentially the same with no itimer running. Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds (where the hard limit would never be reached) and an itimer running, the modified kernel exhibited worse tick accuracy than the unmodified kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise, performance was almost indistinguishable. With no itimer running this test exhibited virtually identical behavior and times in both cases. In times past I did some limited performance testing. those results are below. On a four-cpu Opteron system without this fix, a sixteen-thread test executed in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On the same system with the fix, user and elapsed time were about the same, but system time dropped to 0.007 seconds. Performance with eight, four and one thread were comparable. Interestingly, the timer ticks with the fix seemed more accurate: The sixteen-thread test with the fix received 149543 ticks for 0.024 seconds per tick, while the same test without the fix received 58720 for 0.061 seconds per tick. Both cases were configured for an interval of 0.01 seconds. Again, the other tests were comparable. Each thread in this test computed the primes up to 25,000,000. I also did a test with a large number of threads, 100,000 threads, which is impossible without the fix. In this case each thread computed the primes only up to 10,000 (to make the runtime manageable). System time dominated, at 1546.968 seconds out of a total 2176.906 seconds (giving a user time of 629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite accurate. There is obviously no comparable test without the fix. Signed-off-by: Frank Mayhar <fmayhar@google.com> Cc: Roland McGrath <roland@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 16:25:35 +02:00
Stephen Smalley	f058925b20	Update selinux info in MAINTAINERS and Kconfig help text Update the SELinux entry in MAINTAINERS and drop the obsolete information from the selinux Kconfig help text. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-09-12 00:44:08 +10:00
Eric Paris	8e531af90f	SELinux: memory leak in security_context_to_sid_core Fix a bug and a philosophical decision about who handles errors. security_context_to_sid_core() was leaking a context in the common case. This was causing problems on fedora systems which recently have started making extensive use of this function. In discussion it was decided that if string_to_context_struct() had an error it was its own responsibility to clean up any mess it created along the way. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-09-04 08:35:13 +10:00
Li Zefan	36fd71d293	devcgroup: fix race against rmdir() During the use of a dev_cgroup, we should guarantee the corresponding cgroup won't be deleted (i.e. via rmdir). This can be done through css_get(&dev_cgroup->css), but here we can just get and use the dev_cgroup under rcu_read_lock. And also remove checking NULL dev_cgroup, it won't be NULL since a task always belongs to a cgroup. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-02 19:21:38 -07:00
KaiGai Kohei	d9250dea3f	SELinux: add boundary support and thread context assignment The purpose of this patch is to assign per-thread security context under a constraint. It enables multi-threaded server application to kick a request handler with its fair security context, and helps some of userspace object managers to handle user's request. When we assign a per-thread security context, it must not have wider permissions than the original one. Because a multi-threaded process shares a single local memory, an arbitary per-thread security context also means another thread can easily refer violated information. The constraint on a per-thread security context requires a new domain has to be equal or weaker than its original one, when it tries to assign a per-thread security context. Bounds relationship between two types is a way to ensure a domain can never have wider permission than its bounds. We can define it in two explicit or implicit ways. The first way is using new TYPEBOUNDS statement. It enables to define a boundary of types explicitly. The other one expand the concept of existing named based hierarchy. If we defines a type with "." separated name like "httpd_t.php", toolchain implicitly set its bounds on "httpd_t". This feature requires a new policy version. The 24th version (POLICYDB_VERSION_BOUNDARY) enables to ship them into kernel space, and the following patch enables to handle it. Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-29 00:33:33 +10:00
Eric Paris	da31894ed7	securityfs: do not depend on CONFIG_SECURITY Add a new Kconfig option SECURITYFS which will build securityfs support but does not require CONFIG_SECURITY. The only current user of securityfs does not depend on CONFIG_SECURITY and there is no reason the full LSM needs to be built to build this fs. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-28 10:47:42 +10:00
James Morris	86d688984d	Merge branch 'master' into next	2008-08-28 10:47:34 +10:00
Randy Dunlap	3f23d815c5	security: add/fix security kernel-doc Add security/inode.c functions to the kernel-api docbook. Use '%' on constants in kernel-doc notation. Fix several typos/spellos in security function descriptions. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-20 20:16:32 +10:00
Vesa-Matti Kari	dbc74c65b3	selinux: Unify for- and while-loop style Replace "thing != NULL" comparisons with just "thing" to make the code look more uniform (mixed styles were used even in the same source file). Signed-off-by: Vesa-Matti Kari <vmkari@cc.helsinki.fi> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-15 08:40:47 +10:00
David Howells	5cd9c58fbe	security: Fix setting of PF_SUPERPRIV by __capable() Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags the target process if that is not the current process and it is trying to change its own flags in a different way at the same time. __capable() is using neither atomic ops nor locking to protect t->flags. This patch removes __capable() and introduces has_capability() that doesn't set PF_SUPERPRIV on the process being queried. This patch further splits security_ptrace() in two: (1) security_ptrace_may_access(). This passes judgement on whether one process may access another only (PTRACE_MODE_ATTACH for ptrace() and PTRACE_MODE_READ for /proc), and takes a pointer to the child process. current is the parent. (2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only, and takes only a pointer to the parent process. current is the child. In Smack and commoncap, this uses has_capability() to determine whether the parent will be permitted to use PTRACE_ATTACH if normal checks fail. This does not set PF_SUPERPRIV. Two of the instances of __capable() actually only act on current, and so have been changed to calls to capable(). Of the places that were using __capable(): (1) The OOM killer calls __capable() thrice when weighing the killability of a process. All of these now use has_capability(). (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see whether the parent was allowed to trace any process. As mentioned above, these have been split. For PTRACE_ATTACH and /proc, capable() is now used, and for PTRACE_TRACEME, has_capability() is used. (3) cap_safe_nice() only ever saw current, so now uses capable(). (4) smack_setprocattr() rejected accesses to tasks other than current just after calling __capable(), so the order of these two tests have been switched and capable() is used instead. (5) In smack_file_send_sigiotask(), we need to allow privileged processes to receive SIGIO on files they're manipulating. (6) In smack_task_wait(), we let a process wait for a privileged process, whether or not the process doing the waiting is privileged. I've tested this with the LTP SELinux and syscalls testscripts. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-14 22:59:43 +10:00
Vesa-Matti Kari	421fae06be	selinux: conditional expression type validation was off-by-one expr_isvalid() in conditional.c was off-by-one and allowed invalid expression type COND_LAST. However, it is this header file that needs to be fixed. That way the if-statement's disjunction's second component reads more naturally, "if expr type is greater than the last allowed value" ( rather than using ">=" in conditional.c): if (expr->expr_type <= 0 \|\| expr->expr_type > COND_LAST) Signed-off-by: Vesa-Matti Kari <vmkari@cc.helsinki.fi> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-07 08:56:16 +10:00
Casey Schaufler	1544623536	smack: limit privilege by label There have been a number of requests to make the Smack LSM enforce MAC even in the face of privilege, either capability based or superuser based. This is not universally desired, however, so it seems desirable to make it optional. Further, at least one legacy OS implemented a scheme whereby only processes running with one particular label could be exempt from MAC. This patch supports these three cases. If /smack/onlycap is empty (unset or null-string) privilege is enforced in the normal way. If /smack/onlycap contains a label only processes running with that label may be MAC exempt. If the label in /smack/onlycap is the star label ("*") the semantics of the star label combine with the privilege restrictions to prevent any violations of MAC, even in the presence of privilege. Again, this will be independent of the privilege scheme. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: James Morris <jmorris@namei.org>	2008-08-05 10:55:53 +10:00
David Howells	cf9481e289	SELinux: Fix a potentially uninitialised variable in SELinux hooks Fix a potentially uninitialised variable in SELinux hooks that's given a pointer to the network address by selinux_parse_skb() passing a pointer back through its argument list. By restructuring selinux_parse_skb(), the compiler can see that the error case need not set it as the caller will return immediately. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-05 10:55:47 +10:00
Vesa-Matti J Kari	0c0e186f81	SELinux: trivial, remove unneeded local variable Hello, Remove unneeded local variable: struct avtab_node *newnode Signed-off-by: Vesa-Matti Kari <vmkari@cc.helsinki.fi> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-05 10:55:38 +10:00
Vesa-Matti J Kari	df4ea865f0	SELinux: Trivial minor fixes that change C null character style Trivial minor fixes that change C null character style. Signed-off-by: Vesa-Matti Kari <vmkari@cc.helsinki.fi> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-05 10:55:30 +10:00
Adrian Bunk	3583a71183	make selinux_write_opts() static This patch makes the needlessly global selinux_write_opts() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-05 10:55:24 +10:00
Eric Paris	383795c206	SELinux: /proc/mounts should show what it can Given a hosed SELinux config in which a system never loads policy or disables SELinux we currently just return -EINVAL for anyone trying to read /proc/mounts. This is a configuration problem but we can certainly be more graceful. This patch just ignores -EINVAL when displaying LSM options and causes /proc/mounts display everything else it can. If policy isn't loaded the obviously there are no options, so we aren't really loosing any information here. This is safe as the only other return of EINVAL comes from security_sid_to_context_core() in the case of an invalid sid. Even if a FS was mounted with a now invalidated context that sid should have been remapped to unlabeled and so we won't hit the EINVAL and will work like we should. (yes, I tested to make sure it worked like I thought) Signed-off-by: Eric Paris <eparis@redhat.com> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-30 08:31:28 +10:00
Linus Torvalds	4836e30078	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits) [PATCH] fix RLIM_NOFILE handling [PATCH] get rid of corner case in dup3() entirely [PATCH] remove remaining namei_{32,64}.h crap [PATCH] get rid of indirect users of namei.h [PATCH] get rid of __user_path_lookup_open [PATCH] f_count may wrap around [PATCH] dup3 fix [PATCH] don't pass nameidata to __ncp_lookup_validate() [PATCH] don't pass nameidata to gfs2_lookupi() [PATCH] new (local) helper: user_path_parent() [PATCH] sanitize __user_walk_fd() et.al. [PATCH] preparation to __user_walk_fd cleanup [PATCH] kill nameidata passing to permission(), rename to inode_permission() [PATCH] take noexec checks to very few callers that care Re: [PATCH 3/6] vfs: open_exec cleanup [patch 4/4] vfs: immutable inode checking cleanup [patch 3/4] fat: dont call notify_change [patch 2/4] vfs: utimes cleanup [patch 1/4] vfs: utimes: move owner check into inode_change_ok() [PATCH] vfs: use kstrdup() and check failing allocation ...	2008-07-26 20:23:44 -07:00
Linus Torvalds	2284284281	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netns: fix ip_rt_frag_needed rt_is_expired netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences netfilter: fix double-free and use-after free netfilter: arptables in netns for real netfilter: ip{,6}tables_security: fix future section mismatch selinux: use nf_register_hooks() netfilter: ebtables: use nf_register_hooks() Revert "pkt_sched: sch_sfq: dump a real number of flows" qeth: use dev->ml_priv instead of dev->priv syncookies: Make sure ECN is disabled net: drop unused BUG_TRAP() net: convert BUG_TRAP to generic WARN_ON drivers/net: convert BUG_TRAP to generic WARN_ON	2008-07-26 20:17:56 -07:00
Miklos Szeredi	b1da47e29e	[patch 3/4] fat: dont call notify_change The FAT_IOCTL_SET_ATTRIBUTES ioctl() calls notify_change() to change the file mode before changing the inode attributes. Replace with explicit calls to security_inode_setattr(), fat_setattr() and fsnotify_change(). This is equivalent to the original. The reason it is needed, is that later in the series we move the immutable check into notify_change(). That would break the FAT_IOCTL_SET_ATTRIBUTES ioctl, as it needs to perform the mode change regardless of the immutability of the file. [Fix error if fat is built as a module. Thanks to OGAWA Hirofumi for noticing.] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:27 -04:00
Al Viro	b77b0646ef	[PATCH] pass MAY_OPEN to vfs_permission() explicitly ... and get rid of the last "let's deduce mask from nameidata->flags" bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:22 -04:00
Alexey Dobriyan	6c5a9d2e15	selinux: use nf_register_hooks() Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:48:15 -07:00
Roland McGrath	0d094efeb1	tracehook: tracehook_tracer_task This adds the tracehook_tracer_task() hook to consolidate all forms of "Who is using ptrace on me?" logic. This is used for "TracerPid:" in /proc and for permission checks. We also clean up the selinux code the called an identical accessor. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Li Zefan	7759fc9d10	devcgroup: code cleanup - clean up set_majmin() - use simple_strtoul() to parse major/minor [akpm@linux-foundation.org: fix simple_strtoul() usage] [kosaki.motohiro@jp.fujitsu.com: fix warnings] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:37 -07:00
Pavel Emelyanov	4efd1a1b2f	devcgroup: relax white-list protection down to RCU Currently this list is protected with a simple spinlock, even for reading from one. This is OK, but can be better. Actually I want it to be better very much, since after replacing the OpenVZ device permissions engine with the cgroup-based one I noticed, that we set 12 default device permissions for each newly created container (for /dev/null, full, terminals, ect devices), and people sometimes have up to 20 perms more, so traversing the ~30-40 elements list under a spinlock doesn't seem very good. Here's the RCU protection for white-list - dev_whitelist_item-s are added and removed under the devcg->lock, but are looked up in permissions checking under the rcu_read_lock. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:37 -07:00
Paul Menage	f92523e3a7	cgroup files: convert devcgroup_access_write() into a cgroup write_string() handler This patch converts devcgroup_access_write() from a raw file handler into a handler for the cgroup write_string() method. This allows some boilerplate copying/locking/checking to be removed and simplifies the cleanup path, since these functions are performed by the cgroups framework before calling the handler. Signed-off-by: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:36 -07:00
Andrew G. Morgan	84aaa7ab4c	security: filesystem capabilities no longer experimental Filesystem capabilities have come of age. Remove the experimental tag for configuring filesystem capabilities. Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:22 -07:00
Andrew G. Morgan	5459c164f0	security: protect legacy applications from executing with insufficient privilege When cap_bset suppresses some of the forced (fP) capabilities of a file, it is generally only safe to execute the program if it understands how to recognize it doesn't have enough privilege to work correctly. For legacy applications (fE!=0), which have no non-destructive way to determine that they are missing privilege, we fail to execute (EPERM) any executable that requires fP capabilities, but would otherwise get pP' < fP. This is a fail-safe permission check. For some discussion of why it is problematic for (legacy) privileged applications to run with less than the set of capabilities requested for them, see: http://userweb.kernel.org/~morgan/sendmail-capabilities-war-story.html With this iteration of this support, we do not include setuid-0 based privilege protection from the bounding set. That is, the admin can still (ab)use the bounding set to suppress the privileges of a setuid-0 program. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: cleanup] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:22 -07:00
James Morris	089be43e40	Revert "SELinux: allow fstype unknown to policy to use xattrs if present" This reverts commit `811f379927`. From Eric Paris: "Please drop this patch for now. It deadlocks on ntfs-3g. I need to rework it to handle fuse filesystems better. (casey was right)"	2008-07-15 18:32:49 +10:00
James Morris	6f0f0fd496	security: remove register_security hook The register security hook is no longer required, as the capability module is always registered. LSMs wishing to stack capability as a secondary module should do so explicitly. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-14 15:04:06 +10:00
Miklos Szeredi	93cbace7a0	security: remove dummy module fix Fix small oversight in "security: remove dummy module": CONFIG_SECURITY_FILE_CAPABILITIES doesn't depend on CONFIG_SECURITY Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:03:41 +10:00
Miklos Szeredi	5915eb5386	security: remove dummy module Remove the dummy module and make the "capability" module the default. Compile and boot tested. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:03:04 +10:00
Miklos Szeredi	b478a9f988	security: remove unused sb_get_mnt_opts hook The sb_get_mnt_opts() hook is unused, and is superseded by the sb_show_options() hook. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:05 +10:00
Eric Paris	2069f45784	LSM/SELinux: show LSM mount options in /proc/mounts This patch causes SELinux mount options to show up in /proc/mounts. As with other code in the area seq_put errors are ignored. Other LSM's will not have their mount options displayed until they fill in their own security_sb_show_options() function. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:05 +10:00
Eric Paris	811f379927	SELinux: allow fstype unknown to policy to use xattrs if present Currently if a FS is mounted for which SELinux policy does not define an fs_use_* that FS will either be genfs labeled or not labeled at all. This decision is based on the existence of a genfscon rule in policy and is irrespective of the capabilities of the filesystem itself. This patch allows the kernel to check if the filesystem supports security xattrs and if so will use those if there is no fs_use_* rule in policy. An fstype with a no fs_use_* rule but with a genfs rule will use xattrs if available and will follow the genfs rule. This can be particularly interesting for things like ecryptfs which actually overlays a real underlying FS. If we define excryptfs in policy to use xattrs we will likely get this wrong at times, so with this path we just don't need to define it! Overlay ecryptfs on top of NFS with no xattr support: SELinux: initialized (dev ecryptfs, type ecryptfs), uses genfs_contexts Overlay ecryptfs on top of ext4 with xattr support: SELinux: initialized (dev ecryptfs, type ecryptfs), uses xattr It is also useful as the kernel adds new FS we don't need to add them in policy if they support xattrs and that is how we want to handle them. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:04 +10:00
James Morris	65fc766800	security: fix return of void-valued expressions Fix several warnings generated by sparse of the form "returning void-valued expression". Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Serge Hallyn <serue@us.ibm.com>	2008-07-14 15:02:03 +10:00
James Morris	2baf06df85	SELinux: use do_each_thread as a proper do/while block Use do_each_thread as a proper do/while block. Sparse complained. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2008-07-14 15:02:02 +10:00
James Morris	e399f98224	SELinux: remove unused and shadowed addrlen variable Remove unused and shadowed addrlen variable. Picked up by sparse. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Paul Moore <paul.moore@hp.com>	2008-07-14 15:02:01 +10:00
Eric Paris	6cbe27061a	SELinux: more user friendly unknown handling printk I've gotten complaints and reports about people not understanding the meaning of the current unknown class/perm handling the kernel emits on every policy load. Hopefully this will make make it clear to everyone the meaning of the message and won't waste a printk the user won't care about anyway on systems where the kernel and the policy agree on everything. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:00 +10:00
Stephen Smalley	22df4adb04	selinux: change handling of invalid classes (Was: Re: 2.6.26-rc5-mm1 selinux whine) On Mon, 2008-06-09 at 01:24 -0700, Andrew Morton wrote: > Getting a few of these with FC5: > > SELinux: context_struct_compute_av: unrecognized class 69 > SELinux: context_struct_compute_av: unrecognized class 69 > > one came out when I logged in. > > No other symptoms, yet. Change handling of invalid classes by SELinux, reporting class values unknown to the kernel as errors (w/ ratelimit applied) and handling class values unknown to policy as normal denials. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:59 +10:00
Eric Paris	89abd0acf0	SELinux: drop load_mutex in security_load_policy We used to protect against races of policy load in security_load_policy by using the load_mutex. Since then we have added a new mutex, sel_mutex, in sel_write_load() which is always held across all calls to security_load_policy we are covered and can safely just drop this one. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:58 +10:00
Eric Paris	cea78dc4ca	SELinux: fix off by 1 reference of class_to_string in context_struct_compute_av The class_to_string array is referenced by tclass. My code mistakenly was using tclass - 1. If the proceeding class is a userspace class rather than kernel class this may cause a denial/EINVAL even if unknown handling is set to allow. The bug shouldn't be allowing excess privileges since those are given based on the contents of another array which should be correctly referenced. At this point in time its pretty unlikely this is going to cause problems. The most recently added kernel classes which could be affected are association, dccp_socket, and peer. Its pretty unlikely any policy with handle_unknown=allow doesn't have association and dccp_socket undefined (they've been around longer than unknown handling) and peer is conditionalized on a policy cap which should only be defined if that class exists in policy. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:58 +10:00
James Morris	bdd581c143	SELinux: open code sidtab lock Open code sidtab lock to make Andrew Morton happy. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2008-07-14 15:01:57 +10:00
James Morris	972ccac2b2	SELinux: open code load_mutex Open code load_mutex as suggested by Andrew Morton. Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:56 +10:00
James Morris	0804d1133c	SELinux: open code policy_rwlock Open code policy_rwlock, as suggested by Andrew Morton. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2008-07-14 15:01:55 +10:00
Stephen Smalley	59dbd1ba98	selinux: fix endianness bug in network node address handling Fix an endianness bug in the handling of network node addresses by SELinux. This yields no change on little endian hardware but fixes the incorrect handling on big endian hardware. The network node addresses are stored in network order in memory by checkpolicy, not in cpu/host order, and thus should not have cpu_to_le32/le32_to_cpu conversions applied upon policy write/read unlike other data in the policy. Bug reported by John Weeks of Sun, who noticed that binary policy files built from the same policy source on x86 and sparc differed and tracked it down to the ipv4 address handling in checkpolicy. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:54 +10:00
Stephen Smalley	242631c49d	selinux: simplify ioctl checking Simplify and improve the robustness of the SELinux ioctl checking by using the "access mode" bits of the ioctl command to determine the permission check rather than dealing with individual command values. This removes any knowledge of specific ioctl commands from SELinux and follows the same guidance we gave to Smack earlier. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:53 +10:00
Stephen Smalley	abc69bb633	SELinux: enable processes with mac_admin to get the raw inode contexts Enable processes with CAP_MAC_ADMIN + mac_admin permission in policy to get undefined contexts on inodes. This extends the support for deferred mapping of security contexts in order to permit restorecon and similar programs to see the raw file contexts unknown to the system policy in order to check them. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:52 +10:00
Stephen Smalley	006ebb40d3	Security: split proc ptrace checking into read vs. attach Enable security modules to distinguish reading of process state via proc from full ptrace access by renaming ptrace_may_attach to ptrace_may_access and adding a mode argument indicating whether only read access or full attach access is requested. This allows security modules to permit access to reading process state without granting full ptrace access. The base DAC/capability checking remains unchanged. Read access to /proc/pid/mem continues to apply a full ptrace attach check since check_mem_permission() already requires the current task to already be ptracing the target. The other ptrace checks within proc for elements like environ, maps, and fds are changed to pass the read mode instead of attach. In the SELinux case, we model such reading of process state as a reading of a proc file labeled with the target process' label. This enables SELinux policy to permit such reading of process state without permitting control or manipulation of the target process, as there are a number of cases where programs probe for such information via proc but do not need to be able to control the target (e.g. procps, lsof, PolicyKit, ConsoleKit). At present we have to choose between allowing full ptrace in policy (more permissive than required/desired) or breaking functionality (or in some cases just silencing the denials via dontaudit rules but this can hide genuine attacks). This version of the patch incorporates comments from Casey Schaufler (change/replace existing ptrace_may_attach interface, pass access mode), and Chris Wright (provide greater consistency in the checking). Note that like their predecessors __ptrace_may_attach and ptrace_may_attach, the __ptrace_may_access and ptrace_may_access interfaces use different return value conventions from each other (0 or -errno vs. 1 or 0). I retained this difference to avoid any changes to the caller logic but made the difference clearer by changing the latter interface to return a bool rather than an int and by adding a comment about it to ptrace.h for any future callers. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:47 +10:00
James Morris	feb2a5b82d	SELinux: remove inherit field from inode_security_struct Remove inherit field from inode_security_struct, per Stephen Smalley: "Let's just drop inherit altogether - dead field." Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:38 +10:00
Richard Kennedy	fdeb05184b	SELinux: reorder inode_security_struct to increase objs/slab on 64bit reorder inode_security_struct to remove padding on 64 bit builds size reduced from 72 to 64 bytes increasing objects per slab to 64. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:37 +10:00
Eric Paris	f526971078	SELinux: keep the code clean formating and syntax Formatting and syntax changes whitespace, tabs to spaces, trailing space put open { on same line as struct def remove unneeded {} after if statements change printk("Lu") to printk("llu") convert asm/uaccess.h to linux/uaacess.h includes remove unnecessary asm/bug.h includes convert all users of simple_strtol to strict_strtol Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:36 +10:00
Stephen Smalley	9a59daa03d	SELinux: fix sleeping allocation in security_context_to_sid Fix a sleeping function called from invalid context bug by moving allocation to the callers prior to taking the policy rdlock. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:35 +10:00
Stephen Smalley	12b29f3455	selinux: support deferred mapping of contexts Introduce SELinux support for deferred mapping of security contexts in the SID table upon policy reload, and use this support for inode security contexts when the context is not yet valid under the current policy. Only processes with CAP_MAC_ADMIN + mac_admin permission in policy can set undefined security contexts on inodes. Inodes with such undefined contexts are treated as having the unlabeled context until the context becomes valid upon a policy reload that defines the context. Context invalidation upon policy reload also uses this support to save the context information in the SID table and later recover it upon a subsequent policy reload that defines the context again. This support is to enable package managers and similar programs to set down file contexts unknown to the system policy at the time the file is created in order to better support placing loadable policy modules in packages and to support build systems that need to create images of different distro releases with different policies w/o requiring all of the contexts to be defined or legal in the build host policy. With this patch applied, the following sequence is possible, although in practice it is recommended that this permission only be allowed to specific program domains such as the package manager. # rmdir baz # rm bar # touch bar # chcon -t foo_exec_t bar # foo_exec_t is not yet defined chcon: failed to change context of `bar' to `system_u:object_r:foo_exec_t': Invalid argument # mkdir -Z system_u:object_r:foo_exec_t baz mkdir: failed to set default file creation context to `system_u:object_r:foo_exec_t': Invalid argument # cat setundefined.te policy_module(setundefined, 1.0) require { type unconfined_t; type unlabeled_t; } files_type(unlabeled_t) allow unconfined_t self:capability2 mac_admin; # make -f /usr/share/selinux/devel/Makefile setundefined.pp # semodule -i setundefined.pp # chcon -t foo_exec_t bar # foo_exec_t is not yet defined # mkdir -Z system_u:object_r:foo_exec_t baz # ls -Zd bar baz -rw-r--r-- root root system_u:object_r:unlabeled_t bar drwxr-xr-x root root system_u:object_r:unlabeled_t baz # cat foo.te policy_module(foo, 1.0) type foo_exec_t; files_type(foo_exec_t) # make -f /usr/share/selinux/devel/Makefile foo.pp # semodule -i foo.pp # defines foo_exec_t # ls -Zd bar baz -rw-r--r-- root root user_u:object_r:foo_exec_t bar drwxr-xr-x root root system_u:object_r:foo_exec_t baz # semodule -r foo # ls -Zd bar baz -rw-r--r-- root root system_u:object_r:unlabeled_t bar drwxr-xr-x root root system_u:object_r:unlabeled_t baz # semodule -i foo.pp # ls -Zd bar baz -rw-r--r-- root root user_u:object_r:foo_exec_t bar drwxr-xr-x root root system_u:object_r:foo_exec_t baz # semodule -r setundefined foo # chcon -t foo_exec_t bar # no longer defined and not allowed chcon: failed to change context of `bar' to `system_u:object_r:foo_exec_t': Invalid argument # rmdir baz # mkdir -Z system_u:object_r:foo_exec_t baz mkdir: failed to set default file creation context to `system_u:object_r:foo_exec_t': Invalid argument Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:01:34 +10:00
Li Zefan	ec229e8300	devcgroup: fix permission check when adding entry to child cgroup # cat devices.list c 1:3 r # echo 'c 1:3 w' > sub/devices.allow # cat sub/devices.list c 1:3 w As illustrated, the parent group has no write permission to /dev/null, so it's child should not be allowed to add this write permission. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-13 12:51:18 -07:00
Li Zefan	17d213f806	devcgroup: always show positive major/minor num # echo "b $((0x7fffffff)):$((0x80000000)) rwm" > devices.allow # cat devices.list b 214748364:-21474836 rwm though a major/minor number of 0x800000000 is meaningless, we should not cast it to a negative value. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-13 12:51:18 -07:00
Li Zefan	d823f6bfec	devcgroup: fix odd behaviour when writing 'a' to devices.allow # cat /devcg/devices.list a : rwm # echo a > devices.allow # cat /devcg/devices.list a : rwm a 0:0 rwm This is odd and maybe confusing. With this patch, writing 'a' to devices.allow will add 'a : rwm' to the whitelist. Also a few fixes and updates to the document. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-04 10:40:10 -07:00
Andrew G. Morgan	1209726ce9	security: filesystem capabilities: fix CAP_SETPCAP handling The filesystem capability support meaning for CAP_SETPCAP is less powerful than the non-filesystem capability support. As such, when filesystem capabilities are configured, we should not permit CAP_SETPCAP to 'enhance' the current process through strace manipulation of a child process. Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-04 10:40:08 -07:00
Andrew G. Morgan	8cdbc2b982	capabilities: add (back) dummy support for KEEPCAPS The dummy module is used by folk that run security conscious code(!?). A feature of such code (for example, dhclient) is that it tries to operate with minimum privilege (dropping unneeded capabilities). While the dummy module doesn't restrict code execution based on capability state, the user code expects the kernel to appear to support it. This patch adds back faked support for the PR_SET_KEEPCAPS etc., calls - making the kernel behave as before 2.6.26. For details see: http://bugzilla.kernel.org/show_bug.cgi?id=10748 Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-12 18:05:40 -07:00
Daniel Walker	dba6a4d32d	keys: remove unused key_alloc_sem This semaphore doesn't appear to be used, so remove it. Signed-off-by: Daniel Walker <dwalker@mvista.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-06 11:29:11 -07:00
Pavel Emelyanov	d1ee2971f5	devscgroup: make white list more compact in some cases Consider you added a 'c foo:bar r' permission to some cgroup and then (a bit later) 'c'foo:bar w' for it. After this you'll see the c foo:bar r c foo:bar w lines in a devices.list file. Another example - consider you added 10 'c foo:bar r' permissions to some cgroup (e.g. by mistake). After this you'll see 10 c foo:bar r lines in a list file. This is weird. This situation also has one more annoying consequence. Having many items in a white list makes permissions checking slower, sine it has to walk a longer list. The proposal is to merge permissions for items, that correspond to the same device. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-06 11:29:11 -07:00
Pavel Emelyanov	cc9cb219aa	devscgroup: relax task to dev_cgroup conversion Two functions, that need to get a device_cgroup from a task (they are devcgroup_inode_permission and devcgroup_inode_mknod) make it in a strange way: They get a css_set from task, then a subsys_state from css_set, then a cgroup from the state and then a subsys_state again from the cgroup. Besides, the devices_subsys_id is read from memory, whilst there's a enum-ed constant for it. Optimize this part a bit: 1. Get the subsys_stats form the task and be done - no 2 extra dereferences, 2. Use the device_subsys_id constant, not the value from memory (i.e. one less dereference). Found while preparing 2.6.26 OpenVZ port. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-06 11:29:11 -07:00
Pavel Emelyanov	b66862f766	devcgroup: make a helper to convert cgroup_subsys_state to devs_cgroup This is just picking the container_of out of cgroup_to_devcgroup into a separate function. This new css_to_devcgroup will be used in the 2nd patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-06 11:29:11 -07:00
Casey Schaufler	e97dcb0ead	Smack: fuse mount hang fix The d_instantiate hook for Smack can hang on the root inode of a filesystem if the file system code has not really done all the set-up. Fuse is known to encounter this problem. This change detects an attempt to instantiate a root inode and addresses it early in the processing, before any attempt is made to do something that might hang. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Tested-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-04 08:50:43 -07:00
Al Viro	9f3acc3140	[PATCH] split linux/file.h Initial splitoff of the low-level stuff; taken to fdtable.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-05-01 13:08:16 -04:00
Oleg Nesterov	3b5e9e53c6	signals: cleanup security_task_kill() usage/implementation Every implementation of ->task_kill() does nothing when the signal comes from the kernel. This is correct, but means that check_kill_permission() should call security_task_kill() only for SI_FROMUSER() case, and we can remove the same check from ->task_kill() implementations. (sadly, check_kill_permission() is the last user of signal->session/__session but we can't s/task_session_nr/task_session/ here). NOTE: Eric W. Biederman pointed out cap_task_kill() should die, and I think he is very right. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Roland McGrath <roland@redhat.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: David Quigley <dpquigl@tycho.nsa.gov> Cc: Eric Paris <eparis@redhat.com> Cc: Harald Welte <laforge@gnumonks.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:34 -07:00
Ahmed S. Darwish	d20bdda6d4	Smack: Integrate Smack with Audit Setup the new Audit hooks for Smack. SELinux Audit rule fields are recycled to avoid `auditd' userspace modifications. Currently only equality testing is supported on labels acting as a subject (AUDIT_SUBJ_USER) or as an object (AUDIT_OBJ_USER). Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com>	2008-04-30 08:34:10 +10:00
David Howells	e52c1764f1	Security: Make secctx_to_secid() take const secdata Make secctx_to_secid() take constant secdata. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-30 08:23:51 +10:00
Linus Torvalds	9781db7b34	Merge branch 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] new predicate - AUDIT_FILETYPE [patch 2/2] Use find_task_by_vpid in audit code [patch 1/2] audit: let userspace fully control TTY input auditing [PATCH 2/2] audit: fix sparse shadowed variable warnings [PATCH 1/2] audit: move extern declarations to audit.h Audit: MAINTAINERS update Audit: increase the maximum length of the key field Audit: standardize string audit interfaces Audit: stop deadlock from signals under load Audit: save audit_backlog_limit audit messages in case auditd comes back Audit: collect sessionid in netlink messages Audit: end printk with newline	2008-04-29 11:41:22 -07:00
Robert P. J. Day	fdb89bce6c	keys: explicitly include required slab.h header file. Since these two source files invoke kmalloc(), they should explicitly include <linux/slab.h>. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:17 -07:00
David Howells	0b77f5bfb4	keys: make the keyring quotas controllable through /proc/sys Make the keyring quotas controllable through /proc/sys files: () /proc/sys/kernel/keys/root_maxkeys /proc/sys/kernel/keys/root_maxbytes Maximum number of keys that root may have and the maximum total number of bytes of data that root may have stored in those keys. () /proc/sys/kernel/keys/maxkeys /proc/sys/kernel/keys/maxbytes Maximum number of keys that each non-root user may have and the maximum total number of bytes of data that each of those users may have stored in their keys. Also increase the quotas as a number of people have been complaining that it's not big enough. I'm not sure that it's big enough now either, but on the other hand, it can now be set in /etc/sysctl.conf. Signed-off-by: David Howells <dhowells@redhat.com> Cc: <kwc@citi.umich.edu> Cc: <arunsr@cse.iitk.ac.in> Cc: <dwalsh@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:17 -07:00
David Howells	69664cf16a	keys: don't generate user and user session keyrings unless they're accessed Don't generate the per-UID user and user session keyrings unless they're explicitly accessed. This solves a problem during a login process whereby set*uid() is called before the SELinux PAM module, resulting in the per-UID keyrings having the wrong security labels. This also cures the problem of multiple per-UID keyrings sometimes appearing due to PAM modules (including pam_keyinit) setuiding and causing user_structs to come into and go out of existence whilst the session keyring pins the user keyring. This is achieved by first searching for extant per-UID keyrings before inventing new ones. The serial bound argument is also dropped from find_keyring_by_name() as it's not currently made use of (setting it to 0 disables the feature). Signed-off-by: David Howells <dhowells@redhat.com> Cc: <kwc@citi.umich.edu> Cc: <arunsr@cse.iitk.ac.in> Cc: <dwalsh@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:17 -07:00
Arun Raghavan	6b79ccb514	keys: allow clients to set key perms in key_create_or_update() The key_create_or_update() function provided by the keyring code has a default set of permissions that are always applied to the key when created. This might not be desirable to all clients. Here's a patch that adds a "perm" parameter to the function to address this, which can be set to KEY_PERM_UNDEF to revert to the current behaviour. Signed-off-by: Arun Raghavan <arunsr@cse.iitk.ac.in> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:16 -07:00
Alexey Dobriyan	da91d2ef9f	keys: switch to proc_create() Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:16 -07:00
David Howells	70a5bb72b5	keys: add keyctl function to get a security label Add a keyctl() function to get the security label of a key. The following is added to Documentation/keys.txt: () Get the LSM security context attached to a key. long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char buffer, size_t buflen) This function returns a string that represents the LSM security context attached to a key in the buffer provided. Unless there's an error, it always returns the amount of data it could produce, even if that's too big for the buffer, but it won't copy more than requested to userspace. If the buffer pointer is NULL then no copy will take place. A NUL character is included at the end of the string if the buffer is sufficiently big. This is included in the returned count. If no LSM is in force then an empty string will be returned. A process must have view permission on the key for this function to be successful. [akpm@linux-foundation.org: declare keyctl_get_security()] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:16 -07:00
David Howells	4a38e122e2	keys: allow the callout data to be passed as a blob rather than a string Allow the callout data to be passed as a blob rather than a string for internal kernel services that call any request_key_*() interface other than request_key(). request_key() itself still takes a NUL-terminated string. The functions that change are: request_key_with_auxdata() request_key_async() request_key_async_with_auxdata() Signed-off-by: David Howells <dhowells@redhat.com> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:16 -07:00
Kevin Coffman	dceba99441	keys: check starting keyring as part of search Check the starting keyring as part of the search to (a) see if that is what we're searching for, and (b) to check it is still valid for searching. The scenario: User in process A does things that cause things to be created in its process session keyring. The user then does an su to another user and starts a new process, B. The two processes now share the same process session keyring. Process B does an NFS access which results in an upcall to gssd. When gssd attempts to instantiate the context key (to be linked into the process session keyring), it is denied access even though it has an authorization key. The order of calls is: keyctl_instantiate_key() lookup_user_key() (the default: case) search_process_keyrings(current) search_process_keyrings(rka->context) (recursive call) keyring_search_aux() keyring_search_aux() verifies the keys and keyrings underneath the top-level keyring it is given, but that top-level keyring is neither fully validated nor checked to see if it is the thing being searched for. This patch changes keyring_search_aux() to: 1) do more validation on the top keyring it is given and 2) check whether that top-level keyring is the thing being searched for Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:16 -07:00
David Howells	38bbca6b6f	keys: increase the payload size when instantiating a key Increase the size of a payload that can be used to instantiate a key in add_key() and keyctl_instantiate_key(). This permits huge CIFS SPNEGO blobs to be passed around. The limit is raised to 1MB. If kmalloc() can't allocate a buffer of sufficient size, vmalloc() will be tried instead. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:16 -07:00
Serge E. Hallyn	29486df325	cgroups: introduce cft->read_seq() Introduce a read_seq() helper in cftype, which uses seq_file to print out lists. Use it in the devices cgroup. Also split devices.allow into two files, so now devices.deny and devices.allow are the ones to use to manipulate the whitelist, while devices.list outputs the cgroup's current whitelist. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:10 -07:00
Serge E. Hallyn	08ce5f16ee	cgroups: implement device whitelist Implement a cgroup to track and enforce open and mknod restrictions on device files. A device cgroup associates a device access whitelist with each cgroup. A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it applies to all types and all major and minor numbers. Major and minor are either an integer or * for all. Access is a composition of r (read), w (write), and m (mknod). The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of the parent. Admins can then remove devices from the whitelist or add new entries. A child cgroup can never receive a device access which is denied its parent. However when a device access is removed from a parent it will not also be removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance echo 'c 1:3 mr' > /cgroups/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing echo a > /cgroups/1/devices.deny will remove the default 'a : mrw' entry. CAP_SYS_ADMIN is needed to change permissions or move another task to a new cgroup. A cgroup may not be granted more permissions than the cgroup's parent has. Any task can move itself between cgroups. This won't be sufficient, but we can decide the best way to adequately restrict movement later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix may-be-used-uninitialized warning] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: James Morris <jmorris@namei.org> Looks-good-to: Pavel Emelyanov <xemul@openvz.org> Cc: Daniel Hokka Zakrisson <daniel@hozac.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:09 -07:00
David Howells	8f0cfa52a1	xattr: add missing consts to function arguments Add missing consts to xattr function arguments. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:06 -07:00
Linus Torvalds	cfd299dffe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: SELinux: Fix a RCU free problem with the netport cache SELinux: Made netnode cache adds faster SELinux: include/security.h whitespace, syntax, and other cleanups SELinux: policydb.h whitespace, syntax, and other cleanups SELinux: mls_types.h whitespace, syntax, and other cleanups SELinux: mls.h whitespace, syntax, and other cleanups SELinux: hashtab.h whitespace, syntax, and other cleanups SELinux: context.h whitespace, syntax, and other cleanups SELinux: ss/conditional.h whitespace, syntax, and other cleanups SELinux: selinux/include/security.h whitespace, syntax, and other cleanups SELinux: objsec.h whitespace, syntax, and other cleanups SELinux: netlabel.h whitespace, syntax, and other cleanups SELinux: avc_ss.h whitespace, syntax, and other cleanups Fixed up conflict in include/linux/security.h manually	2008-04-28 10:08:49 -07:00
Serge E. Hallyn	1236cc3cf8	smack: use cap_task_prctl With the introduction of per-process securebits, the capabilities-related prctl callbacks were moved into cap_task_prctl(). Have smack use cap_task_prctl() so that PR_SET_KEEPCAPS is defined. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:27 -07:00
Casey Schaufler	30aa4faf62	smack: make smk_cipso_doi() and smk_unlbl_ambient() The functions smk_cipso_doi and smk_unlbl_ambient are not used outside smackfs.c and should hence be static. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:27 -07:00
Serge E. Hallyn	55d00ccfb3	root_plug: use cap_task_prctl With the introduction of per-process securebits, the capabilities-related prctl callbacks were moved into cap_task_prctl(). Have root_plug use cap_task_prctl() so that PR_SET_KEEPCAPS is defined. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:27 -07:00
Harvey Harrison	c60264c494	smack: fix integer as NULL pointer warning in smack_lsm.c security/smack/smack_lsm.c:1257:16: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:27 -07:00

... 3 4 5 6 7 ...

940 Commits