linux/security/apparmor/lsm.c

// SPDX-License-Identifier: GPL-2.0-only
/*
* AppArmor security module
*
* This file contains AppArmor LSM hooks.
*
* Copyright (C) 1998-2008 Novell/SUSE
* Copyright 2009-2010 Canonical Ltd.
*/
#include <linux/lsm_hooks.h>
#include <linux/moduleparam.h>
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/mount.h>
#include <linux/namei.h>
#include <linux/ptrace.h>
#include <linux/ctype.h>
#include <linux/sysctl.h>
#include <linux/audit.h>
#include <linux/user_namespace.h>
#include <linux/netfilter_ipv4.h>
#include <linux/netfilter_ipv6.h>
#include <linux/zstd.h>
#include <net/sock.h>
#include <uapi/linux/mount.h>
#include <uapi/linux/lsm.h>
#include "include/apparmor.h"
#include "include/apparmorfs.h"
#include "include/audit.h"
#include "include/capability.h"
#include "include/cred.h"
#include "include/file.h"
#include "include/ipc.h"
#include "include/net.h"
#include "include/path.h"
#include "include/label.h"
#include "include/policy.h"
#include "include/policy_ns.h"
#include "include/procattr.h"
#include "include/mount.h"
#include "include/secid.h"
/* Flag indicating whether initialization completed */
int apparmor_initialized;
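/*
 * A free buffer's storage doubles as the list node used to chain it on
 * the global or per-CPU free lists, so tracking cached buffers needs no
 * extra allocation.
 */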
union aa_buffer {
struct list_head list;
DECLARE_FLEX_ARRAY(char, buffer);
};
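/*
 * Per-CPU cache of buffers used when the global aa_buffers_lock is
 * contended: @count is the number of buffers currently on @head, and
 * @hold is a heuristic for how long buffers should be kept on the local
 * list before being returned to the global one (the alloc/free paths
 * adjust it based on observed contention).
 */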
struct aa_local_cache {
unsigned int hold;
unsigned int count;
struct list_head head;
};
#define RESERVE_COUNT 2
static int reserve_count = RESERVE_COUNT;
static int buffer_count;
static LIST_HEAD(aa_global_buffers);
static DEFINE_SPINLOCK(aa_buffers_lock);
static DEFINE_PER_CPU(struct aa_local_cache, aa_local_buffers);
/*
* LSM hook functions
*/
/*
* put the associated labels
*/
static void apparmor_cred_free(struct cred *cred)
{
aa_put_label(cred_label(cred));
set_cred_label(cred, NULL);
}
/*
* allocate the apparmor part of blank credentials
*/
static int apparmor_cred_alloc_blank(struct cred *cred, gfp_t gfp)
{
set_cred_label(cred, NULL);
return 0;
}
/*
* prepare new cred label for modification by prepare_cred block
*/
static int apparmor_cred_prepare(struct cred *new, const struct cred *old,
gfp_t gfp)
{
set_cred_label(new, aa_get_newest_label(cred_label(old)));
return 0;
}
/*
* transfer the apparmor data to a blank set of creds
*/
static void apparmor_cred_transfer(struct cred *new, const struct cred *old)
{
set_cred_label(new, aa_get_newest_label(cred_label(old)));
}
static void apparmor_task_free(struct task_struct *task)
{
aa_free_task_ctx(task_ctx(task));
}
static int apparmor_task_alloc(struct task_struct *task,
unsigned long clone_flags)
{
struct aa_task_ctx *new = task_ctx(task);
aa_dup_task_ctx(new, task_ctx(current));
return 0;
}
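/*
 * Check whether current may ptrace @child with the requested @mode:
 * PTRACE_MODE_READ maps to AA_PTRACE_READ, anything else to
 * AA_PTRACE_TRACE. The child's cred label is the tracee and the
 * current label the tracer.
 */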
static int apparmor_ptrace_access_check(struct task_struct *child,
unsigned int mode)
{
struct aa_label *tracer, *tracee;
const struct cred *cred;
int error;
cred = get_task_cred(child);
tracee = cred_label(cred); /* ref count on cred */
tracer = __begin_current_label_crit_section();
error = aa_may_ptrace(current_cred(), tracer, cred, tracee,
(mode & PTRACE_MODE_READ) ? AA_PTRACE_READ
: AA_PTRACE_TRACE);
__end_current_label_crit_section(tracer);
put_cred(cred);
return error;
}
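/*
 * The mirror of the access check above: current is the prospective
 * tracee and @parent the prospective tracer.
 */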
static int apparmor_ptrace_traceme(struct task_struct *parent)
{
struct aa_label *tracer, *tracee;
const struct cred *cred;
int error;
tracee = __begin_current_label_crit_section();
cred = get_task_cred(parent);
tracer = cred_label(cred); /* ref count on cred */
error = aa_may_ptrace(cred, tracer, current_cred(), tracee,
AA_PTRACE_TRACE);
put_cred(cred);
__end_current_label_crit_section(tracee);
return error;
}
/* Derived from security/commoncap.c:cap_capget */
static int apparmor_capget(const struct task_struct *target, kernel_cap_t *effective,
kernel_cap_t *inheritable, kernel_cap_t *permitted)
{
struct aa_label *label;
const struct cred *cred;
rcu_read_lock();
cred = __task_cred(target);
label = aa_get_newest_cred_label(cred);
/*
* cap_capget is stacked ahead of this and will
* initialize effective and permitted.
*/
if (!unconfined(label)) {
struct aa_profile *profile;
struct label_it i;
label_for_each_confined(i, label, profile) {
struct aa_ruleset *rules;
if (COMPLAIN_MODE(profile))
continue;
rules = list_first_entry(&profile->rules,
typeof(*rules), list);
*effective = cap_intersect(*effective,
rules->caps.allow);
*permitted = cap_intersect(*permitted,
rules->caps.allow);
}
}
rcu_read_unlock();
aa_put_label(label);
return 0;
}
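/*
 * Capability check against the cred's confining label; unconfined
 * labels add no restriction here (error stays 0), leaving the result to
 * the stacked capability LSM.
 */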
static int apparmor_capable(const struct cred *cred, struct user_namespace *ns,
int cap, unsigned int opts)
{
struct aa_label *label;
int error = 0;
label = aa_get_newest_cred_label(cred);
if (!unconfined(label))
error = aa_capable(cred, label, cap, opts);
aa_put_label(label);
return error;
}
/**
* common_perm - basic common permission check wrapper fn for paths
* @op: operation being checked
* @path: path to check permission of (NOT NULL)
* @mask: requested permissions mask
* @cond: conditional info for the permission request (NOT NULL)
*
* Returns: %0 on success, else error code if an error occurred or permission was denied
*/
static int common_perm(const char *op, const struct path *path, u32 mask,
struct path_cond *cond)
{
struct aa_label *label;
int error = 0;
label = __begin_current_label_crit_section();
if (!unconfined(label))
error = aa_path_perm(op, current_cred(), label, path, 0, mask,
cond);
__end_current_label_crit_section(label);
return error;
}
/**
* common_perm_cond - common permission wrapper around inode cond
* @op: operation being checked
* @path: location to check (NOT NULL)
* @mask: requested permissions mask
*
* Returns: %0 on success, else error code if an error occurred or permission was denied
*/
static int common_perm_cond(const char *op, const struct path *path, u32 mask)
{
vfsuid_t vfsuid = i_uid_into_vfsuid(mnt_idmap(path->mnt),
d_backing_inode(path->dentry));
struct path_cond cond = {
vfsuid_into_kuid(vfsuid),
d_backing_inode(path->dentry)->i_mode
};
if (!path_mediated_fs(path->dentry))
return 0;
return common_perm(op, path, mask, &cond);
}
/**
* common_perm_dir_dentry - common permission wrapper when path is dir, dentry
* @op: operation being checked
* @dir: directory of the dentry (NOT NULL)
* @dentry: dentry to check (NOT NULL)
* @mask: requested permissions mask
* @cond: conditional info for the permission request (NOT NULL)
*
* Returns: %0 on success, else error code if an error occurred or permission was denied
*/
static int common_perm_dir_dentry(const char *op, const struct path *dir,
struct dentry *dentry, u32 mask,
struct path_cond *cond)
{
struct path path = { .mnt = dir->mnt, .dentry = dentry };
return common_perm(op, &path, mask, cond);
}
/**
* common_perm_rm - common permission wrapper for operations doing rm
* @op: operation being checked
* @dir: directory that the dentry is in (NOT NULL)
* @dentry: dentry being rm'd (NOT NULL)
* @mask: requested permission mask
*
* Returns: %0 on success, else error code if an error occurred or permission was denied
*/
static int common_perm_rm(const char *op, const struct path *dir,
struct dentry *dentry, u32 mask)
{
struct inode *inode = d_backing_inode(dentry);
struct path_cond cond = { };
vfsuid_t vfsuid;
if (!inode || !path_mediated_fs(dentry))
return 0;
vfsuid = i_uid_into_vfsuid(mnt_idmap(dir->mnt), inode);
cond.uid = vfsuid_into_kuid(vfsuid);
cond.mode = inode->i_mode;
return common_perm_dir_dentry(op, dir, dentry, mask, &cond);
}
/**
* common_perm_create - common permission wrapper for operations doing create
* @op: operation being checked
* @dir: directory that dentry will be created in (NOT NULL)
* @dentry: dentry to create (NOT NULL)
* @mask: requested permission mask
* @mode: created file mode
*
* Returns: %0 on success, else error code if an error occurred or permission was denied
*/
static int common_perm_create(const char *op, const struct path *dir,
struct dentry *dentry, u32 mask, umode_t mode)
{
struct path_cond cond = { current_fsuid(), mode };
if (!path_mediated_fs(dir->dentry))
return 0;
return common_perm_dir_dentry(op, dir, dentry, mask, &cond);
}
static int apparmor_path_unlink(const struct path *dir, struct dentry *dentry)
{
return common_perm_rm(OP_UNLINK, dir, dentry, AA_MAY_DELETE);
}
static int apparmor_path_mkdir(const struct path *dir, struct dentry *dentry,
umode_t mode)
{
return common_perm_create(OP_MKDIR, dir, dentry, AA_MAY_CREATE,
S_IFDIR);
}
static int apparmor_path_rmdir(const struct path *dir, struct dentry *dentry)
{
return common_perm_rm(OP_RMDIR, dir, dentry, AA_MAY_DELETE);
}
static int apparmor_path_mknod(const struct path *dir, struct dentry *dentry,
umode_t mode, unsigned int dev)
{
return common_perm_create(OP_MKNOD, dir, dentry, AA_MAY_CREATE, mode);
}
static int apparmor_path_truncate(const struct path *path)
{
return common_perm_cond(OP_TRUNC, path, MAY_WRITE | AA_MAY_SETATTR);
}
static int apparmor_file_truncate(struct file *file)
{
return apparmor_path_truncate(&file->f_path);
}
static int apparmor_path_symlink(const struct path *dir, struct dentry *dentry,
const char *old_name)
{
return common_perm_create(OP_SYMLINK, dir, dentry, AA_MAY_CREATE,
S_IFLNK);
}
static int apparmor_path_link(struct dentry *old_dentry, const struct path *new_dir,
struct dentry *new_dentry)
{
struct aa_label *label;
int error = 0;
if (!path_mediated_fs(old_dentry))
return 0;
label = begin_current_label_crit_section();
if (!unconfined(label))
error = aa_path_link(current_cred(), label, old_dentry, new_dir,
new_dentry);
end_current_label_crit_section(label);
return error;
}
static int apparmor_path_rename(const struct path *old_dir, struct dentry *old_dentry,
const struct path *new_dir, struct dentry *new_dentry,
const unsigned int flags)
{
struct aa_label *label;
int error = 0;
if (!path_mediated_fs(old_dentry))
return 0;
if ((flags & RENAME_EXCHANGE) && !path_mediated_fs(new_dentry))
return 0;
label = begin_current_label_crit_section();
if (!unconfined(label)) {
struct mnt_idmap *idmap = mnt_idmap(old_dir->mnt);
vfsuid_t vfsuid;
struct path old_path = { .mnt = old_dir->mnt,
.dentry = old_dentry };
struct path new_path = { .mnt = new_dir->mnt,
.dentry = new_dentry };
struct path_cond cond = {
.mode = d_backing_inode(old_dentry)->i_mode
};
vfsuid = i_uid_into_vfsuid(idmap, d_backing_inode(old_dentry));
cond.uid = vfsuid_into_kuid(vfsuid);
if (flags & RENAME_EXCHANGE) {
struct path_cond cond_exchange = {
.mode = d_backing_inode(new_dentry)->i_mode,
};
vfsuid = i_uid_into_vfsuid(idmap, d_backing_inode(old_dentry));
cond_exchange.uid = vfsuid_into_kuid(vfsuid);
error = aa_path_perm(OP_RENAME_SRC, current_cred(),
label, &new_path, 0,
MAY_READ | AA_MAY_GETATTR | MAY_WRITE |
AA_MAY_SETATTR | AA_MAY_DELETE,
&cond_exchange);
if (!error)
error = aa_path_perm(OP_RENAME_DEST, current_cred(),
label, &old_path,
0, MAY_WRITE | AA_MAY_SETATTR |
AA_MAY_CREATE, &cond_exchange);
}
if (!error)
error = aa_path_perm(OP_RENAME_SRC, current_cred(),
label, &old_path, 0,
MAY_READ | AA_MAY_GETATTR | MAY_WRITE |
AA_MAY_SETATTR | AA_MAY_DELETE,
&cond);
if (!error)
error = aa_path_perm(OP_RENAME_DEST, current_cred(),
label, &new_path,
0, MAY_WRITE | AA_MAY_SETATTR |
AA_MAY_CREATE, &cond);
}
end_current_label_crit_section(label);
return error;
}
static int apparmor_path_chmod(const struct path *path, umode_t mode)
{
return common_perm_cond(OP_CHMOD, path, AA_MAY_CHMOD);
}
static int apparmor_path_chown(const struct path *path, kuid_t uid, kgid_t gid)
{
return common_perm_cond(OP_CHOWN, path, AA_MAY_CHOWN);
}
static int apparmor_inode_getattr(const struct path *path)
{
return common_perm_cond(OP_GETATTR, path, AA_MAY_GETATTR);
}
static int apparmor_file_open(struct file *file)
{
struct aa_file_ctx *fctx = file_ctx(file);
struct aa_label *label;
int error = 0;
bool needput;
if (!path_mediated_fs(file->f_path.dentry))
return 0;
/* If in exec, permission is handled by bprm hooks.
* Cache permissions granted by the previous exec check, with
* implicit read and executable mmap which are required to
* actually execute the image.
*
* Illogically, FMODE_EXEC is in f_flags, not f_mode.
*/
if (file->f_flags & __FMODE_EXEC) {
fctx->allow = MAY_EXEC | MAY_READ | AA_EXEC_MMAP;
return 0;
}
label = aa_get_newest_cred_label_condref(file->f_cred, &needput);
if (!unconfined(label)) {
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct inode *inode = file_inode(file);
vfsuid_t vfsuid;
struct path_cond cond = {
.mode = inode->i_mode,
};
vfsuid = i_uid_into_vfsuid(idmap, inode);
cond.uid = vfsuid_into_kuid(vfsuid);
error = aa_path_perm(OP_OPEN, file->f_cred,
label, &file->f_path, 0,
aa_map_file_to_perms(file), &cond);
/* todo cache full allowed permissions set and state */
fctx->allow = aa_map_file_to_perms(file);
}
aa_put_label_condref(label, needput);
return error;
}
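/*
 * Set up the per-file AppArmor context at allocation time, pinning a
 * reference to the opening task's current label.
 */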
static int apparmor_file_alloc_security(struct file *file)
{
struct aa_file_ctx *ctx = file_ctx(file);
struct aa_label *label = begin_current_label_crit_section();
spin_lock_init(&ctx->lock);
rcu_assign_pointer(ctx->label, aa_get_label(label));
end_current_label_crit_section(label);
return 0;
}
static void apparmor_file_free_security(struct file *file)
{
struct aa_file_ctx *ctx = file_ctx(file);
if (ctx)
aa_put_label(rcu_access_pointer(ctx->label));
}
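/*
 * Revalidate access for an operation on an already-open file. Files
 * that were replaced with aa_null during domain transition are denied
 * outright without generating new audit records.
 */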
static int common_file_perm(const char *op, struct file *file, u32 mask,
bool in_atomic)
{
struct aa_label *label;
int error = 0;
/* don't reaudit files closed during inheritance */
if (file->f_path.dentry == aa_null.dentry)
return -EACCES;
label = __begin_current_label_crit_section();
error = aa_file_perm(op, current_cred(), label, file, mask, in_atomic);
__end_current_label_crit_section(label);
return error;
}
static int apparmor_file_receive(struct file *file)
{
return common_file_perm(OP_FRECEIVE, file, aa_map_file_to_perms(file),
false);
}
static int apparmor_file_permission(struct file *file, int mask)
{
return common_file_perm(OP_FPERM, file, mask, false);
}
static int apparmor_file_lock(struct file *file, unsigned int cmd)
{
u32 mask = AA_MAY_LOCK;
if (cmd == F_WRLCK)
mask |= MAY_WRITE;
return common_file_perm(OP_FLOCK, file, mask, false);
}
static int common_mmap(const char *op, struct file *file, unsigned long prot,
unsigned long flags, bool in_atomic)
{
int mask = 0;
if (!file || !file_ctx(file))
return 0;
if (prot & PROT_READ)
mask |= MAY_READ;
/*
* Private mappings don't require write perms since they don't
* write back to the files
*/
if ((prot & PROT_WRITE) && !(flags & MAP_PRIVATE))
mask |= MAY_WRITE;
if (prot & PROT_EXEC)
mask |= AA_EXEC_MMAP;
return common_file_perm(op, file, mask, in_atomic);
}
static int apparmor_mmap_file(struct file *file, unsigned long reqprot,
unsigned long prot, unsigned long flags)
{
return common_mmap(OP_FMMAP, file, prot, flags, GFP_ATOMIC);
}
static int apparmor_file_mprotect(struct vm_area_struct *vma,
unsigned long reqprot, unsigned long prot)
{
return common_mmap(OP_FMPROT, vma->vm_file, prot,
!(vma->vm_flags & VM_SHARED) ? MAP_PRIVATE : 0,
false);
}
#ifdef CONFIG_IO_URING
static const char *audit_uring_mask(u32 mask)
{
if (mask & AA_MAY_CREATE_SQPOLL)
return "sqpoll";
if (mask & AA_MAY_OVERRIDE_CRED)
return "override_creds";
return "";
}
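/*
 * Audit callback: record the requested and denied io_uring permissions
 * and, when a cred override targets another label, that target label.
 */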
static void audit_uring_cb(struct audit_buffer *ab, void *va)
{
struct apparmor_audit_data *ad = aad_of_va(va);
if (ad->request & AA_URING_PERM_MASK) {
audit_log_format(ab, " requested=\"%s\"",
audit_uring_mask(ad->request));
if (ad->denied & AA_URING_PERM_MASK) {
audit_log_format(ab, " denied=\"%s\"",
audit_uring_mask(ad->denied));
}
}
if (ad->uring.target) {
audit_log_format(ab, " tcontext=");
aa_label_xaudit(ab, labels_ns(ad->subj_label),
ad->uring.target,
FLAGS_NONE, GFP_ATOMIC);
}
}
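/*
 * Check one profile for the requested io_uring permission. Mediation
 * only applies if the profile's ruleset has a state for
 * AA_CLASS_IO_URING; when @new is non-NULL it is matched as the target
 * label of a cred override, otherwise the permissions at the mediating
 * state are used directly.
 */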
static int profile_uring(struct aa_profile *profile, u32 request,
struct aa_label *new, int cap,
struct apparmor_audit_data *ad)
{
unsigned int state;
struct aa_ruleset *rules;
int error = 0;
AA_BUG(!profile);
rules = list_first_entry(&profile->rules, typeof(*rules), list);
state = RULE_MEDIATES(rules, AA_CLASS_IO_URING);
if (state) {
struct aa_perms perms = { };
if (new) {
aa_label_match(profile, rules, new, state,
false, request, &perms);
} else {
perms = *aa_lookup_perms(rules->policy, state);
}
aa_apply_modes_to_perms(profile, &perms);
error = aa_check_perms(profile, &perms, request, ad,
audit_uring_cb);
}
return error;
}
/**
* apparmor_uring_override_creds - check the requested cred override
* @new: the target creds
*
* Check to see if the current task is allowed to override its credentials
* to service an io_uring operation.
*/
static int apparmor_uring_override_creds(const struct cred *new)
{
struct aa_profile *profile;
struct aa_label *label;
int error;
DEFINE_AUDIT_DATA(ad, LSM_AUDIT_DATA_NONE, AA_CLASS_IO_URING,
OP_URING_OVERRIDE);
ad.uring.target = cred_label(new);
label = __begin_current_label_crit_section();
error = fn_for_each(label, profile,
profile_uring(profile, AA_MAY_OVERRIDE_CRED,
cred_label(new), CAP_SYS_ADMIN, &ad));
__end_current_label_crit_section(label);
return error;
}
/**
* apparmor_uring_sqpoll - check if an io_uring polling thread can be created
*
* Check to see if the current task is allowed to create a new io_uring
* kernel polling thread.
*/
static int apparmor_uring_sqpoll(void)
{
struct aa_profile *profile;
struct aa_label *label;
int error;
DEFINE_AUDIT_DATA(ad, LSM_AUDIT_DATA_NONE, AA_CLASS_IO_URING,
OP_URING_SQPOLL);
label = __begin_current_label_crit_section();
error = fn_for_each(label, profile,
profile_uring(profile, AA_MAY_CREATE_SQPOLL,
NULL, CAP_SYS_ADMIN, &ad));
__end_current_label_crit_section(label);
return error;
}
#endif /* CONFIG_IO_URING */
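/*
 * Dispatch mount(2) mediation based on the mount flags: remount, bind,
 * propagation-type change, move, or a new mount, after stripping the
 * legacy MS_MGC_VAL magic and the flags AppArmor ignores.
 */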
static int apparmor_sb_mount(const char *dev_name, const struct path *path,
const char *type, unsigned long flags, void *data)
{
struct aa_label *label;
int error = 0;
/* Discard magic */
if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
flags &= ~MS_MGC_MSK;
flags &= ~AA_MS_IGNORE_MASK;
label = __begin_current_label_crit_section();
if (!unconfined(label)) {
if (flags & MS_REMOUNT)
error = aa_remount(current_cred(), label, path, flags,
data);
else if (flags & MS_BIND)
error = aa_bind_mount(current_cred(), label, path,
dev_name, flags);
else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE |
MS_UNBINDABLE))
error = aa_mount_change_type(current_cred(), label,
path, flags);
else if (flags & MS_MOVE)
error = aa_move_mount_old(current_cred(), label, path,
dev_name);
else
error = aa_new_mount(current_cred(), label, dev_name,
path, type, flags, data);
}
__end_current_label_crit_section(label);
return error;
}
static int apparmor_move_mount(const struct path *from_path,
const struct path *to_path)
{
struct aa_label *label;
int error = 0;
label = __begin_current_label_crit_section();
if (!unconfined(label))
error = aa_move_mount(current_cred(), label, from_path,
to_path);
__end_current_label_crit_section(label);
return error;
}
static int apparmor_sb_umount(struct vfsmount *mnt, int flags)
{
struct aa_label *label;
int error = 0;
label = __begin_current_label_crit_section();
if (!unconfined(label))
error = aa_umount(current_cred(), label, mnt, flags);
__end_current_label_crit_section(label);
return error;
}
static int apparmor_sb_pivotroot(const struct path *old_path,
const struct path *new_path)
{
struct aa_label *label;
int error = 0;
label = aa_get_current_label();
if (!unconfined(label))
error = aa_pivotroot(current_cred(), label, old_path, new_path);
aa_put_label(label);
return error;
}
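/*
 * Fill an lsm_ctx with the current, previous (prev) or exec (onexec)
 * label for the LSM_ATTR_* self-attribute interface. Returns 1 (the
 * number of contexts written) on success, or a negative error.
 */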
static int apparmor_getselfattr(unsigned int attr, struct lsm_ctx __user *lx,
u32 *size, u32 flags)
{
int error = -ENOENT;
struct aa_task_ctx *ctx = task_ctx(current);
struct aa_label *label = NULL;
char *value = NULL;
switch (attr) {
case LSM_ATTR_CURRENT:
label = aa_get_newest_label(cred_label(current_cred()));
break;
case LSM_ATTR_PREV:
if (ctx->previous)
label = aa_get_newest_label(ctx->previous);
break;
case LSM_ATTR_EXEC:
if (ctx->onexec)
label = aa_get_newest_label(ctx->onexec);
break;
default:
error = -EOPNOTSUPP;
break;
}
if (label) {
error = aa_getprocattr(label, &value, false);
if (error > 0)
error = lsm_fill_user_ctx(lx, size, value, error,
LSM_ID_APPARMOR, 0);
kfree(value);
}
aa_put_label(label);
if (error < 0)
return error;
return 1;
}
static int apparmor_getprocattr(struct task_struct *task, const char *name,
char **value)
{
int error = -ENOENT;
/* released below */
const struct cred *cred = get_task_cred(task);
struct aa_task_ctx *ctx = task_ctx(current);
struct aa_label *label = NULL;
if (strcmp(name, "current") == 0)
label = aa_get_newest_label(cred_label(cred));
else if (strcmp(name, "prev") == 0 && ctx->previous)
label = aa_get_newest_label(ctx->previous);
else if (strcmp(name, "exec") == 0 && ctx->onexec)
label = aa_get_newest_label(ctx->onexec);
else
error = -EINVAL;
if (label)
error = aa_getprocattr(label, value, true);
aa_put_label(label);
put_cred(cred);
return error;
}
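/*
 * Parse a procattr/selfattr write of the form "<command> <args>" and
 * dispatch to the matching change_hat/change_profile operation. The
 * buffer is copied and null terminated if needed; on success the number
 * of bytes consumed (size) is returned.
 */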
static int do_setattr(u64 attr, void *value, size_t size)
{
char *command, *largs = NULL, *args = value;
size_t arg_size;
int error;
DEFINE_AUDIT_DATA(ad, LSM_AUDIT_DATA_NONE, AA_CLASS_NONE,
OP_SETPROCATTR);
if (size == 0)
return -EINVAL;
/* AppArmor currently requires that the buffer be null terminated */
if (args[size - 1] != '\0') {
/* null terminate */
largs = args = kmalloc(size + 1, GFP_KERNEL);
if (!args)
return -ENOMEM;
memcpy(args, value, size);
args[size] = '\0';
}
error = -EINVAL;
args = strim(args);
command = strsep(&args, " ");
if (!args)
goto out;
args = skip_spaces(args);
if (!*args)
goto out;
arg_size = size - (args - (largs ? largs : (char *) value));
if (attr == LSM_ATTR_CURRENT) {
if (strcmp(command, "changehat") == 0) {
error = aa_setprocattr_changehat(args, arg_size,
AA_CHANGE_NOFLAGS);
} else if (strcmp(command, "permhat") == 0) {
error = aa_setprocattr_changehat(args, arg_size,
AA_CHANGE_TEST);
} else if (strcmp(command, "changeprofile") == 0) {
error = aa_change_profile(args, AA_CHANGE_NOFLAGS);
} else if (strcmp(command, "permprofile") == 0) {
error = aa_change_profile(args, AA_CHANGE_TEST);
} else if (strcmp(command, "stack") == 0) {
error = aa_change_profile(args, AA_CHANGE_STACK);
} else
goto fail;
} else if (attr == LSM_ATTR_EXEC) {
if (strcmp(command, "exec") == 0)
error = aa_change_profile(args, AA_CHANGE_ONEXEC);
else if (strcmp(command, "stack") == 0)
error = aa_change_profile(args, (AA_CHANGE_ONEXEC |
AA_CHANGE_STACK));
else
goto fail;
} else
/* only support the "current" and "exec" process attributes */
goto fail;
if (!error)
error = size;
out:
kfree(largs);
return error;
fail:
ad.subj_label = begin_current_label_crit_section();
if (attr == LSM_ATTR_CURRENT)
ad.info = "current";
else if (attr == LSM_ATTR_EXEC)
ad.info = "exec";
else
ad.info = "invalid";
ad.error = error = -EINVAL;
aa_audit_msg(AUDIT_APPARMOR_DENIED, &ad, NULL);
end_current_label_crit_section(ad.subj_label);
goto out;
}
static int apparmor_setselfattr(unsigned int attr, struct lsm_ctx *ctx,
u32 size, u32 flags)
{
int rc;
if (attr != LSM_ATTR_CURRENT && attr != LSM_ATTR_EXEC)
return -EOPNOTSUPP;
rc = do_setattr(attr, ctx->ctx, ctx->ctx_len);
if (rc > 0)
return 0;
return rc;
}
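/*
 * Map a /proc attr name (e.g. "current", "exec") to an LSM_ATTR_* value
 * and reuse the common do_setattr() handler.
 */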
static int apparmor_setprocattr(const char *name, void *value,
size_t size)
{
int attr = lsm_name_to_attr(name);
if (attr)
return do_setattr(attr, value, size);
return -EINVAL;
}
/**
* apparmor_bprm_committing_creds - do task cleanup on committing new creds
* @bprm: binprm for the exec (NOT NULL)
*/
static void apparmor_bprm_committing_creds(const struct linux_binprm *bprm)
{
struct aa_label *label = aa_current_raw_label();
struct aa_label *new_label = cred_label(bprm->cred);
/* bail out if unconfined or not changing profile */
if ((new_label->proxy == label->proxy) ||
(unconfined(new_label)))
return;
aa_inherit_files(bprm->cred, current->files);
current->pdeath_signal = 0;
/* reset soft limits and set hard limits for the new label */
__aa_transition_rlimits(label, new_label);
}
/**
* apparmor_bprm_committed_creds() - do cleanup after new creds committed
* @bprm: binprm for the exec (NOT NULL)
*/
static void apparmor_bprm_committed_creds(const struct linux_binprm *bprm)
{
/* clear out temporary/transitional state from the context */
aa_clear_task_ctx_trans(task_ctx(current));
return;
}
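/*
 * Return the secid of the current task's label. Uses the critical
 * section helpers so the common case avoids taking a label reference.
 */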
static void apparmor_current_getsecid_subj(u32 *secid)
{
struct aa_label *label = __begin_current_label_crit_section();
*secid = label->secid;
__end_current_label_crit_section(label);
}
static void apparmor_task_getsecid_obj(struct task_struct *p, u32 *secid)
{
struct aa_label *label = aa_get_task_label(p);
*secid = label->secid;
aa_put_label(label);
}
static int apparmor_task_setrlimit(struct task_struct *task,
unsigned int resource, struct rlimit *new_rlim)
{
struct aa_label *label = __begin_current_label_crit_section();
int error = 0;
if (!unconfined(label))
error = aa_task_setrlimit(current_cred(), label, task,
resource, new_rlim);
__end_current_label_crit_section(label);
return error;
}
static int apparmor_task_kill(struct task_struct *target, struct kernel_siginfo *info,
int sig, const struct cred *cred)
{
const struct cred *tc;
struct aa_label *cl, *tl;
int error;
tc = get_task_cred(target);
tl = aa_get_newest_cred_label(tc);
if (cred) {
/*
 * Dealing with USB IO specific behavior: the caller passed in a stored
 * cred (rather than current's cred), so check the signal permission
 * against that cred's label.
 */
cl = aa_get_newest_cred_label(cred);
error = aa_may_signal(cred, cl, tc, tl, sig);
aa_put_label(cl);
} else {
cl = __begin_current_label_crit_section();
error = aa_may_signal(current_cred(), cl, tc, tl, sig);
__end_current_label_crit_section(cl);
}
aa_put_label(tl);
put_cred(tc);
return error;
}
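/*
 * apparmor_userns_create - check if current's label may create a user ns
 * @cred: cred of the task creating the namespace
 *
 * fn_for_each() walks every profile stacked in the label and combines the
 * per-profile AA_USERNS_CREATE results, so any profile that denies the
 * permission is enough to reject the creation.
 */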
static int apparmor_userns_create(const struct cred *cred)
{
struct aa_label *label;
struct aa_profile *profile;
int error = 0;
DEFINE_AUDIT_DATA(ad, LSM_AUDIT_DATA_TASK, AA_CLASS_NS,
OP_USERNS_CREATE);
ad.subj_cred = current_cred();
label = begin_current_label_crit_section();
if (!unconfined(label)) {
error = fn_for_each(label, profile,
aa_profile_ns_perm(profile, &ad,
AA_USERNS_CREATE));
}
end_current_label_crit_section(label);
return error;
}
static int apparmor_sk_alloc_security(struct sock *sk, int family, gfp_t flags)
{
struct aa_sk_ctx *ctx;
ctx = kzalloc(sizeof(*ctx), flags);
if (!ctx)
return -ENOMEM;
sk->sk_security = ctx;
return 0;
}
static void apparmor_sk_free_security(struct sock *sk)
{
struct aa_sk_ctx *ctx = aa_sock(sk);
sk->sk_security = NULL;
aa_put_label(ctx->label);
aa_put_label(ctx->peer);
kfree(ctx);
}
/**
* apparmor_sk_clone_security - clone the sk_security field
* @sk: sock to have security cloned
* @newsk: sock getting clone
*/
static void apparmor_sk_clone_security(const struct sock *sk,
struct sock *newsk)
{
struct aa_sk_ctx *ctx = aa_sock(sk);
struct aa_sk_ctx *new = aa_sock(newsk);
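/*
 * newsk may already carry label/peer references (e.g. set by sock_graft()
 * before the clone); drop them first so cloning does not leak label
 * references.
 */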
if (new->label)
aa_put_label(new->label);
new->label = aa_get_label(ctx->label);
if (new->peer)
aa_put_label(new->peer);
new->peer = aa_get_label(ctx->peer);
}
static int apparmor_socket_create(int family, int type, int protocol, int kern)
{
struct aa_label *label;
int error = 0;
AA_BUG(in_interrupt());
label = begin_current_label_crit_section();
if (!(kern || unconfined(label)))
error = af_select(family,
create_perm(label, family, type, protocol),
aa_af_perm(current_cred(), label,
OP_CREATE, AA_MAY_CREATE,
family, type, protocol));
end_current_label_crit_section(label);
return error;
}
/**
* apparmor_socket_post_create - setup the per-socket security struct
* @sock: socket that is being setup
* @family: family of socket being created
* @type: type of the socket
* @protocol: protocol of the socket
* @kern: socket is a special kernel socket
*
* Note:
* - kernel sockets are labeled with kernel_t (they previously used unconfined)
* - socket may not have sk here if created with sock_create_lite or
* sock_alloc. These should be accept cases which will be handled in
* sock_graft.
*/
static int apparmor_socket_post_create(struct socket *sock, int family,
int type, int protocol, int kern)
{
struct aa_label *label;
if (kern) {
label = aa_get_label(kernel_t);
} else {
label = aa_get_current_label();
}
if (sock->sk) {
struct aa_sk_ctx *ctx = aa_sock(sock->sk);
aa_put_label(ctx->label);
ctx->label = aa_get_label(label);
}
aa_put_label(label);
return 0;
}
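/*
 * Most of the socket hooks below follow the same pattern: af_select()
 * expands to an address-family specific permission check when AppArmor
 * implements one for the socket's family, and otherwise falls back to the
 * generic aa_sk_perm()/aa_af_perm() based check.
 */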
static int apparmor_socket_bind(struct socket *sock,
struct sockaddr *address, int addrlen)
{
AA_BUG(!sock);
AA_BUG(!sock->sk);
AA_BUG(!address);
AA_BUG(in_interrupt());
return af_select(sock->sk->sk_family,
bind_perm(sock, address, addrlen),
aa_sk_perm(OP_BIND, AA_MAY_BIND, sock->sk));
}
static int apparmor_socket_connect(struct socket *sock,
struct sockaddr *address, int addrlen)
{
AA_BUG(!sock);
AA_BUG(!sock->sk);
AA_BUG(!address);
AA_BUG(in_interrupt());
return af_select(sock->sk->sk_family,
connect_perm(sock, address, addrlen),
aa_sk_perm(OP_CONNECT, AA_MAY_CONNECT, sock->sk));
}
static int apparmor_socket_listen(struct socket *sock, int backlog)
{
AA_BUG(!sock);
AA_BUG(!sock->sk);
AA_BUG(in_interrupt());
return af_select(sock->sk->sk_family,
listen_perm(sock, backlog),
aa_sk_perm(OP_LISTEN, AA_MAY_LISTEN, sock->sk));
}
/*
* Note: while @newsock is created and has some information, the accept
* has not been done.
*/
static int apparmor_socket_accept(struct socket *sock, struct socket *newsock)
{
AA_BUG(!sock);
AA_BUG(!sock->sk);
AA_BUG(!newsock);
AA_BUG(in_interrupt());
return af_select(sock->sk->sk_family,
accept_perm(sock, newsock),
aa_sk_perm(OP_ACCEPT, AA_MAY_ACCEPT, sock->sk));
}
static int aa_sock_msg_perm(const char *op, u32 request, struct socket *sock,
struct msghdr *msg, int size)
{
AA_BUG(!sock);
AA_BUG(!sock->sk);
AA_BUG(!msg);
AA_BUG(in_interrupt());
return af_select(sock->sk->sk_family,
msg_perm(op, request, sock, msg, size),
aa_sk_perm(op, request, sock->sk));
}
static int apparmor_socket_sendmsg(struct socket *sock,
struct msghdr *msg, int size)
{
return aa_sock_msg_perm(OP_SENDMSG, AA_MAY_SEND, sock, msg, size);
}
static int apparmor_socket_recvmsg(struct socket *sock,
struct msghdr *msg, int size, int flags)
{
return aa_sock_msg_perm(OP_RECVMSG, AA_MAY_RECEIVE, sock, msg, size);
}
/* revalidation, get/set attr, shutdown */
static int aa_sock_perm(const char *op, u32 request, struct socket *sock)
{
AA_BUG(!sock);
AA_BUG(!sock->sk);
AA_BUG(in_interrupt());
return af_select(sock->sk->sk_family,
sock_perm(op, request, sock),
aa_sk_perm(op, request, sock->sk));
}
static int apparmor_socket_getsockname(struct socket *sock)
{
return aa_sock_perm(OP_GETSOCKNAME, AA_MAY_GETATTR, sock);
}
static int apparmor_socket_getpeername(struct socket *sock)
{
return aa_sock_perm(OP_GETPEERNAME, AA_MAY_GETATTR, sock);
}
/* revalidation, get/set attr, opt */
static int aa_sock_opt_perm(const char *op, u32 request, struct socket *sock,
int level, int optname)
{
AA_BUG(!sock);
AA_BUG(!sock->sk);
AA_BUG(in_interrupt());
return af_select(sock->sk->sk_family,
opt_perm(op, request, sock, level, optname),
aa_sk_perm(op, request, sock->sk));
}
static int apparmor_socket_getsockopt(struct socket *sock, int level,
int optname)
{
return aa_sock_opt_perm(OP_GETSOCKOPT, AA_MAY_GETOPT, sock,
level, optname);
}
static int apparmor_socket_setsockopt(struct socket *sock, int level,
int optname)
{
return aa_sock_opt_perm(OP_SETSOCKOPT, AA_MAY_SETOPT, sock,
level, optname);
}
static int apparmor_socket_shutdown(struct socket *sock, int how)
{
return aa_sock_perm(OP_SHUTDOWN, AA_MAY_SHUTDOWN, sock);
}
#ifdef CONFIG_NETWORK_SECMARK
/**
* apparmor_socket_sock_rcv_skb - check perms before associating skb to sk
* @sk: sk to associate @skb with
* @skb: skb to check for perms
*
* Note: cannot sleep; may be called with locks held
*
* don't want protocol specific handling in __skb_recv_datagram() to deny
* an incoming connection; do the check here in socket_sock_rcv_skb()
*/
static int apparmor_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
struct aa_sk_ctx *ctx = aa_sock(sk);
if (!skb->secmark)
return 0;
/*
* If reach here before socket_post_create hook is called, in which
* case label is null, drop the packet.
*/
if (!ctx->label)
return -EACCES;
return apparmor_secmark_check(ctx->label, OP_RECVMSG, AA_MAY_RECEIVE,
skb->secmark, sk);
}
#endif
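/* Returns the cached peer label for @sk, or ERR_PTR(-ENOPROTOOPT) if no
 * peer label has been recorded for the socket.
 */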
static struct aa_label *sk_peer_label(struct sock *sk)
{
struct aa_sk_ctx *ctx = aa_sock(sk);
if (ctx->peer)
return ctx->peer;
return ERR_PTR(-ENOPROTOOPT);
}
/**
* apparmor_socket_getpeersec_stream - get security context of peer
* @sock: socket that we are trying to get the peer context of
* @optval: output - buffer to copy peer name to
* @optlen: output - size of copied name in @optval
* @len: size of @optval buffer
* Returns: 0 on success, -errno on failure
*
* Note: for tcp only valid if using ipsec or cipso on lan
*/
static int apparmor_socket_getpeersec_stream(struct socket *sock,
sockptr_t optval, sockptr_t optlen,
unsigned int len)
{
char *name = NULL;
int slen, error = 0;
struct aa_label *label;
struct aa_label *peer;
label = begin_current_label_crit_section();
peer = sk_peer_label(sock->sk);
if (IS_ERR(peer)) {
error = PTR_ERR(peer);
goto done;
}
slen = aa_label_asxprint(&name, labels_ns(label), peer,
FLAG_SHOW_MODE | FLAG_VIEW_SUBNS |
FLAG_HIDDEN_UNCONFINED, GFP_KERNEL);
/* don't include terminating \0 in slen, it breaks some apps */
if (slen < 0) {
error = -ENOMEM;
goto done;
}
if (slen > len) {
error = -ERANGE;
goto done_len;
}
if (copy_to_sockptr(optval, name, slen))
error = -EFAULT;
done_len:
if (copy_to_sockptr(optlen, &slen, sizeof(slen)))
error = -EFAULT;
done:
end_current_label_crit_section(label);
kfree(name);
return error;
}
/**
* apparmor_socket_getpeersec_dgram - get security label of packet
* @sock: the peer socket
* @skb: packet data
* @secid: pointer to where to put the secid of the packet
*
* Sets the netlabel socket state on sk from parent
*/
static int apparmor_socket_getpeersec_dgram(struct socket *sock,
struct sk_buff *skb, u32 *secid)
{
/* TODO: requires secid support */
return -ENOPROTOOPT;
}
/**
* apparmor_sock_graft - Initialize newly created socket
* @sk: child sock
* @parent: parent socket
*
* Note: we could label off of SOCK_CTX(parent), but that requires tracking
* the inode; instead just set the sk security information from the label of
* the current (creating) process.
* Labeling of sk for the accept case should probably be sock based instead
* of task based, because of the case where an implicitly labeled socket is
* shared by different tasks.
*/
static void apparmor_sock_graft(struct sock *sk, struct socket *parent)
{
struct aa_sk_ctx *ctx = aa_sock(sk);
if (!ctx->label)
ctx->label = aa_get_current_label();
}
#ifdef CONFIG_NETWORK_SECMARK
static int apparmor_inet_conn_request(const struct sock *sk, struct sk_buff *skb,
struct request_sock *req)
{
struct aa_sk_ctx *ctx = aa_sock(sk);
if (!skb->secmark)
return 0;
return apparmor_secmark_check(ctx->label, OP_CONNECT, AA_MAY_CONNECT,
skb->secmark, sk);
}
#endif
/*
* The cred blob is a pointer to, not an instance of, an aa_label.
*/
struct lsm_blob_sizes apparmor_blob_sizes __ro_after_init = {
.lbs_cred = sizeof(struct aa_label *),
.lbs_file = sizeof(struct aa_file_ctx),
.lbs_task = sizeof(struct aa_task_ctx),
};
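/* Unlike the cred blob, the file and task blobs embed their context
 * structures (struct aa_file_ctx, struct aa_task_ctx) directly.
 */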
static const struct lsm_id apparmor_lsmid = {
.name = "apparmor",
.id = LSM_ID_APPARMOR,
};
static struct security_hook_list apparmor_hooks[] __ro_after_init = {
LSM_HOOK_INIT(ptrace_access_check, apparmor_ptrace_access_check),
LSM_HOOK_INIT(ptrace_traceme, apparmor_ptrace_traceme),
LSM_HOOK_INIT(capget, apparmor_capget),
LSM_HOOK_INIT(capable, apparmor_capable),
LSM_HOOK_INIT(move_mount, apparmor_move_mount),
LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
LSM_HOOK_INIT(path_link, apparmor_path_link),
LSM_HOOK_INIT(path_unlink, apparmor_path_unlink),
LSM_HOOK_INIT(path_symlink, apparmor_path_symlink),
LSM_HOOK_INIT(path_mkdir, apparmor_path_mkdir),
LSM_HOOK_INIT(path_rmdir, apparmor_path_rmdir),
LSM_HOOK_INIT(path_mknod, apparmor_path_mknod),
LSM_HOOK_INIT(path_rename, apparmor_path_rename),
LSM_HOOK_INIT(path_chmod, apparmor_path_chmod),
LSM_HOOK_INIT(path_chown, apparmor_path_chown),
LSM_HOOK_INIT(path_truncate, apparmor_path_truncate),
LSM_HOOK_INIT(inode_getattr, apparmor_inode_getattr),
LSM_HOOK_INIT(file_open, apparmor_file_open),
LSM_HOOK_INIT(file_receive, apparmor_file_receive),
LSM_HOOK_INIT(file_permission, apparmor_file_permission),
LSM_HOOK_INIT(file_alloc_security, apparmor_file_alloc_security),
LSM_HOOK_INIT(file_free_security, apparmor_file_free_security),
LSM_HOOK_INIT(mmap_file, apparmor_mmap_file),
LSM_HOOK_INIT(file_mprotect, apparmor_file_mprotect),
LSM_HOOK_INIT(file_lock, apparmor_file_lock),
LSM_HOOK_INIT(file_truncate, apparmor_file_truncate),
LSM_HOOK_INIT(getselfattr, apparmor_getselfattr),
LSM_HOOK_INIT(setselfattr, apparmor_setselfattr),
LSM_HOOK_INIT(getprocattr, apparmor_getprocattr),
LSM_HOOK_INIT(setprocattr, apparmor_setprocattr),
LSM_HOOK_INIT(sk_alloc_security, apparmor_sk_alloc_security),
LSM_HOOK_INIT(sk_free_security, apparmor_sk_free_security),
LSM_HOOK_INIT(sk_clone_security, apparmor_sk_clone_security),
LSM_HOOK_INIT(socket_create, apparmor_socket_create),
LSM_HOOK_INIT(socket_post_create, apparmor_socket_post_create),
LSM_HOOK_INIT(socket_bind, apparmor_socket_bind),
LSM_HOOK_INIT(socket_connect, apparmor_socket_connect),
LSM_HOOK_INIT(socket_listen, apparmor_socket_listen),
LSM_HOOK_INIT(socket_accept, apparmor_socket_accept),
LSM_HOOK_INIT(socket_sendmsg, apparmor_socket_sendmsg),
LSM_HOOK_INIT(socket_recvmsg, apparmor_socket_recvmsg),
LSM_HOOK_INIT(socket_getsockname, apparmor_socket_getsockname),
LSM_HOOK_INIT(socket_getpeername, apparmor_socket_getpeername),
LSM_HOOK_INIT(socket_getsockopt, apparmor_socket_getsockopt),
LSM_HOOK_INIT(socket_setsockopt, apparmor_socket_setsockopt),
LSM_HOOK_INIT(socket_shutdown, apparmor_socket_shutdown),
#ifdef CONFIG_NETWORK_SECMARK
LSM_HOOK_INIT(socket_sock_rcv_skb, apparmor_socket_sock_rcv_skb),
#endif
LSM_HOOK_INIT(socket_getpeersec_stream,
apparmor_socket_getpeersec_stream),
LSM_HOOK_INIT(socket_getpeersec_dgram,
apparmor_socket_getpeersec_dgram),
LSM_HOOK_INIT(sock_graft, apparmor_sock_graft),
#ifdef CONFIG_NETWORK_SECMARK
LSM_HOOK_INIT(inet_conn_request, apparmor_inet_conn_request),
#endif
LSM_HOOK_INIT(cred_alloc_blank, apparmor_cred_alloc_blank),
LSM_HOOK_INIT(cred_free, apparmor_cred_free),
LSM_HOOK_INIT(cred_prepare, apparmor_cred_prepare),
LSM_HOOK_INIT(cred_transfer, apparmor_cred_transfer),
LSM_HOOK_INIT(bprm_creds_for_exec, apparmor_bprm_creds_for_exec),
LSM_HOOK_INIT(bprm_committing_creds, apparmor_bprm_committing_creds),
LSM_HOOK_INIT(bprm_committed_creds, apparmor_bprm_committed_creds),
LSM_HOOK_INIT(task_free, apparmor_task_free),
LSM_HOOK_INIT(task_alloc, apparmor_task_alloc),
LSM_HOOK_INIT(current_getsecid_subj, apparmor_current_getsecid_subj),
LSM_HOOK_INIT(task_getsecid_obj, apparmor_task_getsecid_obj),
LSM_HOOK_INIT(task_setrlimit, apparmor_task_setrlimit),
LSM_HOOK_INIT(task_kill, apparmor_task_kill),
LSM_HOOK_INIT(userns_create, apparmor_userns_create),
#ifdef CONFIG_AUDIT
LSM_HOOK_INIT(audit_rule_init, aa_audit_rule_init),
LSM_HOOK_INIT(audit_rule_known, aa_audit_rule_known),
LSM_HOOK_INIT(audit_rule_match, aa_audit_rule_match),
LSM_HOOK_INIT(audit_rule_free, aa_audit_rule_free),
#endif
LSM_HOOK_INIT(secid_to_secctx, apparmor_secid_to_secctx),
LSM_HOOK_INIT(secctx_to_secid, apparmor_secctx_to_secid),
LSM_HOOK_INIT(release_secctx, apparmor_release_secctx),
#ifdef CONFIG_IO_URING
LSM_HOOK_INIT(uring_override_creds, apparmor_uring_override_creds),
LSM_HOOK_INIT(uring_sqpoll, apparmor_uring_sqpoll),
#endif
};
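/*
 * The hook table above is registered with the LSM infrastructure together
 * with apparmor_lsmid via security_add_hooks() in the init code later in
 * this file, wiring each LSM_HOOK_INIT() entry to its hook slot.
 */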
/*
* AppArmor sysfs module parameters
*/
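/*
 * Each custom parameter type below (aabool, aauint, aacompressionlevel,
 * aalockpolicy, aaintbool) pairs a param_ops_<type> structure with a
 * param_check_<type> macro so that module_param_named(..., <type>, ...)
 * resolves to these wrappers, which layer AppArmor specific checks
 * (apparmor_enabled, apparmor_initialized, policy view/admin capability)
 * on top of the generic kernel_param handlers.
 */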
static int param_set_aabool(const char *val, const struct kernel_param *kp);
static int param_get_aabool(char *buffer, const struct kernel_param *kp);
#define param_check_aabool param_check_bool
static const struct kernel_param_ops param_ops_aabool = {
.flags = KERNEL_PARAM_OPS_FL_NOARG,
.set = param_set_aabool,
.get = param_get_aabool
};
static int param_set_aauint(const char *val, const struct kernel_param *kp);
static int param_get_aauint(char *buffer, const struct kernel_param *kp);
#define param_check_aauint param_check_uint
static const struct kernel_param_ops param_ops_aauint = {
.set = param_set_aauint,
.get = param_get_aauint
};
static int param_set_aacompressionlevel(const char *val,
const struct kernel_param *kp);
static int param_get_aacompressionlevel(char *buffer,
const struct kernel_param *kp);
#define param_check_aacompressionlevel param_check_int
static const struct kernel_param_ops param_ops_aacompressionlevel = {
.set = param_set_aacompressionlevel,
.get = param_get_aacompressionlevel
};
static int param_set_aalockpolicy(const char *val, const struct kernel_param *kp);
static int param_get_aalockpolicy(char *buffer, const struct kernel_param *kp);
#define param_check_aalockpolicy param_check_bool
static const struct kernel_param_ops param_ops_aalockpolicy = {
.flags = KERNEL_PARAM_OPS_FL_NOARG,
.set = param_set_aalockpolicy,
.get = param_get_aalockpolicy
};
static int param_set_audit(const char *val, const struct kernel_param *kp);
static int param_get_audit(char *buffer, const struct kernel_param *kp);
static int param_set_mode(const char *val, const struct kernel_param *kp);
static int param_get_mode(char *buffer, const struct kernel_param *kp);
/* Flag values, also controllable via /sys/module/apparmor/parameters.
* We define special parameter types because we want to do additional
* mediation on accesses to them.
*/
/* AppArmor global enforcement switch - complain, enforce, kill */
enum profile_mode aa_g_profile_mode = APPARMOR_ENFORCE;
module_param_call(mode, param_set_mode, param_get_mode,
&aa_g_profile_mode, S_IRUSR | S_IWUSR);
/* whether policy verification hashing is enabled */
bool aa_g_hash_policy = IS_ENABLED(CONFIG_SECURITY_APPARMOR_HASH_DEFAULT);
#ifdef CONFIG_SECURITY_APPARMOR_HASH
module_param_named(hash_policy, aa_g_hash_policy, aabool, S_IRUSR | S_IWUSR);
#endif
/* whether policy exactly as loaded is retained for debug and checkpointing */
bool aa_g_export_binary = IS_ENABLED(CONFIG_SECURITY_APPARMOR_EXPORT_BINARY);
#ifdef CONFIG_SECURITY_APPARMOR_EXPORT_BINARY
module_param_named(export_binary, aa_g_export_binary, aabool, 0600);
#endif
/* policy loaddata compression level */
int aa_g_rawdata_compression_level = AA_DEFAULT_CLEVEL;
module_param_named(rawdata_compression_level, aa_g_rawdata_compression_level,
aacompressionlevel, 0400);
/* Debug mode */
bool aa_g_debug = IS_ENABLED(CONFIG_SECURITY_APPARMOR_DEBUG_MESSAGES);
module_param_named(debug, aa_g_debug, aabool, S_IRUSR | S_IWUSR);
/* Audit mode */
enum audit_mode aa_g_audit;
module_param_call(audit, param_set_audit, param_get_audit,
&aa_g_audit, S_IRUSR | S_IWUSR);
/* Determines if the audit header is included in audited messages. This
* provides more context if the audit daemon is not running.
*/
bool aa_g_audit_header = true;
module_param_named(audit_header, aa_g_audit_header, aabool,
S_IRUSR | S_IWUSR);
/* lock out loading/removal of policy
* TODO: add boot-time loading of policy, which would then be the only
* way to load policy if lock_policy is set
*/
bool aa_g_lock_policy;
module_param_named(lock_policy, aa_g_lock_policy, aalockpolicy,
S_IRUSR | S_IWUSR);
/* Syscall logging mode */
bool aa_g_logsyscall;
module_param_named(logsyscall, aa_g_logsyscall, aabool, S_IRUSR | S_IWUSR);
/* Maximum pathname length before accesses will start getting rejected */
unsigned int aa_g_path_max = 2 * PATH_MAX;
module_param_named(path_max, aa_g_path_max, aauint, S_IRUSR);
/* Determines how paranoid loading of policy is and how much verification
* on the loaded policy is done.
* DEPRECATED: read only, as strict checking of loads is always done now
* that non-root users (user namespaces) can load policy.
*/
bool aa_g_paranoid_load = IS_ENABLED(CONFIG_SECURITY_APPARMOR_PARANOID_LOAD);
module_param_named(paranoid_load, aa_g_paranoid_load, aabool, S_IRUGO);
static int param_get_aaintbool(char *buffer, const struct kernel_param *kp);
static int param_set_aaintbool(const char *val, const struct kernel_param *kp);
#define param_check_aaintbool param_check_int
static const struct kernel_param_ops param_ops_aaintbool = {
.set = param_set_aaintbool,
.get = param_get_aaintbool
};
/* Boot time disable flag */
static int apparmor_enabled __ro_after_init = 1;
module_param_named(enabled, apparmor_enabled, aaintbool, 0444);
static int __init apparmor_enabled_setup(char *str)
{
unsigned long enabled;
int error = kstrtoul(str, 0, &enabled);
if (!error)
apparmor_enabled = enabled ? 1 : 0;
return 1;
}
__setup("apparmor=", apparmor_enabled_setup);
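/*
 * Example: booting with "apparmor=0" on the kernel command line clears
 * apparmor_enabled before the LSM initializes; a value that does not parse
 * as a number leaves the current setting unchanged.
 */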
/* set global flag turning off the ability to load policy */
static int param_set_aalockpolicy(const char *val, const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_admin_capable(NULL))
return -EPERM;
return param_set_bool(val, kp);
}
static int param_get_aalockpolicy(char *buffer, const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_view_capable(NULL))
return -EPERM;
return param_get_bool(buffer, kp);
}
static int param_set_aabool(const char *val, const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_admin_capable(NULL))
return -EPERM;
return param_set_bool(val, kp);
}
static int param_get_aabool(char *buffer, const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_view_capable(NULL))
return -EPERM;
return param_get_bool(buffer, kp);
}
static int param_set_aauint(const char *val, const struct kernel_param *kp)
{
int error;
if (!apparmor_enabled)
return -EINVAL;
/* the parameter file is read-only, but enforce a second line of defense */
if (apparmor_initialized)
return -EPERM;
error = param_set_uint(val, kp);
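/* path buffers double as free-list nodes (union aa_buffer), so never let
 * them shrink below that size
 */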
aa_g_path_max = max_t(uint32_t, aa_g_path_max, sizeof(union aa_buffer));
pr_info("AppArmor: buffer size set to %d bytes\n", aa_g_path_max);
return error;
}
static int param_get_aauint(char *buffer, const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_view_capable(NULL))
return -EPERM;
return param_get_uint(buffer, kp);
}
/* Can only be set before AppArmor is initialized (i.e. on boot cmdline). */
static int param_set_aaintbool(const char *val, const struct kernel_param *kp)
{
struct kernel_param kp_local;
bool value;
int error;
if (apparmor_initialized)
return -EPERM;
/* Create local copy, with arg pointing to bool type. */
value = !!*((int *)kp->arg);
memcpy(&kp_local, kp, sizeof(kp_local));
kp_local.arg = &value;
error = param_set_bool(val, &kp_local);
if (!error)
*((int *)kp->arg) = *((bool *)kp_local.arg);
return error;
}
/*
* To avoid changing /sys/module/apparmor/parameters/enabled from Y/N to
* 1/0, this converts the "int that is actually bool" back to bool for
* display in the /sys filesystem, while keeping it "int" for the LSM
* infrastructure.
*/
static int param_get_aaintbool(char *buffer, const struct kernel_param *kp)
{
struct kernel_param kp_local;
bool value;
/* Create local copy, with arg pointing to bool type. */
value = !!*((int *)kp->arg);
memcpy(&kp_local, kp, sizeof(kp_local));
kp_local.arg = &value;
return param_get_bool(buffer, &kp_local);
}
static int param_set_aacompressionlevel(const char *val,
const struct kernel_param *kp)
{
int error;
if (!apparmor_enabled)
return -EINVAL;
if (apparmor_initialized)
return -EPERM;
error = param_set_int(val, kp);
aa_g_rawdata_compression_level = clamp(aa_g_rawdata_compression_level,
AA_MIN_CLEVEL, AA_MAX_CLEVEL);
pr_info("AppArmor: policy rawdata compression level set to %d\n",
aa_g_rawdata_compression_level);
return error;
}
static int param_get_aacompressionlevel(char *buffer,
const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_view_capable(NULL))
return -EPERM;
return param_get_int(buffer, kp);
}
static int param_get_audit(char *buffer, const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_view_capable(NULL))
return -EPERM;
return sprintf(buffer, "%s", audit_mode_names[aa_g_audit]);
}
static int param_set_audit(const char *val, const struct kernel_param *kp)
{
int i;
if (!apparmor_enabled)
return -EINVAL;
if (!val)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_admin_capable(NULL))
return -EPERM;
i = match_string(audit_mode_names, AUDIT_MAX_INDEX, val);
if (i < 0)
return -EINVAL;
aa_g_audit = i;
return 0;
}
static int param_get_mode(char *buffer, const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_view_capable(NULL))
return -EPERM;
return sprintf(buffer, "%s", aa_profile_mode_names[aa_g_profile_mode]);
}
static int param_set_mode(const char *val, const struct kernel_param *kp)
{
int i;
if (!apparmor_enabled)
return -EINVAL;
if (!val)
return -EINVAL;
if (apparmor_initialized && !aa_current_policy_admin_capable(NULL))
return -EPERM;
i = match_string(aa_profile_mode_names, APPARMOR_MODE_NAMES_MAX_INDEX,
val);
if (i < 0)
return -EINVAL;
aa_g_profile_mode = i;
return 0;
}
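/**
 * aa_get_buffer - get a path buffer, preferring cached buffers
 * @in_atomic: whether the caller is in atomic context
 *
 * Buffers are taken from the per-cpu cache first, then from the global pool
 * protected by aa_buffers_lock, and only allocated with kmalloc() as a last
 * resort. Contention on the global lock bumps the per-cpu hold count, which
 * keeps freed buffers on the local cache for longer (see aa_put_buffer()).
 *
 * Returns: buffer to use, or NULL on allocation failure
 */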
char *aa_get_buffer(bool in_atomic)
{
union aa_buffer *aa_buf;
struct aa_local_cache *cache;
bool try_again = true;
gfp_t flags = (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
/* use per cpu cached buffers first */
cache = get_cpu_ptr(&aa_local_buffers);
if (!list_empty(&cache->head)) {
aa_buf = list_first_entry(&cache->head, union aa_buffer, list);
list_del(&aa_buf->list);
cache->hold--;
cache->count--;
put_cpu_ptr(&aa_local_buffers);
return &aa_buf->buffer[0];
}
put_cpu_ptr(&aa_local_buffers);
if (!spin_trylock(&aa_buffers_lock)) {
cache = get_cpu_ptr(&aa_local_buffers);
cache->hold += 1;
put_cpu_ptr(&aa_local_buffers);
spin_lock(&aa_buffers_lock);
} else {
cache = get_cpu_ptr(&aa_local_buffers);
put_cpu_ptr(&aa_local_buffers);
}
retry:
if (buffer_count > reserve_count ||
(in_atomic && !list_empty(&aa_global_buffers))) {
aa_buf = list_first_entry(&aa_global_buffers, union aa_buffer,
list);
list_del(&aa_buf->list);
buffer_count--;
spin_unlock(&aa_buffers_lock);
return aa_buf->buffer;
}
if (in_atomic) {
/*
* out of reserve buffers and in atomic context so increase
* how many buffers to keep in reserve
*/
reserve_count++;
flags = GFP_ATOMIC;
}
spin_unlock(&aa_buffers_lock);
if (!in_atomic)
might_sleep();
aa_buf = kmalloc(aa_g_path_max, flags);
if (!aa_buf) {
if (try_again) {
try_again = false;
spin_lock(&aa_buffers_lock);
goto retry;
}
pr_warn_once("AppArmor: Failed to allocate a memory buffer.\n");
return NULL;
}
return aa_buf->buffer;
}
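/**
 * aa_put_buffer - return a buffer obtained from aa_get_buffer()
 * @buf: buffer to return (may be NULL)
 *
 * If the cpu is not holding buffers, try to return @buf to the global
 * pool; on lock contention, or while a hold is in effect, stash it on
 * the per-cpu list instead.
 */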
void aa_put_buffer(char *buf)
{
union aa_buffer *aa_buf;
struct aa_local_cache *cache;
if (!buf)
return;
aa_buf = container_of(buf, union aa_buffer, buffer[0]);
cache = get_cpu_ptr(&aa_local_buffers);
if (!cache->hold) {
put_cpu_ptr(&aa_local_buffers);
if (spin_trylock(&aa_buffers_lock)) {
/* put back on global list */
list_add(&aa_buf->list, &aa_global_buffers);
buffer_count++;
spin_unlock(&aa_buffers_lock);
cache = get_cpu_ptr(&aa_local_buffers);
put_cpu_ptr(&aa_local_buffers);
return;
}
/* contention on the global list, fall back to the percpu list */
cache = get_cpu_ptr(&aa_local_buffers);
cache->hold += 1;
}
/* cache in percpu list */
list_add(&aa_buf->list, &cache->head);
cache->count++;
put_cpu_ptr(&aa_local_buffers);
}
/*
* AppArmor init functions
*/
/**
* set_init_ctx - set a task context and profile on the first task.
*
 * TODO: allow setting an alternate profile other than unconfined
*/
static int __init set_init_ctx(void)
{
struct cred *cred = (__force struct cred *)current->real_cred;
set_cred_label(cred, aa_get_label(ns_unconfined(root_ns)));
return 0;
}
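/*
 * free the buffers remaining on the global list, dropping the lock
 * around each kfree() so other CPUs are not held off while freeing
 */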
static void destroy_buffers(void)
{
union aa_buffer *aa_buf;
spin_lock(&aa_buffers_lock);
while (!list_empty(&aa_global_buffers)) {
aa_buf = list_first_entry(&aa_global_buffers, union aa_buffer,
list);
list_del(&aa_buf->list);
spin_unlock(&aa_buffers_lock);
kfree(aa_buf);
spin_lock(&aa_buffers_lock);
}
spin_unlock(&aa_buffers_lock);
}
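/* initialize the per-cpu caches and preallocate the global buffer pool */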
static int __init alloc_buffers(void)
{
union aa_buffer *aa_buf;
int i, num;
/*
 * per-cpu cache of allocated buffers used to help reduce contention
 * on the global buffers lock
 */
for_each_possible_cpu(i) {
per_cpu(aa_local_buffers, i).hold = 0;
per_cpu(aa_local_buffers, i).count = 0;
INIT_LIST_HEAD(&per_cpu(aa_local_buffers, i).head);
}
/*
 * A function may require two buffers at once. Usually the buffers are
 * used for a short period of time and are shared. On a UP kernel two
 * buffers should be enough; with more CPUs it is possible that more
 * buffers will be used simultaneously. The preallocated pool may grow.
 * This preallocation also has the side-effect that AppArmor will be
 * disabled early at boot if aa_g_path_max is extremely high.
 */
if (num_online_cpus() > 1)
num = 4 + RESERVE_COUNT;
else
num = 2 + RESERVE_COUNT;
for (i = 0; i < num; i++) {
aa_buf = kmalloc(aa_g_path_max, GFP_KERNEL |
__GFP_RETRY_MAYFAIL | __GFP_NOWARN);
if (!aa_buf) {
destroy_buffers();
return -ENOMEM;
}
aa_put_buffer(aa_buf->buffer);
}
return 0;
}
#ifdef CONFIG_SYSCTL
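/*
 * the AppArmor sysctls below require policy admin capability and
 * AppArmor to be enabled; valid requests are handled by proc_dointvec()
 */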
static int apparmor_dointvec(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
if (!aa_current_policy_admin_capable(NULL))
return -EPERM;
if (!apparmor_enabled)
return -EINVAL;
return proc_dointvec(table, write, buffer, lenp, ppos);
}
static struct ctl_table apparmor_sysctl_table[] = {
#ifdef CONFIG_USER_NS
{
.procname = "unprivileged_userns_apparmor_policy",
.data = &unprivileged_userns_apparmor_policy,
.maxlen = sizeof(int),
.mode = 0600,
.proc_handler = apparmor_dointvec,
},
#endif /* CONFIG_USER_NS */
{
.procname = "apparmor_display_secid_mode",
.data = &apparmor_display_secid_mode,
.maxlen = sizeof(int),
.mode = 0600,
.proc_handler = apparmor_dointvec,
},
{
.procname = "apparmor_restrict_unprivileged_unconfined",
.data = &aa_unprivileged_unconfined_restricted,
.maxlen = sizeof(int),
.mode = 0600,
.proc_handler = apparmor_dointvec,
},
{ }
};
static int __init apparmor_init_sysctl(void)
{
return register_sysctl("kernel", apparmor_sysctl_table) ? 0 : -ENOMEM;
}
#else
static inline int apparmor_init_sysctl(void)
{
return 0;
}
#endif /* CONFIG_SYSCTL */
#if defined(CONFIG_NETFILTER) && defined(CONFIG_NETWORK_SECMARK)
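/*
 * check secmark-labeled outgoing packets against the sending socket's
 * label; denied packets are dropped with -ECONNREFUSED
 */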
static unsigned int apparmor_ip_postroute(void *priv,
struct sk_buff *skb,
const struct nf_hook_state *state)
{
struct aa_sk_ctx *ctx;
struct sock *sk;
if (!skb->secmark)
return NF_ACCEPT;
sk = skb_to_full_sk(skb);
if (sk == NULL)
return NF_ACCEPT;
ctx = aa_sock(sk);
if (!apparmor_secmark_check(ctx->label, OP_SENDMSG, AA_MAY_SEND,
skb->secmark, sk))
return NF_ACCEPT;
return NF_DROP_ERR(-ECONNREFUSED);
}
static const struct nf_hook_ops apparmor_nf_ops[] = {
{
.hook = apparmor_ip_postroute,
.pf = NFPROTO_IPV4,
.hooknum = NF_INET_POST_ROUTING,
.priority = NF_IP_PRI_SELINUX_FIRST,
},
#if IS_ENABLED(CONFIG_IPV6)
{
.hook = apparmor_ip_postroute,
.pf = NFPROTO_IPV6,
.hooknum = NF_INET_POST_ROUTING,
.priority = NF_IP6_PRI_SELINUX_FIRST,
},
#endif
};
static int __net_init apparmor_nf_register(struct net *net)
{
return nf_register_net_hooks(net, apparmor_nf_ops,
ARRAY_SIZE(apparmor_nf_ops));
}
static void __net_exit apparmor_nf_unregister(struct net *net)
{
nf_unregister_net_hooks(net, apparmor_nf_ops,
ARRAY_SIZE(apparmor_nf_ops));
}
static struct pernet_operations apparmor_net_ops = {
.init = apparmor_nf_register,
.exit = apparmor_nf_unregister,
};
static int __init apparmor_nf_ip_init(void)
{
int err;
if (!apparmor_enabled)
return 0;
err = register_pernet_subsys(&apparmor_net_ops);
if (err)
panic("Apparmor: register_pernet_subsys: error %d\n", err);
return 0;
}
__initcall(apparmor_nf_ip_init);
#endif
static char nulldfa_src[] = {
#include "nulldfa.in"
};
static struct aa_dfa *nulldfa;
static char stacksplitdfa_src[] = {
#include "stacksplitdfa.in"
};
struct aa_dfa *stacksplitdfa;
struct aa_policydb *nullpdb;
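/*
 * unpack the built-in null and stack-split DFAs and wrap the null DFA
 * in a minimal policydb with an empty two-entry permission table
 */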
static int __init aa_setup_dfa_engine(void)
{
int error = -ENOMEM;
nullpdb = aa_alloc_pdb(GFP_KERNEL);
if (!nullpdb)
return -ENOMEM;
nulldfa = aa_dfa_unpack(nulldfa_src, sizeof(nulldfa_src),
TO_ACCEPT1_FLAG(YYTD_DATA32) |
TO_ACCEPT2_FLAG(YYTD_DATA32));
if (IS_ERR(nulldfa)) {
error = PTR_ERR(nulldfa);
goto fail;
}
nullpdb->dfa = aa_get_dfa(nulldfa);
nullpdb->perms = kcalloc(2, sizeof(struct aa_perms), GFP_KERNEL);
if (!nullpdb->perms)
goto fail;
nullpdb->size = 2;
stacksplitdfa = aa_dfa_unpack(stacksplitdfa_src,
sizeof(stacksplitdfa_src),
TO_ACCEPT1_FLAG(YYTD_DATA32) |
TO_ACCEPT2_FLAG(YYTD_DATA32));
if (IS_ERR(stacksplitdfa)) {
error = PTR_ERR(stacksplitdfa);
goto fail;
}
return 0;
fail:
aa_put_pdb(nullpdb);
aa_put_dfa(nulldfa);
nullpdb = NULL;
nulldfa = NULL;
stacksplitdfa = NULL;
return error;
}
static void __init aa_teardown_dfa_engine(void)
{
aa_put_dfa(stacksplitdfa);
aa_put_dfa(nulldfa);
aa_put_pdb(nullpdb);
nullpdb = NULL;
stacksplitdfa = NULL;
nulldfa = NULL;
}
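/*
 * module init: set up the DFA engine, root namespace, sysctls, and
 * work buffers, put the init task in the root namespace's unconfined
 * label, and then register the AppArmor hooks with the LSM
 * infrastructure
 */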
static int __init apparmor_init(void)
{
int error;
error = aa_setup_dfa_engine();
if (error) {
AA_ERROR("Unable to setup dfa engine\n");
goto alloc_out;
}
error = aa_alloc_root_ns();
if (error) {
AA_ERROR("Unable to allocate default profile namespace\n");
goto alloc_out;
}
error = apparmor_init_sysctl();
if (error) {
AA_ERROR("Unable to register sysctls\n");
goto alloc_out;
}
error = alloc_buffers();
if (error) {
AA_ERROR("Unable to allocate work buffers\n");
goto alloc_out;
}
error = set_init_ctx();
if (error) {
AA_ERROR("Failed to set context on init task\n");
aa_free_root_ns();
goto buffers_out;
}
security_add_hooks(apparmor_hooks, ARRAY_SIZE(apparmor_hooks),
&apparmor_lsmid);
/* Report that AppArmor successfully initialized */
apparmor_initialized = 1;
if (aa_g_profile_mode == APPARMOR_COMPLAIN)
aa_info_message("AppArmor initialized: complain mode enabled");
else if (aa_g_profile_mode == APPARMOR_KILL)
aa_info_message("AppArmor initialized: kill mode enabled");
else
aa_info_message("AppArmor initialized");
return error;
buffers_out:
destroy_buffers();
alloc_out:
aa_destroy_aafs();
aa_teardown_dfa_engine();
apparmor_enabled = false;
return error;
}
DEFINE_LSM(apparmor) = {
.name = "apparmor",
.flags = LSM_FLAG_LEGACY_MAJOR | LSM_FLAG_EXCLUSIVE,
.enabled = &apparmor_enabled,
.blobs = &apparmor_blob_sizes,
.init = apparmor_init,
};