linux/security/security.c

4096 lines
113 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Security plug functions
*
* Copyright (C) 2001 WireX Communications, Inc <chris@wirex.com>
* Copyright (C) 2001-2002 Greg Kroah-Hartman <greg@kroah.com>
* Copyright (C) 2001 Networks Associates Technology, Inc <ssmalley@nai.com>
IB/core: Enforce PKey security on QPs Add new LSM hooks to allocate and free security contexts and check for permission to access a PKey. Allocate and free a security context when creating and destroying a QP. This context is used for controlling access to PKeys. When a request is made to modify a QP that changes the port, PKey index, or alternate path, check that the QP has permission for the PKey in the PKey table index on the subnet prefix of the port. If the QP is shared make sure all handles to the QP also have access. Store which port and PKey index a QP is using. After the reset to init transition the user can modify the port, PKey index and alternate path independently. So port and PKey settings changes can be a merge of the previous settings and the new ones. In order to maintain access control if there are PKey table or subnet prefix change keep a list of all QPs are using each PKey index on each port. If a change occurs all QPs using that device and port must have access enforced for the new cache settings. These changes add a transaction to the QP modify process. Association with the old port and PKey index must be maintained if the modify fails, and must be removed if it succeeds. Association with the new port and PKey index must be established prior to the modify and removed if the modify fails. 1. When a QP is modified to a particular Port, PKey index or alternate path insert that QP into the appropriate lists. 2. Check permission to access the new settings. 3. If step 2 grants access attempt to modify the QP. 4a. If steps 2 and 3 succeed remove any prior associations. 4b. If ether fails remove the new setting associations. If a PKey table or subnet prefix changes walk the list of QPs and check that they have permission. If not send the QP to the error state and raise a fatal error event. If it's a shared QP make sure all the QPs that share the real_qp have permission as well. If the QP that owns a security structure is denied access the security structure is marked as such and the QP is added to an error_list. Once the moving the QP to error is complete the security structure mark is cleared. Maintaining the lists correctly turns QP destroy into a transaction. The hardware driver for the device frees the ib_qp structure, so while the destroy is in progress the ib_qp pointer in the ib_qp_security struct is undefined. When the destroy process begins the ib_qp_security structure is marked as destroying. This prevents any action from being taken on the QP pointer. After the QP is destroyed successfully it could still listed on an error_list wait for it to be processed by that flow before cleaning up the structure. If the destroy fails the QPs port and PKey settings are reinserted into the appropriate lists, the destroying flag is cleared, and access control is enforced, in case there were any cache changes during the destroy flow. To keep the security changes isolated a new file is used to hold security related functionality. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> [PM: merge fixup in ib_verbs.h and uverbs_cmd.c] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-19 12:48:52 +00:00
* Copyright (C) 2016 Mellanox Technologies
* Copyright (C) 2023 Microsoft Corporation <paul@paul-moore.com>
*/
#define pr_fmt(fmt) "LSM: " fmt
#include <linux/bpf.h>
#include <linux/capability.h>
#include <linux/dcache.h>
#include <linux/export.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/kernel_read_file.h>
#include <linux/lsm_hooks.h>
#include <linux/integrity.h>
#include <linux/ima.h>
#include <linux/evm.h>
#include <linux/fsnotify.h>
#include <linux/mman.h>
#include <linux/mount.h>
#include <linux/personality.h>
#include <linux/backing-dev.h>
#include <linux/string.h>
#include <linux/msg.h>
#include <net/flow.h>
#define MAX_LSM_EVM_XATTR 2
/* How many LSMs were built into the kernel? */
#define LSM_COUNT (__end_lsm_info - __start_lsm_info)
security,lockdown,selinux: implement SELinux lockdown Implement a SELinux hook for lockdown. If the lockdown module is also enabled, then a denial by the lockdown module will take precedence over SELinux, so SELinux can only further restrict lockdown decisions. The SELinux hook only distinguishes at the granularity of integrity versus confidentiality similar to the lockdown module, but includes the full lockdown reason as part of the audit record as a hint in diagnosing what triggered the denial. To support this auditing, move the lockdown_reasons[] string array from being private to the lockdown module to the security framework so that it can be used by the lsm audit code and so that it is always available even when the lockdown module is disabled. Note that the SELinux implementation allows the integrity and confidentiality reasons to be controlled independently from one another. Thus, in an SELinux policy, one could allow operations that specify an integrity reason while blocking operations that specify a confidentiality reason. The SELinux hook implementation is stricter than the lockdown module in validating the provided reason value. Sample AVC audit output from denials: avc: denied { integrity } for pid=3402 comm="fwupd" lockdown_reason="/dev/mem,kmem,port" scontext=system_u:system_r:fwupd_t:s0 tcontext=system_u:system_r:fwupd_t:s0 tclass=lockdown permissive=0 avc: denied { confidentiality } for pid=4628 comm="cp" lockdown_reason="/proc/kcore access" scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tclass=lockdown permissive=0 Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Reviewed-by: James Morris <jamorris@linux.microsoft.com> [PM: some merge fuzz do the the perf hooks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-11-27 17:04:36 +00:00
/*
* These are descriptions of the reasons that can be passed to the
* security_locked_down() LSM hook. Placing this array here allows
* all security modules to use the same descriptions for auditing
* purposes.
*/
const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
[LOCKDOWN_NONE] = "none",
[LOCKDOWN_MODULE_SIGNATURE] = "unsigned module loading",
[LOCKDOWN_DEV_MEM] = "/dev/mem,kmem,port",
[LOCKDOWN_EFI_TEST] = "/dev/efi_test access",
[LOCKDOWN_KEXEC] = "kexec of unsigned images",
[LOCKDOWN_HIBERNATION] = "hibernation",
[LOCKDOWN_PCI_ACCESS] = "direct PCI access",
[LOCKDOWN_IOPORT] = "raw io port access",
[LOCKDOWN_MSR] = "raw MSR access",
[LOCKDOWN_ACPI_TABLES] = "modifying ACPI tables",
[LOCKDOWN_DEVICE_TREE] = "modifying device tree contents",
security,lockdown,selinux: implement SELinux lockdown Implement a SELinux hook for lockdown. If the lockdown module is also enabled, then a denial by the lockdown module will take precedence over SELinux, so SELinux can only further restrict lockdown decisions. The SELinux hook only distinguishes at the granularity of integrity versus confidentiality similar to the lockdown module, but includes the full lockdown reason as part of the audit record as a hint in diagnosing what triggered the denial. To support this auditing, move the lockdown_reasons[] string array from being private to the lockdown module to the security framework so that it can be used by the lsm audit code and so that it is always available even when the lockdown module is disabled. Note that the SELinux implementation allows the integrity and confidentiality reasons to be controlled independently from one another. Thus, in an SELinux policy, one could allow operations that specify an integrity reason while blocking operations that specify a confidentiality reason. The SELinux hook implementation is stricter than the lockdown module in validating the provided reason value. Sample AVC audit output from denials: avc: denied { integrity } for pid=3402 comm="fwupd" lockdown_reason="/dev/mem,kmem,port" scontext=system_u:system_r:fwupd_t:s0 tcontext=system_u:system_r:fwupd_t:s0 tclass=lockdown permissive=0 avc: denied { confidentiality } for pid=4628 comm="cp" lockdown_reason="/proc/kcore access" scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tclass=lockdown permissive=0 Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Reviewed-by: James Morris <jamorris@linux.microsoft.com> [PM: some merge fuzz do the the perf hooks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-11-27 17:04:36 +00:00
[LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage",
[LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO",
[LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters",
[LOCKDOWN_MMIOTRACE] = "unsafe mmio",
[LOCKDOWN_DEBUGFS] = "debugfs access",
[LOCKDOWN_XMON_WR] = "xmon write access",
bpf: Add lockdown check for probe_write_user helper Back then, commit 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") added the bpf_probe_write_user() helper in order to allow to override user space memory. Its original goal was to have a facility to "debug, divert, and manipulate execution of semi-cooperative processes" under CAP_SYS_ADMIN. Write to kernel was explicitly disallowed since it would otherwise tamper with its integrity. One use case was shown in cf9b1199de27 ("samples/bpf: Add test/example of using bpf_probe_write_user bpf helper") where the program DNATs traffic at the time of connect(2) syscall, meaning, it rewrites the arguments to a syscall while they're still in userspace, and before the syscall has a chance to copy the argument into kernel space. These days we have better mechanisms in BPF for achieving the same (e.g. for load-balancers), but without having to write to userspace memory. Of course the bpf_probe_write_user() helper can also be used to abuse many other things for both good or bad purpose. Outside of BPF, there is a similar mechanism for ptrace(2) such as PTRACE_PEEK{TEXT,DATA} and PTRACE_POKE{TEXT,DATA}, but would likely require some more effort. Commit 96ae52279594 explicitly dedicated the helper for experimentation purpose only. Thus, move the helper's availability behind a newly added LOCKDOWN_BPF_WRITE_USER lockdown knob so that the helper is disabled under the "integrity" mode. More fine-grained control can be implemented also from LSM side with this change. Fixes: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org>
2021-08-09 10:43:17 +00:00
[LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM",
[LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM",
[LOCKDOWN_RTAS_ERROR_INJECTION] = "RTAS error injection",
security,lockdown,selinux: implement SELinux lockdown Implement a SELinux hook for lockdown. If the lockdown module is also enabled, then a denial by the lockdown module will take precedence over SELinux, so SELinux can only further restrict lockdown decisions. The SELinux hook only distinguishes at the granularity of integrity versus confidentiality similar to the lockdown module, but includes the full lockdown reason as part of the audit record as a hint in diagnosing what triggered the denial. To support this auditing, move the lockdown_reasons[] string array from being private to the lockdown module to the security framework so that it can be used by the lsm audit code and so that it is always available even when the lockdown module is disabled. Note that the SELinux implementation allows the integrity and confidentiality reasons to be controlled independently from one another. Thus, in an SELinux policy, one could allow operations that specify an integrity reason while blocking operations that specify a confidentiality reason. The SELinux hook implementation is stricter than the lockdown module in validating the provided reason value. Sample AVC audit output from denials: avc: denied { integrity } for pid=3402 comm="fwupd" lockdown_reason="/dev/mem,kmem,port" scontext=system_u:system_r:fwupd_t:s0 tcontext=system_u:system_r:fwupd_t:s0 tclass=lockdown permissive=0 avc: denied { confidentiality } for pid=4628 comm="cp" lockdown_reason="/proc/kcore access" scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tclass=lockdown permissive=0 Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Reviewed-by: James Morris <jamorris@linux.microsoft.com> [PM: some merge fuzz do the the perf hooks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-11-27 17:04:36 +00:00
[LOCKDOWN_INTEGRITY_MAX] = "integrity",
[LOCKDOWN_KCORE] = "/proc/kcore access",
[LOCKDOWN_KPROBES] = "use of kprobes",
[LOCKDOWN_BPF_READ_KERNEL] = "use of bpf to read kernel RAM",
[LOCKDOWN_DBG_READ_KERNEL] = "use of kgdb/kdb to read kernel RAM",
security,lockdown,selinux: implement SELinux lockdown Implement a SELinux hook for lockdown. If the lockdown module is also enabled, then a denial by the lockdown module will take precedence over SELinux, so SELinux can only further restrict lockdown decisions. The SELinux hook only distinguishes at the granularity of integrity versus confidentiality similar to the lockdown module, but includes the full lockdown reason as part of the audit record as a hint in diagnosing what triggered the denial. To support this auditing, move the lockdown_reasons[] string array from being private to the lockdown module to the security framework so that it can be used by the lsm audit code and so that it is always available even when the lockdown module is disabled. Note that the SELinux implementation allows the integrity and confidentiality reasons to be controlled independently from one another. Thus, in an SELinux policy, one could allow operations that specify an integrity reason while blocking operations that specify a confidentiality reason. The SELinux hook implementation is stricter than the lockdown module in validating the provided reason value. Sample AVC audit output from denials: avc: denied { integrity } for pid=3402 comm="fwupd" lockdown_reason="/dev/mem,kmem,port" scontext=system_u:system_r:fwupd_t:s0 tcontext=system_u:system_r:fwupd_t:s0 tclass=lockdown permissive=0 avc: denied { confidentiality } for pid=4628 comm="cp" lockdown_reason="/proc/kcore access" scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tclass=lockdown permissive=0 Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Reviewed-by: James Morris <jamorris@linux.microsoft.com> [PM: some merge fuzz do the the perf hooks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-11-27 17:04:36 +00:00
[LOCKDOWN_PERF] = "unsafe use of perf",
[LOCKDOWN_TRACEFS] = "use of tracefs",
[LOCKDOWN_XMON_RW] = "xmon read and write access",
[LOCKDOWN_XFRM_SECRET] = "xfrm SA secret",
security,lockdown,selinux: implement SELinux lockdown Implement a SELinux hook for lockdown. If the lockdown module is also enabled, then a denial by the lockdown module will take precedence over SELinux, so SELinux can only further restrict lockdown decisions. The SELinux hook only distinguishes at the granularity of integrity versus confidentiality similar to the lockdown module, but includes the full lockdown reason as part of the audit record as a hint in diagnosing what triggered the denial. To support this auditing, move the lockdown_reasons[] string array from being private to the lockdown module to the security framework so that it can be used by the lsm audit code and so that it is always available even when the lockdown module is disabled. Note that the SELinux implementation allows the integrity and confidentiality reasons to be controlled independently from one another. Thus, in an SELinux policy, one could allow operations that specify an integrity reason while blocking operations that specify a confidentiality reason. The SELinux hook implementation is stricter than the lockdown module in validating the provided reason value. Sample AVC audit output from denials: avc: denied { integrity } for pid=3402 comm="fwupd" lockdown_reason="/dev/mem,kmem,port" scontext=system_u:system_r:fwupd_t:s0 tcontext=system_u:system_r:fwupd_t:s0 tclass=lockdown permissive=0 avc: denied { confidentiality } for pid=4628 comm="cp" lockdown_reason="/proc/kcore access" scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 tclass=lockdown permissive=0 Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Reviewed-by: James Morris <jamorris@linux.microsoft.com> [PM: some merge fuzz do the the perf hooks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-11-27 17:04:36 +00:00
[LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
};
struct security_hook_heads security_hook_heads __lsm_ro_after_init;
static BLOCKING_NOTIFIER_HEAD(blocking_lsm_notifier_chain);
static struct kmem_cache *lsm_file_cache;
static struct kmem_cache *lsm_inode_cache;
char *lsm_names;
static struct lsm_blob_sizes blob_sizes __lsm_ro_after_init;
/* Boot-time LSM user choice */
static __initdata const char *chosen_lsm_order;
static __initdata const char *chosen_major_lsm;
static __initconst const char * const builtin_lsm_order = CONFIG_LSM;
/* Ordered list of LSMs to initialize. */
static __initdata struct lsm_info **ordered_lsms;
static __initdata struct lsm_info *exclusive;
static __initdata bool debug;
#define init_debug(...) \
do { \
if (debug) \
pr_info(__VA_ARGS__); \
} while (0)
static bool __init is_enabled(struct lsm_info *lsm)
{
if (!lsm->enabled)
return false;
return *lsm->enabled;
}
/* Mark an LSM's enabled flag. */
static int lsm_enabled_true __initdata = 1;
static int lsm_enabled_false __initdata = 0;
static void __init set_enabled(struct lsm_info *lsm, bool enabled)
{
/*
* When an LSM hasn't configured an enable variable, we can use
* a hard-coded location for storing the default enabled state.
*/
if (!lsm->enabled) {
if (enabled)
lsm->enabled = &lsm_enabled_true;
else
lsm->enabled = &lsm_enabled_false;
} else if (lsm->enabled == &lsm_enabled_true) {
if (!enabled)
lsm->enabled = &lsm_enabled_false;
} else if (lsm->enabled == &lsm_enabled_false) {
if (enabled)
lsm->enabled = &lsm_enabled_true;
} else {
*lsm->enabled = enabled;
}
}
/* Is an LSM already listed in the ordered LSMs list? */
static bool __init exists_ordered_lsm(struct lsm_info *lsm)
{
struct lsm_info **check;
for (check = ordered_lsms; *check; check++)
if (*check == lsm)
return true;
return false;
}
/* Append an LSM to the list of ordered LSMs to initialize. */
static int last_lsm __initdata;
static void __init append_ordered_lsm(struct lsm_info *lsm, const char *from)
{
/* Ignore duplicate selections. */
if (exists_ordered_lsm(lsm))
return;
if (WARN(last_lsm == LSM_COUNT, "%s: out of LSM slots!?\n", from))
return;
/* Enable this LSM, if it is not already set. */
if (!lsm->enabled)
lsm->enabled = &lsm_enabled_true;
ordered_lsms[last_lsm++] = lsm;
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
init_debug("%s ordered: %s (%s)\n", from, lsm->name,
is_enabled(lsm) ? "enabled" : "disabled");
}
/* Is an LSM allowed to be initialized? */
static bool __init lsm_allowed(struct lsm_info *lsm)
{
/* Skip if the LSM is disabled. */
if (!is_enabled(lsm))
return false;
/* Not allowed if another exclusive LSM already initialized. */
if ((lsm->flags & LSM_FLAG_EXCLUSIVE) && exclusive) {
init_debug("exclusive disabled: %s\n", lsm->name);
return false;
}
return true;
}
static void __init lsm_set_blob_size(int *need, int *lbs)
{
int offset;
landlock: Support file truncation Introduce the LANDLOCK_ACCESS_FS_TRUNCATE flag for file truncation. This flag hooks into the path_truncate, file_truncate and file_alloc_security LSM hooks and covers file truncation using truncate(2), ftruncate(2), open(2) with O_TRUNC, as well as creat(). This change also increments the Landlock ABI version, updates corresponding selftests, and updates code documentation to document the flag. In security/security.c, allocate security blobs at pointer-aligned offsets. This fixes the problem where one LSM's security blob can shift another LSM's security blob to an unaligned address (reported by Nathan Chancellor). The following operations are restricted: open(2): requires the LANDLOCK_ACCESS_FS_TRUNCATE right if a file gets implicitly truncated as part of the open() (e.g. using O_TRUNC). Notable special cases: * open(..., O_RDONLY|O_TRUNC) can truncate files as well in Linux * open() with O_TRUNC does *not* need the TRUNCATE right when it creates a new file. truncate(2) (on a path): requires the LANDLOCK_ACCESS_FS_TRUNCATE right. ftruncate(2) (on a file): requires that the file had the TRUNCATE right when it was previously opened. File descriptors acquired by other means than open(2) (e.g. memfd_create(2)) continue to support truncation with ftruncate(2). Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Günther Noack <gnoack3000@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Link: https://lore.kernel.org/r/20221018182216.301684-5-gnoack3000@gmail.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
2022-10-18 18:22:09 +00:00
if (*need <= 0)
return;
offset = ALIGN(*lbs, sizeof(void *));
*lbs = offset + *need;
*need = offset;
}
static void __init lsm_set_blob_sizes(struct lsm_blob_sizes *needed)
{
if (!needed)
return;
lsm_set_blob_size(&needed->lbs_cred, &blob_sizes.lbs_cred);
lsm_set_blob_size(&needed->lbs_file, &blob_sizes.lbs_file);
/*
* The inode blob gets an rcu_head in addition to
* what the modules might need.
*/
if (needed->lbs_inode && blob_sizes.lbs_inode == 0)
blob_sizes.lbs_inode = sizeof(struct rcu_head);
lsm_set_blob_size(&needed->lbs_inode, &blob_sizes.lbs_inode);
lsm_set_blob_size(&needed->lbs_ipc, &blob_sizes.lbs_ipc);
lsm_set_blob_size(&needed->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
lsm_set_blob_size(&needed->lbs_superblock, &blob_sizes.lbs_superblock);
lsm_set_blob_size(&needed->lbs_task, &blob_sizes.lbs_task);
}
/* Prepare LSM for initialization. */
static void __init prepare_lsm(struct lsm_info *lsm)
{
int enabled = lsm_allowed(lsm);
/* Record enablement (to handle any following exclusive LSMs). */
set_enabled(lsm, enabled);
/* If enabled, do pre-initialization work. */
if (enabled) {
if ((lsm->flags & LSM_FLAG_EXCLUSIVE) && !exclusive) {
exclusive = lsm;
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
init_debug("exclusive chosen: %s\n", lsm->name);
}
lsm_set_blob_sizes(lsm->blobs);
}
}
/* Initialize a given LSM, if it is enabled. */
static void __init initialize_lsm(struct lsm_info *lsm)
{
if (is_enabled(lsm)) {
int ret;
init_debug("initializing %s\n", lsm->name);
ret = lsm->init();
WARN(ret, "%s failed to initialize: %d\n", lsm->name, ret);
}
}
/* Populate ordered LSMs list from comma-separated LSM name list. */
static void __init ordered_lsm_parse(const char *order, const char *origin)
{
struct lsm_info *lsm;
char *sep, *name, *next;
/* LSM_ORDER_FIRST is always first. */
for (lsm = __start_lsm_info; lsm < __end_lsm_info; lsm++) {
if (lsm->order == LSM_ORDER_FIRST)
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
append_ordered_lsm(lsm, " first");
}
/* Process "security=", if given. */
if (chosen_major_lsm) {
struct lsm_info *major;
/*
* To match the original "security=" behavior, this
* explicitly does NOT fallback to another Legacy Major
* if the selected one was separately disabled: disable
* all non-matching Legacy Major LSMs.
*/
for (major = __start_lsm_info; major < __end_lsm_info;
major++) {
if ((major->flags & LSM_FLAG_LEGACY_MAJOR) &&
strcmp(major->name, chosen_major_lsm) != 0) {
set_enabled(major, false);
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
init_debug("security=%s disabled: %s (only one legacy major LSM)\n",
chosen_major_lsm, major->name);
}
}
}
sep = kstrdup(order, GFP_KERNEL);
next = sep;
/* Walk the list, looking for matching LSMs. */
while ((name = strsep(&next, ",")) != NULL) {
bool found = false;
for (lsm = __start_lsm_info; lsm < __end_lsm_info; lsm++) {
if (lsm->order == LSM_ORDER_MUTABLE &&
strcmp(lsm->name, name) == 0) {
append_ordered_lsm(lsm, origin);
found = true;
}
}
if (!found)
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
init_debug("%s ignored: %s (not built into kernel)\n",
origin, name);
}
/* Process "security=", if given. */
if (chosen_major_lsm) {
for (lsm = __start_lsm_info; lsm < __end_lsm_info; lsm++) {
if (exists_ordered_lsm(lsm))
continue;
if (strcmp(lsm->name, chosen_major_lsm) == 0)
append_ordered_lsm(lsm, "security=");
}
}
/* Disable all LSMs not in the ordered list. */
for (lsm = __start_lsm_info; lsm < __end_lsm_info; lsm++) {
if (exists_ordered_lsm(lsm))
continue;
set_enabled(lsm, false);
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
init_debug("%s skipped: %s (not in requested order)\n",
origin, lsm->name);
}
kfree(sep);
}
static void __init lsm_early_cred(struct cred *cred);
static void __init lsm_early_task(struct task_struct *task);
static int lsm_append(const char *new, char **result);
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
static void __init report_lsm_order(void)
{
struct lsm_info **lsm, *early;
int first = 0;
pr_info("initializing lsm=");
/* Report each enabled LSM name, comma separated. */
for (early = __start_early_lsm_info; early < __end_early_lsm_info; early++)
if (is_enabled(early))
pr_cont("%s%s", first++ == 0 ? "" : ",", early->name);
for (lsm = ordered_lsms; *lsm; lsm++)
if (is_enabled(*lsm))
pr_cont("%s%s", first++ == 0 ? "" : ",", (*lsm)->name);
pr_cont("\n");
}
static void __init ordered_lsm_init(void)
{
struct lsm_info **lsm;
ordered_lsms = kcalloc(LSM_COUNT + 1, sizeof(*ordered_lsms),
GFP_KERNEL);
if (chosen_lsm_order) {
if (chosen_major_lsm) {
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
pr_warn("security=%s is ignored because it is superseded by lsm=%s\n",
chosen_major_lsm, chosen_lsm_order);
chosen_major_lsm = NULL;
}
ordered_lsm_parse(chosen_lsm_order, "cmdline");
} else
ordered_lsm_parse(builtin_lsm_order, "builtin");
for (lsm = ordered_lsms; *lsm; lsm++)
prepare_lsm(*lsm);
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
report_lsm_order();
init_debug("cred blob size = %d\n", blob_sizes.lbs_cred);
init_debug("file blob size = %d\n", blob_sizes.lbs_file);
init_debug("inode blob size = %d\n", blob_sizes.lbs_inode);
init_debug("ipc blob size = %d\n", blob_sizes.lbs_ipc);
init_debug("msg_msg blob size = %d\n", blob_sizes.lbs_msg_msg);
init_debug("superblock blob size = %d\n", blob_sizes.lbs_superblock);
init_debug("task blob size = %d\n", blob_sizes.lbs_task);
/*
* Create any kmem_caches needed for blobs
*/
if (blob_sizes.lbs_file)
lsm_file_cache = kmem_cache_create("lsm_file_cache",
blob_sizes.lbs_file, 0,
SLAB_PANIC, NULL);
if (blob_sizes.lbs_inode)
lsm_inode_cache = kmem_cache_create("lsm_inode_cache",
blob_sizes.lbs_inode, 0,
SLAB_PANIC, NULL);
lsm_early_cred((struct cred *) current->cred);
lsm_early_task(current);
for (lsm = ordered_lsms; *lsm; lsm++)
initialize_lsm(*lsm);
kfree(ordered_lsms);
}
int __init early_security_init(void)
{
struct lsm_info *lsm;
#define LSM_HOOK(RET, DEFAULT, NAME, ...) \
INIT_HLIST_HEAD(&security_hook_heads.NAME);
#include "linux/lsm_hook_defs.h"
#undef LSM_HOOK
for (lsm = __start_early_lsm_info; lsm < __end_early_lsm_info; lsm++) {
if (!lsm->enabled)
lsm->enabled = &lsm_enabled_true;
prepare_lsm(lsm);
initialize_lsm(lsm);
}
return 0;
}
/**
* security_init - initializes the security framework
*
* This should be called early in the kernel initialization sequence.
*/
int __init security_init(void)
{
struct lsm_info *lsm;
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
init_debug("legacy security=%s\n", chosen_major_lsm ?: " *unspecified*");
init_debug(" CONFIG_LSM=%s\n", builtin_lsm_order);
init_debug("boot arg lsm=%s\n", chosen_lsm_order ?: " *unspecified*");
/*
* Append the names of the early LSM modules now that kmalloc() is
* available
*/
for (lsm = __start_early_lsm_info; lsm < __end_early_lsm_info; lsm++) {
LSM: Better reporting of actual LSMs at boot Enhance the details reported by "lsm.debug" in several ways: - report contents of "security=" - report contents of "CONFIG_LSM" - report contents of "lsm=" - report any early LSM details - whitespace-align the output of similar phases for easier visual parsing - change "disabled" to more accurate "skipped" - explain what "skipped" and "ignored" mean in a parenthetical Upgrade the "security= is ignored" warning from pr_info to pr_warn, and include full arguments list to make the cause even more clear. Replace static "Security Framework initializing" pr_info with specific list of the resulting order of enabled LSMs. For example, if the kernel is built with: CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_APPARMOR=y CONFIG_SECURITY_LOADPIN=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_SAFESETID=y CONFIG_SECURITY_LOCKDOWN_LSM=y CONFIG_SECURITY_LANDLOCK=y CONFIG_INTEGRITY=y CONFIG_BPF_LSM=y CONFIG_DEFAULT_SECURITY_APPARMOR=y CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux, smack,tomoyo,apparmor,bpf" Booting without options will show: LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf landlock: Up and running. Yama: becoming mindful. LoadPin: ready to pin (currently not enforcing) SELinux: Initializing. LSM support for eBPF active Boot with "lsm.debug" will show: LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (enabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: exclusive disabled: apparmor LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf LSM: cred blob size = 32 LSM: file blob size = 16 LSM: inode blob size = 72 LSM: ipc blob size = 8 LSM: msg_msg blob size = 4 LSM: superblock blob size = 80 LSM: task blob size = 8 LSM: initializing capability LSM: initializing landlock landlock: Up and running. LSM: initializing yama Yama: becoming mindful. LSM: initializing loadpin LoadPin: ready to pin (currently not enforcing) LSM: initializing safesetid LSM: initializing integrity LSM: initializing selinux SELinux: Initializing. LSM: initializing bpf LSM support for eBPF active And some examples of how the lsm.debug ordering report changes... With "lsm.debug security=selinux": LSM: legacy security=selinux LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm= *unspecified* LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: security=selinux disabled: apparmor (only one legacy major LSM) LSM: builtin ordered: landlock (enabled) LSM: builtin ignored: lockdown (not built into kernel) LSM: builtin ordered: yama (enabled) LSM: builtin ordered: loadpin (enabled) LSM: builtin ordered: safesetid (enabled) LSM: builtin ordered: integrity (enabled) LSM: builtin ordered: selinux (enabled) LSM: builtin ignored: smack (not built into kernel) LSM: builtin ignored: tomoyo (not built into kernel) LSM: builtin ordered: apparmor (disabled) LSM: builtin ordered: bpf (enabled) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,landlock,yama,loadpin, safesetid,integrity,selinux,bpf With "lsm.debug lsm=integrity,selinux,loadpin,crabability,bpf, loadpin,loadpin": LSM: legacy security= *unspecified* LSM: CONFIG_LSM=landlock,lockdown,yama,loadpin,safesetid,integrity, selinux,smack,tomoyo,apparmor,bpf LSM: boot arg lsm=integrity,selinux,loadpin,capability,bpf,loadpin, loadpin LSM: early started: lockdown (enabled) LSM: first ordered: capability (enabled) LSM: cmdline ordered: integrity (enabled) LSM: cmdline ordered: selinux (enabled) LSM: cmdline ordered: loadpin (enabled) LSM: cmdline ignored: crabability (not built into kernel) LSM: cmdline ordered: bpf (enabled) LSM: cmdline skipped: apparmor (not in requested order) LSM: cmdline skipped: yama (not in requested order) LSM: cmdline skipped: safesetid (not in requested order) LSM: cmdline skipped: landlock (not in requested order) LSM: exclusive chosen: selinux LSM: initializing lsm=lockdown,capability,integrity,selinux,loadpin,bpf Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mickaël Salaün <mic@digikod.net> [PM: line wrapped commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-02 00:05:29 +00:00
init_debug(" early started: %s (%s)\n", lsm->name,
is_enabled(lsm) ? "enabled" : "disabled");
if (lsm->enabled)
lsm_append(lsm->name, &lsm_names);
}
/* Load LSMs in specified order. */
ordered_lsm_init();
return 0;
}
/* Save user chosen LSM */
static int __init choose_major_lsm(char *str)
{
chosen_major_lsm = str;
return 1;
}
__setup("security=", choose_major_lsm);
/* Explicitly choose LSM initialization order. */
static int __init choose_lsm_order(char *str)
{
chosen_lsm_order = str;
return 1;
}
__setup("lsm=", choose_lsm_order);
/* Enable LSM order debugging. */
static int __init enable_debug(char *str)
{
debug = true;
return 1;
}
__setup("lsm.debug", enable_debug);
static bool match_last_lsm(const char *list, const char *lsm)
{
const char *last;
if (WARN_ON(!list || !lsm))
return false;
last = strrchr(list, ',');
if (last)
/* Pass the comma, strcmp() will check for '\0' */
last++;
else
last = list;
return !strcmp(last, lsm);
}
static int lsm_append(const char *new, char **result)
{
char *cp;
if (*result == NULL) {
*result = kstrdup(new, GFP_KERNEL);
if (*result == NULL)
return -ENOMEM;
} else {
/* Check if it is the last registered name */
if (match_last_lsm(*result, new))
return 0;
cp = kasprintf(GFP_KERNEL, "%s,%s", *result, new);
if (cp == NULL)
return -ENOMEM;
kfree(*result);
*result = cp;
}
return 0;
}
/**
* security_add_hooks - Add a modules hooks to the hook lists.
* @hooks: the hooks to add
* @count: the number of hooks to add
* @lsm: the name of the security module
*
* Each LSM has to register its hooks with the infrastructure.
*/
void __init security_add_hooks(struct security_hook_list *hooks, int count,
const char *lsm)
{
int i;
for (i = 0; i < count; i++) {
hooks[i].lsm = lsm;
hlist_add_tail_rcu(&hooks[i].list, hooks[i].head);
}
/*
* Don't try to append during early_security_init(), we'll come back
* and fix this up afterwards.
*/
if (slab_is_available()) {
if (lsm_append(lsm, &lsm_names) < 0)
panic("%s - Cannot get early memory.\n", __func__);
}
}
int call_blocking_lsm_notifier(enum lsm_event event, void *data)
{
return blocking_notifier_call_chain(&blocking_lsm_notifier_chain,
event, data);
}
EXPORT_SYMBOL(call_blocking_lsm_notifier);
int register_blocking_lsm_notifier(struct notifier_block *nb)
{
return blocking_notifier_chain_register(&blocking_lsm_notifier_chain,
nb);
}
EXPORT_SYMBOL(register_blocking_lsm_notifier);
int unregister_blocking_lsm_notifier(struct notifier_block *nb)
{
return blocking_notifier_chain_unregister(&blocking_lsm_notifier_chain,
nb);
}
EXPORT_SYMBOL(unregister_blocking_lsm_notifier);
/**
* lsm_cred_alloc - allocate a composite cred blob
* @cred: the cred that needs a blob
* @gfp: allocation type
*
* Allocate the cred blob for all the modules
*
* Returns 0, or -ENOMEM if memory can't be allocated.
*/
static int lsm_cred_alloc(struct cred *cred, gfp_t gfp)
{
if (blob_sizes.lbs_cred == 0) {
cred->security = NULL;
return 0;
}
cred->security = kzalloc(blob_sizes.lbs_cred, gfp);
if (cred->security == NULL)
return -ENOMEM;
return 0;
}
/**
* lsm_early_cred - during initialization allocate a composite cred blob
* @cred: the cred that needs a blob
*
* Allocate the cred blob for all the modules
*/
static void __init lsm_early_cred(struct cred *cred)
{
int rc = lsm_cred_alloc(cred, GFP_KERNEL);
if (rc)
panic("%s: Early cred alloc failed.\n", __func__);
}
/**
* lsm_file_alloc - allocate a composite file blob
* @file: the file that needs a blob
*
* Allocate the file blob for all the modules
*
* Returns 0, or -ENOMEM if memory can't be allocated.
*/
static int lsm_file_alloc(struct file *file)
{
if (!lsm_file_cache) {
file->f_security = NULL;
return 0;
}
file->f_security = kmem_cache_zalloc(lsm_file_cache, GFP_KERNEL);
if (file->f_security == NULL)
return -ENOMEM;
return 0;
}
/**
* lsm_inode_alloc - allocate a composite inode blob
* @inode: the inode that needs a blob
*
* Allocate the inode blob for all the modules
*
* Returns 0, or -ENOMEM if memory can't be allocated.
*/
int lsm_inode_alloc(struct inode *inode)
{
if (!lsm_inode_cache) {
inode->i_security = NULL;
return 0;
}
inode->i_security = kmem_cache_zalloc(lsm_inode_cache, GFP_NOFS);
if (inode->i_security == NULL)
return -ENOMEM;
return 0;
}
/**
* lsm_task_alloc - allocate a composite task blob
* @task: the task that needs a blob
*
* Allocate the task blob for all the modules
*
* Returns 0, or -ENOMEM if memory can't be allocated.
*/
static int lsm_task_alloc(struct task_struct *task)
{
if (blob_sizes.lbs_task == 0) {
task->security = NULL;
return 0;
}
task->security = kzalloc(blob_sizes.lbs_task, GFP_KERNEL);
if (task->security == NULL)
return -ENOMEM;
return 0;
}
/**
* lsm_ipc_alloc - allocate a composite ipc blob
* @kip: the ipc that needs a blob
*
* Allocate the ipc blob for all the modules
*
* Returns 0, or -ENOMEM if memory can't be allocated.
*/
static int lsm_ipc_alloc(struct kern_ipc_perm *kip)
{
if (blob_sizes.lbs_ipc == 0) {
kip->security = NULL;
return 0;
}
kip->security = kzalloc(blob_sizes.lbs_ipc, GFP_KERNEL);
if (kip->security == NULL)
return -ENOMEM;
return 0;
}
/**
* lsm_msg_msg_alloc - allocate a composite msg_msg blob
* @mp: the msg_msg that needs a blob
*
* Allocate the ipc blob for all the modules
*
* Returns 0, or -ENOMEM if memory can't be allocated.
*/
static int lsm_msg_msg_alloc(struct msg_msg *mp)
{
if (blob_sizes.lbs_msg_msg == 0) {
mp->security = NULL;
return 0;
}
mp->security = kzalloc(blob_sizes.lbs_msg_msg, GFP_KERNEL);
if (mp->security == NULL)
return -ENOMEM;
return 0;
}
/**
* lsm_early_task - during initialization allocate a composite task blob
* @task: the task that needs a blob
*
* Allocate the task blob for all the modules
*/
static void __init lsm_early_task(struct task_struct *task)
{
int rc = lsm_task_alloc(task);
if (rc)
panic("%s: Early task alloc failed.\n", __func__);
}
/**
* lsm_superblock_alloc - allocate a composite superblock blob
* @sb: the superblock that needs a blob
*
* Allocate the superblock blob for all the modules
*
* Returns 0, or -ENOMEM if memory can't be allocated.
*/
static int lsm_superblock_alloc(struct super_block *sb)
{
if (blob_sizes.lbs_superblock == 0) {
sb->s_security = NULL;
return 0;
}
sb->s_security = kzalloc(blob_sizes.lbs_superblock, GFP_KERNEL);
if (sb->s_security == NULL)
return -ENOMEM;
return 0;
}
/*
* The default value of the LSM hook is defined in linux/lsm_hook_defs.h and
* can be accessed with:
*
* LSM_RET_DEFAULT(<hook_name>)
*
* The macros below define static constants for the default value of each
* LSM hook.
*/
#define LSM_RET_DEFAULT(NAME) (NAME##_default)
#define DECLARE_LSM_RET_DEFAULT_void(DEFAULT, NAME)
#define DECLARE_LSM_RET_DEFAULT_int(DEFAULT, NAME) \
static const int __maybe_unused LSM_RET_DEFAULT(NAME) = (DEFAULT);
#define LSM_HOOK(RET, DEFAULT, NAME, ...) \
DECLARE_LSM_RET_DEFAULT_##RET(DEFAULT, NAME)
#include <linux/lsm_hook_defs.h>
#undef LSM_HOOK
/*
* Hook list operation macros.
*
* call_void_hook:
* This is a hook that does not return a value.
*
* call_int_hook:
* This is a hook that returns a value.
*/
#define call_void_hook(FUNC, ...) \
do { \
struct security_hook_list *P; \
\
hlist_for_each_entry(P, &security_hook_heads.FUNC, list) \
P->hook.FUNC(__VA_ARGS__); \
} while (0)
#define call_int_hook(FUNC, IRC, ...) ({ \
int RC = IRC; \
do { \
struct security_hook_list *P; \
\
hlist_for_each_entry(P, &security_hook_heads.FUNC, list) { \
RC = P->hook.FUNC(__VA_ARGS__); \
if (RC != 0) \
break; \
} \
} while (0); \
RC; \
})
/* Security operations */
int security_binder_set_context_mgr(const struct cred *mgr)
{
return call_int_hook(binder_set_context_mgr, 0, mgr);
}
int security_binder_transaction(const struct cred *from,
const struct cred *to)
{
return call_int_hook(binder_transaction, 0, from, to);
}
int security_binder_transfer_binder(const struct cred *from,
const struct cred *to)
{
return call_int_hook(binder_transfer_binder, 0, from, to);
}
int security_binder_transfer_file(const struct cred *from,
const struct cred *to, struct file *file)
{
return call_int_hook(binder_transfer_file, 0, from, to, file);
}
int security_ptrace_access_check(struct task_struct *child, unsigned int mode)
{
return call_int_hook(ptrace_access_check, 0, child, mode);
security: Fix setting of PF_SUPERPRIV by __capable() Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags the target process if that is not the current process and it is trying to change its own flags in a different way at the same time. __capable() is using neither atomic ops nor locking to protect t->flags. This patch removes __capable() and introduces has_capability() that doesn't set PF_SUPERPRIV on the process being queried. This patch further splits security_ptrace() in two: (1) security_ptrace_may_access(). This passes judgement on whether one process may access another only (PTRACE_MODE_ATTACH for ptrace() and PTRACE_MODE_READ for /proc), and takes a pointer to the child process. current is the parent. (2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only, and takes only a pointer to the parent process. current is the child. In Smack and commoncap, this uses has_capability() to determine whether the parent will be permitted to use PTRACE_ATTACH if normal checks fail. This does not set PF_SUPERPRIV. Two of the instances of __capable() actually only act on current, and so have been changed to calls to capable(). Of the places that were using __capable(): (1) The OOM killer calls __capable() thrice when weighing the killability of a process. All of these now use has_capability(). (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see whether the parent was allowed to trace any process. As mentioned above, these have been split. For PTRACE_ATTACH and /proc, capable() is now used, and for PTRACE_TRACEME, has_capability() is used. (3) cap_safe_nice() only ever saw current, so now uses capable(). (4) smack_setprocattr() rejected accesses to tasks other than current just after calling __capable(), so the order of these two tests have been switched and capable() is used instead. (5) In smack_file_send_sigiotask(), we need to allow privileged processes to receive SIGIO on files they're manipulating. (6) In smack_task_wait(), we let a process wait for a privileged process, whether or not the process doing the waiting is privileged. I've tested this with the LTP SELinux and syscalls testscripts. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>
2008-08-14 10:37:28 +00:00
}
int security_ptrace_traceme(struct task_struct *parent)
{
return call_int_hook(ptrace_traceme, 0, parent);
}
int security_capget(struct task_struct *target,
kernel_cap_t *effective,
kernel_cap_t *inheritable,
kernel_cap_t *permitted)
{
return call_int_hook(capget, 0, target,
effective, inheritable, permitted);
}
CRED: Inaugurate COW credentials Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:23 +00:00
int security_capset(struct cred *new, const struct cred *old,
const kernel_cap_t *effective,
const kernel_cap_t *inheritable,
const kernel_cap_t *permitted)
{
return call_int_hook(capset, 0, new, old,
effective, inheritable, permitted);
}
int security_capable(const struct cred *cred,
struct user_namespace *ns,
int cap,
unsigned int opts)
{
return call_int_hook(capable, 0, cred, ns, cap, opts);
}
int security_quotactl(int cmds, int type, int id, struct super_block *sb)
{
return call_int_hook(quotactl, 0, cmds, type, id, sb);
}
int security_quota_on(struct dentry *dentry)
{
return call_int_hook(quota_on, 0, dentry);
}
int security_syslog(int type)
{
return call_int_hook(syslog, 0, type);
}
int security_settime64(const struct timespec64 *ts, const struct timezone *tz)
{
return call_int_hook(settime, 0, ts, tz);
}
int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
{
struct security_hook_list *hp;
int cap_sys_admin = 1;
int rc;
/*
* The module will respond with a positive value if
* it thinks the __vm_enough_memory() call should be
* made with the cap_sys_admin set. If all of the modules
* agree that it should be set it will. If any module
* thinks it should not be set it won't.
*/
hlist_for_each_entry(hp, &security_hook_heads.vm_enough_memory, list) {
rc = hp->hook.vm_enough_memory(mm, pages);
if (rc <= 0) {
cap_sys_admin = 0;
break;
}
}
return __vm_enough_memory(mm, pages, cap_sys_admin);
}
/**
* security_bprm_creds_for_exec() - Prepare the credentials for exec()
* @bprm: binary program information
*
* If the setup in prepare_exec_creds did not setup @bprm->cred->security
* properly for executing @bprm->file, update the LSM's portion of
* @bprm->cred->security to be what commit_creds needs to install for the new
* program. This hook may also optionally check permissions (e.g. for
* transitions between security domains). The hook must set @bprm->secureexec
* to 1 if AT_SECURE should be set to request libc enable secure mode. @bprm
* contains the linux_binprm structure.
*
* Return: Returns 0 if the hook is successful and permission is granted.
*/
int security_bprm_creds_for_exec(struct linux_binprm *bprm)
{
return call_int_hook(bprm_creds_for_exec, 0, bprm);
}
/**
* security_bprm_creds_from_file() - Update linux_binprm creds based on file
* @bprm: binary program information
* @file: associated file
*
* If @file is setpcap, suid, sgid or otherwise marked to change privilege upon
* exec, update @bprm->cred to reflect that change. This is called after
* finding the binary that will be executed without an interpreter. This
* ensures that the credentials will not be derived from a script that the
* binary will need to reopen, which when reopend may end up being a completely
* different file. This hook may also optionally check permissions (e.g. for
* transitions between security domains). The hook must set @bprm->secureexec
* to 1 if AT_SECURE should be set to request libc enable secure mode. The
* hook must add to @bprm->per_clear any personality flags that should be
* cleared from current->personality. @bprm contains the linux_binprm
* structure.
*
* Return: Returns 0 if the hook is successful and permission is granted.
*/
exec: Compute file based creds only once Move the computation of creds from prepare_binfmt into begin_new_exec so that the creds need only be computed once. This is just code reorganization no semantic changes of any kind are made. Moving the computation is safe. I have looked through the kernel and verified none of the binfmts look at bprm->cred directly, and that there are no helpers that look at bprm->cred indirectly. Which means that it is not a problem to compute the bprm->cred later in the execution flow as it is not used until it becomes current->cred. A new function bprm_creds_from_file is added to contain the work that needs to be done. bprm_creds_from_file first computes which file bprm->executable or most likely bprm->file that the bprm->creds will be computed from. The funciton bprm_fill_uid is updated to receive the file instead of accessing bprm->file. The now unnecessary work needed to reset the bprm->cred->euid, and bprm->cred->egid is removed from brpm_fill_uid. A small comment to document that bprm_fill_uid now only deals with the work to handle suid and sgid files. The default case is already heandled by prepare_exec_creds. The function security_bprm_repopulate_creds is renamed security_bprm_creds_from_file and now is explicitly passed the file from which to compute the creds. The documentation of the bprm_creds_from_file security hook is updated to explain when the hook is called and what it needs to do. The file is passed from cap_bprm_creds_from_file into get_file_caps so that the caps are computed for the appropriate file. The now unnecessary work in cap_bprm_creds_from_file to reset the ambient capabilites has been removed. A small comment to document that the work of cap_bprm_creds_from_file is to read capabilities from the files secureity attribute and derive capabilities from the fact the user had uid 0 has been added. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-30 03:00:54 +00:00
int security_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file)
{
exec: Compute file based creds only once Move the computation of creds from prepare_binfmt into begin_new_exec so that the creds need only be computed once. This is just code reorganization no semantic changes of any kind are made. Moving the computation is safe. I have looked through the kernel and verified none of the binfmts look at bprm->cred directly, and that there are no helpers that look at bprm->cred indirectly. Which means that it is not a problem to compute the bprm->cred later in the execution flow as it is not used until it becomes current->cred. A new function bprm_creds_from_file is added to contain the work that needs to be done. bprm_creds_from_file first computes which file bprm->executable or most likely bprm->file that the bprm->creds will be computed from. The funciton bprm_fill_uid is updated to receive the file instead of accessing bprm->file. The now unnecessary work needed to reset the bprm->cred->euid, and bprm->cred->egid is removed from brpm_fill_uid. A small comment to document that bprm_fill_uid now only deals with the work to handle suid and sgid files. The default case is already heandled by prepare_exec_creds. The function security_bprm_repopulate_creds is renamed security_bprm_creds_from_file and now is explicitly passed the file from which to compute the creds. The documentation of the bprm_creds_from_file security hook is updated to explain when the hook is called and what it needs to do. The file is passed from cap_bprm_creds_from_file into get_file_caps so that the caps are computed for the appropriate file. The now unnecessary work in cap_bprm_creds_from_file to reset the ambient capabilites has been removed. A small comment to document that the work of cap_bprm_creds_from_file is to read capabilities from the files secureity attribute and derive capabilities from the fact the user had uid 0 has been added. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-30 03:00:54 +00:00
return call_int_hook(bprm_creds_from_file, 0, bprm, file);
}
/**
* security_bprm_check() - Mediate binary handler search
* @bprm: binary program information
*
* This hook mediates the point when a search for a binary handler will begin.
* It allows a check against the @bprm->cred->security value which was set in
* the preceding creds_for_exec call. The argv list and envp list are reliably
* available in @bprm. This hook may be called multiple times during a single
* execve. @bprm contains the linux_binprm structure.
*
* Return: Returns 0 if the hook is successful and permission is granted.
*/
CRED: Make execve() take advantage of copy-on-write credentials Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into *bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: (*) security_bprm_alloc(), ->bprm_alloc_security() (*) security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. (*) security_bprm_apply_creds(), ->bprm_apply_creds() (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). (*) security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). (*) security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. (*) security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:24 +00:00
int security_bprm_check(struct linux_binprm *bprm)
{
int ret;
ret = call_int_hook(bprm_check_security, 0, bprm);
if (ret)
return ret;
return ima_bprm_check(bprm);
}
/**
* security_bprm_committing_creds() - Install creds for a process during exec()
* @bprm: binary program information
*
* Prepare to install the new security attributes of a process being
* transformed by an execve operation, based on the old credentials pointed to
* by @current->cred and the information set in @bprm->cred by the
* bprm_creds_for_exec hook. @bprm points to the linux_binprm structure. This
* hook is a good place to perform state changes on the process such as closing
* open file descriptors to which access will no longer be granted when the
* attributes are changed. This is called immediately before commit_creds().
*/
CRED: Make execve() take advantage of copy-on-write credentials Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into *bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: (*) security_bprm_alloc(), ->bprm_alloc_security() (*) security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. (*) security_bprm_apply_creds(), ->bprm_apply_creds() (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). (*) security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). (*) security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. (*) security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:24 +00:00
void security_bprm_committing_creds(struct linux_binprm *bprm)
{
call_void_hook(bprm_committing_creds, bprm);
}
/**
* security_bprm_committed_creds() - Tidy up after cred install during exec()
* @bprm: binary program information
*
* Tidy up after the installation of the new security attributes of a process
* being transformed by an execve operation. The new credentials have, by this
* point, been set to @current->cred. @bprm points to the linux_binprm
* structure. This hook is a good place to perform state changes on the
* process such as clearing out non-inheritable signal state. This is called
* immediately after commit_creds().
*/
CRED: Make execve() take advantage of copy-on-write credentials Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into *bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: (*) security_bprm_alloc(), ->bprm_alloc_security() (*) security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. (*) security_bprm_apply_creds(), ->bprm_apply_creds() (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). (*) security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). (*) security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. (*) security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:24 +00:00
void security_bprm_committed_creds(struct linux_binprm *bprm)
{
call_void_hook(bprm_committed_creds, bprm);
}
/**
* security_fs_context_dup() - Duplicate a fs_context LSM blob
* @fc: destination filesystem context
* @src_fc: source filesystem context
*
* Allocate and attach a security structure to sc->security. This pointer is
* initialised to NULL by the caller. @fc indicates the new filesystem context.
* @src_fc indicates the original filesystem context.
*
* Return: Returns 0 on success or a negative error code on failure.
*/
int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc)
{
return call_int_hook(fs_context_dup, 0, fc, src_fc);
}
/**
* security_fs_context_parse_param() - Configure a filesystem context
* @fc: filesystem context
* @param: filesystem parameter
*
* Userspace provided a parameter to configure a superblock. The LSM can
* consume the parameter or return it to the caller for use elsewhere.
*
* Return: If the parameter is used by the LSM it should return 0, if it is
* returned to the caller -ENOPARAM is returned, otherwise a negative
* error code is returned.
*/
int security_fs_context_parse_param(struct fs_context *fc,
struct fs_parameter *param)
{
struct security_hook_list *hp;
int trc;
int rc = -ENOPARAM;
hlist_for_each_entry(hp, &security_hook_heads.fs_context_parse_param,
list) {
trc = hp->hook.fs_context_parse_param(fc, param);
if (trc == 0)
rc = 0;
else if (trc != -ENOPARAM)
return trc;
}
return rc;
}
/**
* security_sb_alloc() - Allocate a super_block LSM blob
* @sb: filesystem superblock
*
* Allocate and attach a security structure to the sb->s_security field. The
* s_security field is initialized to NULL when the structure is allocated.
* @sb contains the super_block structure to be modified.
*
* Return: Returns 0 if operation was successful.
*/
int security_sb_alloc(struct super_block *sb)
{
int rc = lsm_superblock_alloc(sb);
if (unlikely(rc))
return rc;
rc = call_int_hook(sb_alloc_security, 0, sb);
if (unlikely(rc))
security_sb_free(sb);
return rc;
}
/**
* security_sb_delete() - Release super_block LSM associated objects
* @sb: filesystem superblock
*
* Release objects tied to a superblock (e.g. inodes). @sb contains the
* super_block structure being released.
*/
void security_sb_delete(struct super_block *sb)
{
call_void_hook(sb_delete, sb);
}
/**
* security_sb_free() - Free a super_block LSM blob
* @sb: filesystem superblock
*
* Deallocate and clear the sb->s_security field. @sb contains the super_block
* structure to be modified.
*/
void security_sb_free(struct super_block *sb)
{
call_void_hook(sb_free_security, sb);
kfree(sb->s_security);
sb->s_security = NULL;
}
/**
* security_free_mnt_opts() - Free memory associated with mount options
* @mnt_ops: LSM processed mount options
*
* Free memory associated with @mnt_ops.
*/
void security_free_mnt_opts(void **mnt_opts)
{
if (!*mnt_opts)
return;
call_void_hook(sb_free_mnt_opts, *mnt_opts);
*mnt_opts = NULL;
}
EXPORT_SYMBOL(security_free_mnt_opts);
/**
* security_sb_eat_lsm_opts() - Consume LSM mount options
* @options: mount options
* @mnt_ops: LSM processed mount options
*
* Eat (scan @options) and save them in @mnt_opts.
*
* Return: Returns 0 on success, negative values on failure.
*/
int security_sb_eat_lsm_opts(char *options, void **mnt_opts)
{
return call_int_hook(sb_eat_lsm_opts, 0, options, mnt_opts);
}
EXPORT_SYMBOL(security_sb_eat_lsm_opts);
/**
* security_sb_mnt_opts_compat() - Check if new mount options are allowed
* @sb: filesystem superblock
* @mnt_opts: new mount options
*
* Determine if the new mount options in @mnt_opts are allowed given the
* existing mounted filesystem at @sb. @sb superblock being compared.
*
* Return: Returns 0 if options are compatible.
*/
int security_sb_mnt_opts_compat(struct super_block *sb,
void *mnt_opts)
{
return call_int_hook(sb_mnt_opts_compat, 0, sb, mnt_opts);
}
EXPORT_SYMBOL(security_sb_mnt_opts_compat);
/**
* security_sb_remount() - Verify no incompatible mount changes during remount
* @sb: filesystem superblock
* @mnt_opts: (re)mount options
*
* Extracts security system specific mount options and verifies no changes are
* being made to those options.
*
* Return: Returns 0 if permission is granted.
*/
int security_sb_remount(struct super_block *sb,
void *mnt_opts)
{
return call_int_hook(sb_remount, 0, sb, mnt_opts);
}
EXPORT_SYMBOL(security_sb_remount);
/**
* security_sb_kern_mount() - Check if a kernel mount is allowed
* @sb: filesystem superblock
*
* Mount this @sb if allowed by permissions.
*
* Return: Returns 0 if permission is granted.
*/
int security_sb_kern_mount(struct super_block *sb)
{
return call_int_hook(sb_kern_mount, 0, sb);
}
/**
* security_sb_show_options() - Output the mount options for a superblock
* @m: output file
* @sb: filesystem superblock
*
* Show (print on @m) mount options for this @sb.
*
* Return: Returns 0 on success, negative values on failure.
*/
int security_sb_show_options(struct seq_file *m, struct super_block *sb)
{
return call_int_hook(sb_show_options, 0, m, sb);
}
/**
* security_sb_statfs() - Check if accessing fs stats is allowed
* @dentry: superblock handle
*
* Check permission before obtaining filesystem statistics for the @mnt
* mountpoint. @dentry is a handle on the superblock for the filesystem.
*
* Return: Returns 0 if permission is granted.
*/
int security_sb_statfs(struct dentry *dentry)
{
return call_int_hook(sb_statfs, 0, dentry);
}
/**
* security_sb_mount() - Check permission for mounting a filesystem
* @dev_name: filesystem backing device
* @path: mount point
* @type: filesystem type
* @flags: mount flags
* @data: filesystem specific data
*
* Check permission before an object specified by @dev_name is mounted on the
* mount point named by @nd. For an ordinary mount, @dev_name identifies a
* device if the file system type requires a device. For a remount
* (@flags & MS_REMOUNT), @dev_name is irrelevant. For a loopback/bind mount
* (@flags & MS_BIND), @dev_name identifies the pathname of the object being
* mounted.
*
* Return: Returns 0 if permission is granted.
*/
int security_sb_mount(const char *dev_name, const struct path *path,
const char *type, unsigned long flags, void *data)
{
return call_int_hook(sb_mount, 0, dev_name, path, type, flags, data);
}
/**
* security_sb_umount() - Check permission for unmounting a filesystem
* @mnt: mounted filesystem
* @flags: unmount flags
*
* Check permission before the @mnt file system is unmounted.
*
* Return: Returns 0 if permission is granted.
*/
int security_sb_umount(struct vfsmount *mnt, int flags)
{
return call_int_hook(sb_umount, 0, mnt, flags);
}
/**
* security_sb_pivotroot() - Check permissions for pivoting the rootfs
* @old_path: new location for current rootfs
* @new_path: location of the new rootfs
*
* Check permission before pivoting the root filesystem.
*
* Return: Returns 0 if permission is granted.
*/
int security_sb_pivotroot(const struct path *old_path, const struct path *new_path)
{
return call_int_hook(sb_pivotroot, 0, old_path, new_path);
}
/**
* security_sb_set_mnt_opts() - Set the mount options for a filesystem
* @sb: filesystem superblock
* @mnt_opts: binary mount options
* @kern_flags: kernel flags (in)
* @set_kern_flags: kernel flags (out)
*
* Set the security relevant mount options used for a superblock.
*
* Return: Returns 0 on success, error on failure.
*/
Security: add get, set, and cloning of superblock security information Adds security_get_sb_mnt_opts, security_set_sb_mnt_opts, and security_clont_sb_mnt_opts to the LSM and to SELinux. This will allow filesystems to directly own and control all of their mount options if they so choose. This interface deals only with option identifiers and strings so it should generic enough for any LSM which may come in the future. Filesystems which pass text mount data around in the kernel (almost all of them) need not currently make use of this interface when dealing with SELinux since it will still parse those strings as it always has. I assume future LSM's would do the same. NFS is the primary FS which does not use text mount data and thus must make use of this interface. An LSM would need to implement these functions only if they had mount time options, such as selinux has context= or fscontext=. If the LSM has no mount time options they could simply not implement and let the dummy ops take care of things. An LSM other than SELinux would need to define new option numbers in security.h and any FS which decides to own there own security options would need to be patched to use this new interface for every possible LSM. This is because it was stated to me very clearly that LSM's should not attempt to understand FS mount data and the burdon to understand security should be in the FS which owns the options. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2007-11-30 18:00:35 +00:00
int security_sb_set_mnt_opts(struct super_block *sb,
void *mnt_opts,
unsigned long kern_flags,
unsigned long *set_kern_flags)
Security: add get, set, and cloning of superblock security information Adds security_get_sb_mnt_opts, security_set_sb_mnt_opts, and security_clont_sb_mnt_opts to the LSM and to SELinux. This will allow filesystems to directly own and control all of their mount options if they so choose. This interface deals only with option identifiers and strings so it should generic enough for any LSM which may come in the future. Filesystems which pass text mount data around in the kernel (almost all of them) need not currently make use of this interface when dealing with SELinux since it will still parse those strings as it always has. I assume future LSM's would do the same. NFS is the primary FS which does not use text mount data and thus must make use of this interface. An LSM would need to implement these functions only if they had mount time options, such as selinux has context= or fscontext=. If the LSM has no mount time options they could simply not implement and let the dummy ops take care of things. An LSM other than SELinux would need to define new option numbers in security.h and any FS which decides to own there own security options would need to be patched to use this new interface for every possible LSM. This is because it was stated to me very clearly that LSM's should not attempt to understand FS mount data and the burdon to understand security should be in the FS which owns the options. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2007-11-30 18:00:35 +00:00
{
return call_int_hook(sb_set_mnt_opts,
mnt_opts ? -EOPNOTSUPP : 0, sb,
mnt_opts, kern_flags, set_kern_flags);
Security: add get, set, and cloning of superblock security information Adds security_get_sb_mnt_opts, security_set_sb_mnt_opts, and security_clont_sb_mnt_opts to the LSM and to SELinux. This will allow filesystems to directly own and control all of their mount options if they so choose. This interface deals only with option identifiers and strings so it should generic enough for any LSM which may come in the future. Filesystems which pass text mount data around in the kernel (almost all of them) need not currently make use of this interface when dealing with SELinux since it will still parse those strings as it always has. I assume future LSM's would do the same. NFS is the primary FS which does not use text mount data and thus must make use of this interface. An LSM would need to implement these functions only if they had mount time options, such as selinux has context= or fscontext=. If the LSM has no mount time options they could simply not implement and let the dummy ops take care of things. An LSM other than SELinux would need to define new option numbers in security.h and any FS which decides to own there own security options would need to be patched to use this new interface for every possible LSM. This is because it was stated to me very clearly that LSM's should not attempt to understand FS mount data and the burdon to understand security should be in the FS which owns the options. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2007-11-30 18:00:35 +00:00
}
EXPORT_SYMBOL(security_sb_set_mnt_opts);
Security: add get, set, and cloning of superblock security information Adds security_get_sb_mnt_opts, security_set_sb_mnt_opts, and security_clont_sb_mnt_opts to the LSM and to SELinux. This will allow filesystems to directly own and control all of their mount options if they so choose. This interface deals only with option identifiers and strings so it should generic enough for any LSM which may come in the future. Filesystems which pass text mount data around in the kernel (almost all of them) need not currently make use of this interface when dealing with SELinux since it will still parse those strings as it always has. I assume future LSM's would do the same. NFS is the primary FS which does not use text mount data and thus must make use of this interface. An LSM would need to implement these functions only if they had mount time options, such as selinux has context= or fscontext=. If the LSM has no mount time options they could simply not implement and let the dummy ops take care of things. An LSM other than SELinux would need to define new option numbers in security.h and any FS which decides to own there own security options would need to be patched to use this new interface for every possible LSM. This is because it was stated to me very clearly that LSM's should not attempt to understand FS mount data and the burdon to understand security should be in the FS which owns the options. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2007-11-30 18:00:35 +00:00
/**
* security_sb_clone_mnt_opts() - Duplicate superblock mount options
* @olddb: source superblock
* @newdb: destination superblock
* @kern_flags: kernel flags (in)
* @set_kern_flags: kernel flags (out)
*
* Copy all security options from a given superblock to another.
*
* Return: Returns 0 on success, error on failure.
*/
selinux: make security_sb_clone_mnt_opts return an error on context mismatch I had the following problem reported a while back. If you mount the same filesystem twice using NFSv4 with different contexts, then the second context= option is ignored. For instance: # mount server:/export /mnt/test1 # mount server:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0 # ls -dZ /mnt/test1 drwxrwxrwt. root root system_u:object_r:nfs_t:s0 /mnt/test1 # ls -dZ /mnt/test2 drwxrwxrwt. root root system_u:object_r:nfs_t:s0 /mnt/test2 When we call into SELinux to set the context of a "cloned" superblock, it will currently just bail out when it notices that we're reusing an existing superblock. Since the existing superblock is already set up and presumably in use, we can't go overwriting its context with the one from the "original" sb. Because of this, the second context= option in this case cannot take effect. This patch fixes this by turning security_sb_clone_mnt_opts into an int return operation. When it finds that the "new" superblock that it has been handed is already set up, it checks to see whether the contexts on the old superblock match it. If it does, then it will just return success, otherwise it'll return -EBUSY and emit a printk to tell the admin why the second mount failed. Note that this patch may cause casualties. The NFSv4 code relies on being able to walk down to an export from the pseudoroot. If you mount filesystems that are nested within one another with different contexts, then this patch will make those mounts fail in new and "exciting" ways. For instance, suppose that /export is a separate filesystem on the server: # mount server:/ /mnt/test1 # mount salusa:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0 mount.nfs: an incorrect mount option was specified ...with the printk in the ring buffer. Because we *might* eventually walk down to /mnt/test1/export, the mount is denied due to this patch. The second mount needs the pseudoroot superblock, but that's already present with the wrong context. OTOH, if we mount these in the reverse order, then both mounts work, because the pseudoroot superblock created when mounting /export is discarded once that mount is done. If we then however try to walk into that directory, the automount fails for the similar reasons: # cd /mnt/test1/scratch/ -bash: cd: /mnt/test1/scratch: Device or resource busy The story I've gotten from the SELinux folks that I've talked to is that this is desirable behavior. In SELinux-land, mounting the same data under different contexts is wrong -- there can be only one. Cc: Steve Dickson <steved@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-04-01 12:14:24 +00:00
int security_sb_clone_mnt_opts(const struct super_block *oldsb,
security/selinux: allow security_sb_clone_mnt_opts to enable/disable native labeling behavior When an NFSv4 client performs a mount operation, it first mounts the NFSv4 root and then does path walk to the exported path and performs a submount on that, cloning the security mount options from the root's superblock to the submount's superblock in the process. Unless the NFS server has an explicit fsid=0 export with the "security_label" option, the NFSv4 root superblock will not have SBLABEL_MNT set, and neither will the submount superblock after cloning the security mount options. As a result, setxattr's of security labels over NFSv4.2 will fail. In a similar fashion, NFSv4.2 mounts mounted with the context= mount option will not show the correct labels because the nfs_server->caps flags of the cloned superblock will still have NFS_CAP_SECURITY_LABEL set. Allowing the NFSv4 client to enable or disable SECURITY_LSM_NATIVE_LABELS behavior will ensure that the SBLABEL_MNT flag has the correct value when the client traverses from an exported path without the "security_label" option to one with the "security_label" option and vice versa. Similarly, checking to see if SECURITY_LSM_NATIVE_LABELS is set upon return from security_sb_clone_mnt_opts() and clearing NFS_CAP_SECURITY_LABEL if necessary will allow the correct labels to be displayed for NFSv4.2 mounts mounted with the context= mount option. Resolves: https://github.com/SELinuxProject/selinux-kernel/issues/35 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-06-05 15:45:04 +00:00
struct super_block *newsb,
unsigned long kern_flags,
unsigned long *set_kern_flags)
Security: add get, set, and cloning of superblock security information Adds security_get_sb_mnt_opts, security_set_sb_mnt_opts, and security_clont_sb_mnt_opts to the LSM and to SELinux. This will allow filesystems to directly own and control all of their mount options if they so choose. This interface deals only with option identifiers and strings so it should generic enough for any LSM which may come in the future. Filesystems which pass text mount data around in the kernel (almost all of them) need not currently make use of this interface when dealing with SELinux since it will still parse those strings as it always has. I assume future LSM's would do the same. NFS is the primary FS which does not use text mount data and thus must make use of this interface. An LSM would need to implement these functions only if they had mount time options, such as selinux has context= or fscontext=. If the LSM has no mount time options they could simply not implement and let the dummy ops take care of things. An LSM other than SELinux would need to define new option numbers in security.h and any FS which decides to own there own security options would need to be patched to use this new interface for every possible LSM. This is because it was stated to me very clearly that LSM's should not attempt to understand FS mount data and the burdon to understand security should be in the FS which owns the options. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2007-11-30 18:00:35 +00:00
{
security/selinux: allow security_sb_clone_mnt_opts to enable/disable native labeling behavior When an NFSv4 client performs a mount operation, it first mounts the NFSv4 root and then does path walk to the exported path and performs a submount on that, cloning the security mount options from the root's superblock to the submount's superblock in the process. Unless the NFS server has an explicit fsid=0 export with the "security_label" option, the NFSv4 root superblock will not have SBLABEL_MNT set, and neither will the submount superblock after cloning the security mount options. As a result, setxattr's of security labels over NFSv4.2 will fail. In a similar fashion, NFSv4.2 mounts mounted with the context= mount option will not show the correct labels because the nfs_server->caps flags of the cloned superblock will still have NFS_CAP_SECURITY_LABEL set. Allowing the NFSv4 client to enable or disable SECURITY_LSM_NATIVE_LABELS behavior will ensure that the SBLABEL_MNT flag has the correct value when the client traverses from an exported path without the "security_label" option to one with the "security_label" option and vice versa. Similarly, checking to see if SECURITY_LSM_NATIVE_LABELS is set upon return from security_sb_clone_mnt_opts() and clearing NFS_CAP_SECURITY_LABEL if necessary will allow the correct labels to be displayed for NFSv4.2 mounts mounted with the context= mount option. Resolves: https://github.com/SELinuxProject/selinux-kernel/issues/35 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-06-05 15:45:04 +00:00
return call_int_hook(sb_clone_mnt_opts, 0, oldsb, newsb,
kern_flags, set_kern_flags);
Security: add get, set, and cloning of superblock security information Adds security_get_sb_mnt_opts, security_set_sb_mnt_opts, and security_clont_sb_mnt_opts to the LSM and to SELinux. This will allow filesystems to directly own and control all of their mount options if they so choose. This interface deals only with option identifiers and strings so it should generic enough for any LSM which may come in the future. Filesystems which pass text mount data around in the kernel (almost all of them) need not currently make use of this interface when dealing with SELinux since it will still parse those strings as it always has. I assume future LSM's would do the same. NFS is the primary FS which does not use text mount data and thus must make use of this interface. An LSM would need to implement these functions only if they had mount time options, such as selinux has context= or fscontext=. If the LSM has no mount time options they could simply not implement and let the dummy ops take care of things. An LSM other than SELinux would need to define new option numbers in security.h and any FS which decides to own there own security options would need to be patched to use this new interface for every possible LSM. This is because it was stated to me very clearly that LSM's should not attempt to understand FS mount data and the burdon to understand security should be in the FS which owns the options. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2007-11-30 18:00:35 +00:00
}
EXPORT_SYMBOL(security_sb_clone_mnt_opts);
/**
* security_move_mount() - Check permissions for moving a mount
* @from_path: source mount point
* @to_path: destination mount point
*
* Check permission before a mount is moved.
*
* Return: Returns 0 if permission is granted.
*/
int security_move_mount(const struct path *from_path, const struct path *to_path)
{
return call_int_hook(move_mount, 0, from_path, to_path);
}
/**
* security_path_notify() - Check if setting a watch is allowed
* @path: file path
* @mask: event mask
* @obj_type: file path type
*
* Check permissions before setting a watch on events as defined by @mask, on
* an object at @path, whose type is defined by @obj_type.
*
* Return: Returns 0 if permission is granted.
*/
fanotify, inotify, dnotify, security: add security hook for fs notifications As of now, setting watches on filesystem objects has, at most, applied a check for read access to the inode, and in the case of fanotify, requires CAP_SYS_ADMIN. No specific security hook or permission check has been provided to control the setting of watches. Using any of inotify, dnotify, or fanotify, it is possible to observe, not only write-like operations, but even read access to a file. Modeling the watch as being merely a read from the file is insufficient for the needs of SELinux. This is due to the fact that read access should not necessarily imply access to information about when another process reads from a file. Furthermore, fanotify watches grant more power to an application in the form of permission events. While notification events are solely, unidirectional (i.e. they only pass information to the receiving application), permission events are blocking. Permission events make a request to the receiving application which will then reply with a decision as to whether or not that action may be completed. This causes the issue of the watching application having the ability to exercise control over the triggering process. Without drawing a distinction within the permission check, the ability to read would imply the greater ability to control an application. Additionally, mount and superblock watches apply to all files within the same mount or superblock. Read access to one file should not necessarily imply the ability to watch all files accessed within a given mount or superblock. In order to solve these issues, a new LSM hook is implemented and has been placed within the system calls for marking filesystem objects with inotify, fanotify, and dnotify watches. These calls to the hook are placed at the point at which the target path has been resolved and are provided with the path struct, the mask of requested notification events, and the type of object on which the mark is being set (inode, superblock, or mount). The mask and obj_type have already been translated into common FS_* values shared by the entirety of the fs notification infrastructure. The path struct is passed rather than just the inode so that the mount is available, particularly for mount watches. This also allows for use of the hook by pathname-based security modules. However, since the hook is intended for use even by inode based security modules, it is not placed under the CONFIG_SECURITY_PATH conditional. Otherwise, the inode-based security modules would need to enable all of the path hooks, even though they do not use any of them. This only provides a hook at the point of setting a watch, and presumes that permission to set a particular watch implies the ability to receive all notification about that object which match the mask. This is all that is required for SELinux. If other security modules require additional hooks or infrastructure to control delivery of notification, these can be added by them. It does not make sense for us to propose hooks for which we have no implementation. The understanding that all notifications received by the requesting application are all strictly of a type for which the application has been granted permission shows that this implementation is sufficient in its coverage. Security modules wishing to provide complete control over fanotify must also implement a security_file_open hook that validates that the access requested by the watching application is authorized. Fanotify has the issue that it returns a file descriptor with the file mode specified during fanotify_init() to the watching process on event. This is already covered by the LSM security_file_open hook if the security module implements checking of the requested file mode there. Otherwise, a watching process can obtain escalated access to a file for which it has not been authorized. The selinux_path_notify hook implementation works by adding five new file permissions: watch, watch_mount, watch_sb, watch_reads, and watch_with_perm (descriptions about which will follow), and one new filesystem permission: watch (which is applied to superblock checks). The hook then decides which subset of these permissions must be held by the requesting application based on the contents of the provided mask and the obj_type. The selinux_file_open hook already checks the requested file mode and therefore ensures that a watching process cannot escalate its access through fanotify. The watch, watch_mount, and watch_sb permissions are the baseline permissions for setting a watch on an object and each are a requirement for any watch to be set on a file, mount, or superblock respectively. It should be noted that having either of the other two permissions (watch_reads and watch_with_perm) does not imply the watch, watch_mount, or watch_sb permission. Superblock watches further require the filesystem watch permission to the superblock. As there is no labeled object in view for mounts, there is no specific check for mount watches beyond watch_mount to the inode. Such a check could be added in the future, if a suitable labeled object existed representing the mount. The watch_reads permission is required to receive notifications from read-exclusive events on filesystem objects. These events include accessing a file for the purpose of reading and closing a file which has been opened read-only. This distinction has been drawn in order to provide a direct indication in the policy for this otherwise not obvious capability. Read access to a file should not necessarily imply the ability to observe read events on a file. Finally, watch_with_perm only applies to fanotify masks since it is the only way to set a mask which allows for the blocking, permission event. This permission is needed for any watch which is of this type. Though fanotify requires CAP_SYS_ADMIN, this is insufficient as it gives implicit trust to root, which we do not do, and does not support least privilege. Signed-off-by: Aaron Goidel <acgoide@tycho.nsa.gov> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-08-12 15:20:00 +00:00
int security_path_notify(const struct path *path, u64 mask,
unsigned int obj_type)
{
return call_int_hook(path_notify, 0, path, mask, obj_type);
}
/**
* security_inode_alloc() - Allocate an inode LSM blob
* @inode: the inode
*
* Allocate and attach a security structure to @inode->i_security. The
* i_security field is initialized to NULL when the inode structure is
* allocated.
*
* Return: Return 0 if operation was successful.
*/
int security_inode_alloc(struct inode *inode)
{
int rc = lsm_inode_alloc(inode);
if (unlikely(rc))
return rc;
rc = call_int_hook(inode_alloc_security, 0, inode);
if (unlikely(rc))
security_inode_free(inode);
return rc;
}
static void inode_free_by_rcu(struct rcu_head *head)
{
/*
* The rcu head is at the start of the inode blob
*/
kmem_cache_free(lsm_inode_cache, head);
}
/**
* security_inode_free() - Free an inode's LSM blob
* @inode: the inode
*
* Deallocate the inode security structure and set @inode->i_security to NULL.
*/
void security_inode_free(struct inode *inode)
{
integrity_inode_free(inode);
call_void_hook(inode_free_security, inode);
/*
* The inode may still be referenced in a path walk and
* a call to security_inode_permission() can be made
* after inode_free_security() is called. Ideally, the VFS
* wouldn't do this, but fixing that is a much harder
* job. For now, simply free the i_security via RCU, and
* leave the current inode->i_security pointer intact.
* The inode will be freed after the RCU grace period too.
*/
if (inode->i_security)
call_rcu((struct rcu_head *)inode->i_security,
inode_free_by_rcu);
}
/**
* security_dentry_init_security() - Perform dentry initialization
* @dentry: the dentry to initialize
* @mode: mode used to determine resource type
* @name: name of the last path component
* @xattr_name: name of the security/LSM xattr
* @ctx: pointer to the resulting LSM context
* @ctxlen: length of @ctx
*
* Compute a context for a dentry as the inode is not yet available since NFSv4
* has no label backed by an EA anyway. It is important to note that
* @xattr_name does not need to be free'd by the caller, it is a static string.
*
* Return: Returns 0 on success, negative values on failure.
*/
int security_dentry_init_security(struct dentry *dentry, int mode,
const struct qstr *name,
const char **xattr_name, void **ctx,
u32 *ctxlen)
{
security, lsm: dentry_init_security() Handle multi LSM registration A ceph user has reported that ceph is crashing with kernel NULL pointer dereference. Following is the backtrace. /proc/version: Linux version 5.16.2-arch1-1 (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Thu, 20 Jan 2022 16:18:29 +0000 distro / arch: Arch Linux / x86_64 SELinux is not enabled ceph cluster version: 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) relevant dmesg output: [ 30.947129] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 30.947206] #PF: supervisor read access in kernel mode [ 30.947258] #PF: error_code(0x0000) - not-present page [ 30.947310] PGD 0 P4D 0 [ 30.947342] Oops: 0000 [#1] PREEMPT SMP PTI [ 30.947388] CPU: 5 PID: 778 Comm: touch Not tainted 5.16.2-arch1-1 #1 86fbf2c313cc37a553d65deb81d98e9dcc2a3659 [ 30.947486] Hardware name: Gigabyte Technology Co., Ltd. B365M DS3H/B365M DS3H, BIOS F5 08/13/2019 [ 30.947569] RIP: 0010:strlen+0x0/0x20 [ 30.947616] Code: b6 07 38 d0 74 16 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 31 d2 89 d6 89 d7 c3 48 89 f8 31 d2 89 d6 89 d7 c3 0 f 1f 40 00 <80> 3f 00 74 12 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 31 ff [ 30.947782] RSP: 0018:ffffa4ed80ffbbb8 EFLAGS: 00010246 [ 30.947836] RAX: 0000000000000000 RBX: ffffa4ed80ffbc60 RCX: 0000000000000000 [ 30.947904] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 30.947971] RBP: ffff94b0d15c0ae0 R08: 0000000000000000 R09: 0000000000000000 [ 30.948040] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 30.948106] R13: 0000000000000001 R14: ffffa4ed80ffbc60 R15: 0000000000000000 [ 30.948174] FS: 00007fc7520f0740(0000) GS:ffff94b7ced40000(0000) knlGS:0000000000000000 [ 30.948252] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 30.948308] CR2: 0000000000000000 CR3: 0000000104a40001 CR4: 00000000003706e0 [ 30.948376] Call Trace: [ 30.948404] <TASK> [ 30.948431] ceph_security_init_secctx+0x7b/0x240 [ceph 49f9c4b9bf5be8760f19f1747e26da33920bce4b] [ 30.948582] ceph_atomic_open+0x51e/0x8a0 [ceph 49f9c4b9bf5be8760f19f1747e26da33920bce4b] [ 30.948708] ? get_cached_acl+0x4d/0xa0 [ 30.948759] path_openat+0x60d/0x1030 [ 30.948809] do_filp_open+0xa5/0x150 [ 30.948859] do_sys_openat2+0xc4/0x190 [ 30.948904] __x64_sys_openat+0x53/0xa0 [ 30.948948] do_syscall_64+0x5c/0x90 [ 30.948989] ? exc_page_fault+0x72/0x180 [ 30.949034] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 30.949091] RIP: 0033:0x7fc7521e25bb [ 30.950849] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 0 0 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25 Core of the problem is that ceph checks for return code from security_dentry_init_security() and if return code is 0, it assumes everything is fine and continues to call strlen(name), which crashes. Typically SELinux LSM returns 0 and sets name to "security.selinux" and it is not a problem. Or if selinux is not compiled in or disabled, it returns -EOPNOTSUP and ceph deals with it. But somehow in this configuration, 0 is being returned and "name" is not being initialized and that's creating the problem. Our suspicion is that BPF LSM is registering a hook for dentry_init_security() and returns hook default of 0. LSM_HOOK(int, 0, dentry_init_security, struct dentry *dentry,...) I have not been able to reproduce it just by doing CONFIG_BPF_LSM=y. Stephen has tested the patch though and confirms it solves the problem for him. dentry_init_security() is written in such a way that it expects only one LSM to register the hook. Atleast that's the expectation with current code. If another LSM returns a hook and returns default, it will simply return 0 as of now and that will break ceph. Hence, suggestion is that change semantics of this hook a bit. If there are no LSMs or no LSM is taking ownership and initializing security context, then return -EOPNOTSUP. Also allow at max one LSM to initialize security context. This hook can't deal with multiple LSMs trying to init security context. This patch implements this new behavior. Reported-by: Stephen Muth <smuth4@gmail.com> Tested-by: Stephen Muth <smuth4@gmail.com> Suggested-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: <stable@vger.kernel.org> # 5.16.0 Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
2022-01-26 20:35:14 +00:00
struct security_hook_list *hp;
int rc;
/*
* Only one module will provide a security context.
*/
hlist_for_each_entry(hp, &security_hook_heads.dentry_init_security, list) {
rc = hp->hook.dentry_init_security(dentry, mode, name,
xattr_name, ctx, ctxlen);
if (rc != LSM_RET_DEFAULT(dentry_init_security))
return rc;
}
return LSM_RET_DEFAULT(dentry_init_security);
}
EXPORT_SYMBOL(security_dentry_init_security);
/**
* security_dentry_create_files_as() - Perform dentry initialization
* @dentry: the dentry to initialize
* @mode: mode used to determine resource type
* @name: name of the last path component
* @old: creds to use for LSM context calculations
* @new: creds to modify
*
* Compute a context for a dentry as the inode is not yet available and set
* that context in passed in creds so that new files are created using that
* context. Context is calculated using the passed in creds and not the creds
* of the caller.
*
* Return: Returns 0 on success, error on failure.
*/
int security_dentry_create_files_as(struct dentry *dentry, int mode,
struct qstr *name,
const struct cred *old, struct cred *new)
{
return call_int_hook(dentry_create_files_as, 0, dentry, mode,
name, old, new);
}
EXPORT_SYMBOL(security_dentry_create_files_as);
/**
* security_inode_init_security() - Initialize an inode's LSM context
* @inode: the inode
* @dir: parent directory
* @qstr: last component of the pathname
* @initxattrs: callback function to write xattrs
* @fs_data: filesystem specific data
*
* Obtain the security attribute name suffix and value to set on a newly
* created inode and set up the incore security field for the new inode. This
* hook is called by the fs code as part of the inode creation transaction and
* provides for atomic labeling of the inode, unlike the post_create/mkdir/...
* hooks called by the VFS. The hook function is expected to allocate the name
* and value via kmalloc, with the caller being responsible for calling kfree
* after using them. If the security module does not use security attributes
* or does not wish to put a security attribute on this particular inode, then
* it should return -EOPNOTSUPP to skip this processing.
*
* Return: Returns 0 on success, -EOPNOTSUPP if no security attribute is
* needed, or -ENOMEM on memory allocation failure.
*/
int security_inode_init_security(struct inode *inode, struct inode *dir,
const struct qstr *qstr,
const initxattrs initxattrs, void *fs_data)
{
struct xattr new_xattrs[MAX_LSM_EVM_XATTR + 1];
struct xattr *lsm_xattr, *evm_xattr, *xattr;
int ret;
if (unlikely(IS_PRIVATE(inode)))
return 0;
if (!initxattrs)
return call_int_hook(inode_init_security, -EOPNOTSUPP, inode,
dir, qstr, NULL, NULL, NULL);
memset(new_xattrs, 0, sizeof(new_xattrs));
lsm_xattr = new_xattrs;
ret = call_int_hook(inode_init_security, -EOPNOTSUPP, inode, dir, qstr,
&lsm_xattr->name,
&lsm_xattr->value,
&lsm_xattr->value_len);
if (ret)
goto out;
evm_xattr = lsm_xattr + 1;
ret = evm_inode_init_security(inode, lsm_xattr, evm_xattr);
if (ret)
goto out;
ret = initxattrs(inode, new_xattrs, fs_data);
out:
for (xattr = new_xattrs; xattr->value != NULL; xattr++)
kfree(xattr->value);
return (ret == -EOPNOTSUPP) ? 0 : ret;
}
EXPORT_SYMBOL(security_inode_init_security);
/**
* security_inode_init_security_anon() - Initialize an anonymous inode
* @inode: the inode
* @name: the anonymous inode class
* @context_inode: an optional related inode
*
* Set up the incore security field for the new anonymous inode and return
* whether the inode creation is permitted by the security module or not.
*
* Return: Returns 0 on success, -EACCES if the security module denies the
* creation of this inode, or another -errno upon other errors.
*/
int security_inode_init_security_anon(struct inode *inode,
const struct qstr *name,
const struct inode *context_inode)
{
return call_int_hook(inode_init_security_anon, 0, inode, name,
context_inode);
}
int security_old_inode_init_security(struct inode *inode, struct inode *dir,
const struct qstr *qstr, const char **name,
void **value, size_t *len)
{
if (unlikely(IS_PRIVATE(inode)))
return -EOPNOTSUPP;
return call_int_hook(inode_init_security, -EOPNOTSUPP, inode, dir,
qstr, name, value, len);
}
EXPORT_SYMBOL(security_old_inode_init_security);
#ifdef CONFIG_SECURITY_PATH
/**
* security_path_mknod() - Check if creating a special file is allowed
* @dir: parent directory
* @dentry: new file
* @mode: new file mode
* @dev: device number
*
* Check permissions when creating a file. Note that this hook is called even
* if mknod operation is being done for a regular file.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_mknod(const struct path *dir, struct dentry *dentry, umode_t mode,
unsigned int dev)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dir->dentry))))
return 0;
return call_int_hook(path_mknod, 0, dir, dentry, mode, dev);
}
EXPORT_SYMBOL(security_path_mknod);
/**
* security_path_mkdir() - Check if creating a new directory is allowed
* @dir: parent directory
* @dentry: new directory
* @mode: new directory mode
*
* Check permissions to create a new directory in the existing directory.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_mkdir(const struct path *dir, struct dentry *dentry, umode_t mode)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dir->dentry))))
return 0;
return call_int_hook(path_mkdir, 0, dir, dentry, mode);
}
EXPORT_SYMBOL(security_path_mkdir);
/**
* security_path_rmdir() - Check if removing a directory is allowed
* @dir: parent directory
* @dentry: directory to remove
*
* Check the permission to remove a directory.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_rmdir(const struct path *dir, struct dentry *dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dir->dentry))))
return 0;
return call_int_hook(path_rmdir, 0, dir, dentry);
}
/**
* security_path_unlink() - Check if removing a hard link is allowed
* @dir: parent directory
* @dentry: file
*
* Check the permission to remove a hard link to a file.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_unlink(const struct path *dir, struct dentry *dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dir->dentry))))
return 0;
return call_int_hook(path_unlink, 0, dir, dentry);
}
EXPORT_SYMBOL(security_path_unlink);
/**
* security_path_symlink() - Check if creating a symbolic link is allowed
* @dir: parent directory
* @dentry: symbolic link
* @old_name: file pathname
*
* Check the permission to create a symbolic link to a file.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_symlink(const struct path *dir, struct dentry *dentry,
const char *old_name)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dir->dentry))))
return 0;
return call_int_hook(path_symlink, 0, dir, dentry, old_name);
}
/**
* security_path_link - Check if creating a hard link is allowed
* @old_dentry: existing file
* @new_dir: new parent directory
* @new_dentry: new link
*
* Check permission before creating a new hard link to a file.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_link(struct dentry *old_dentry, const struct path *new_dir,
struct dentry *new_dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(old_dentry))))
return 0;
return call_int_hook(path_link, 0, old_dentry, new_dir, new_dentry);
}
/**
* security_path_rename() - Check if renaming a file is allowed
* @old_dir: parent directory of the old file
* @old_dentry: the old file
* @new_dir: parent directory of the new file
* @new_dentry: the new file
* @flags: flags
*
* Check for permission to rename a file or directory.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_rename(const struct path *old_dir, struct dentry *old_dentry,
const struct path *new_dir, struct dentry *new_dentry,
unsigned int flags)
{
if (unlikely(IS_PRIVATE(d_backing_inode(old_dentry)) ||
(d_is_positive(new_dentry) && IS_PRIVATE(d_backing_inode(new_dentry)))))
return 0;
return call_int_hook(path_rename, 0, old_dir, old_dentry, new_dir,
new_dentry, flags);
}
EXPORT_SYMBOL(security_path_rename);
/**
* security_path_truncate() - Check if truncating a file is allowed
* @path: file
*
* Check permission before truncating the file indicated by path. Note that
* truncation permissions may also be checked based on already opened files,
* using the security_file_truncate() hook.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_truncate(const struct path *path)
{
if (unlikely(IS_PRIVATE(d_backing_inode(path->dentry))))
return 0;
return call_int_hook(path_truncate, 0, path);
}
/**
* security_path_chmod() - Check if changing the file's mode is allowed
* @path: file
* @mode: new mode
*
* Check for permission to change a mode of the file @path. The new mode is
* specified in @mode which is a bitmask of constants from
* <include/uapi/linux/stat.h>.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_chmod(const struct path *path, umode_t mode)
{
if (unlikely(IS_PRIVATE(d_backing_inode(path->dentry))))
return 0;
return call_int_hook(path_chmod, 0, path, mode);
}
/**
* security_path_chown() - Check if changing the file's owner/group is allowed
* @path: file
* @uid: file owner
* @gid: file group
*
* Check for permission to change owner/group of a file or directory.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_chown(const struct path *path, kuid_t uid, kgid_t gid)
{
if (unlikely(IS_PRIVATE(d_backing_inode(path->dentry))))
return 0;
return call_int_hook(path_chown, 0, path, uid, gid);
}
/**
* security_path_chroot() - Check if changing the root directory is allowed
* @path: directory
*
* Check for permission to change root directory.
*
* Return: Returns 0 if permission is granted.
*/
int security_path_chroot(const struct path *path)
{
return call_int_hook(path_chroot, 0, path);
}
#endif
/**
* security_inode_create() - Check if creating a file is allowed
* @dir: the parent directory
* @dentry: the file being created
* @mode: requested file mode
*
* Check permission to create a regular file.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode)
{
if (unlikely(IS_PRIVATE(dir)))
return 0;
return call_int_hook(inode_create, 0, dir, dentry, mode);
}
EXPORT_SYMBOL_GPL(security_inode_create);
/**
* security_inode_link() - Check if creating a hard link is allowed
* @old_dentry: existing file
* @dir: new parent directory
* @new_dentry: new link
*
* Check permission before creating a new hard link to a file.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_link(struct dentry *old_dentry, struct inode *dir,
struct dentry *new_dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(old_dentry))))
return 0;
return call_int_hook(inode_link, 0, old_dentry, dir, new_dentry);
}
/**
* security_inode_unlink() - Check if removing a hard link is allowed
* @dir: parent directory
* @dentry: file
*
* Check the permission to remove a hard link to a file.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_unlink(struct inode *dir, struct dentry *dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
return call_int_hook(inode_unlink, 0, dir, dentry);
}
/**
* security_inode_symlink() Check if creating a symbolic link is allowed
* @dir: parent directory
* @dentry: symbolic link
* @old_name: existing filename
*
* Check the permission to create a symbolic link to a file.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_symlink(struct inode *dir, struct dentry *dentry,
const char *old_name)
{
if (unlikely(IS_PRIVATE(dir)))
return 0;
return call_int_hook(inode_symlink, 0, dir, dentry, old_name);
}
/**
* security_inode_mkdir() - Check if creation a new director is allowed
* @dir: parent directory
* @dentry: new directory
* @mode: new directory mode
*
* Check permissions to create a new directory in the existing directory
* associated with inode structure @dir.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
if (unlikely(IS_PRIVATE(dir)))
return 0;
return call_int_hook(inode_mkdir, 0, dir, dentry, mode);
}
EXPORT_SYMBOL_GPL(security_inode_mkdir);
/**
* security_inode_rmdir() - Check if removing a directory is allowed
* @dir: parent directory
* @dentry: directory to be removed
*
* Check the permission to remove a directory.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_rmdir(struct inode *dir, struct dentry *dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
return call_int_hook(inode_rmdir, 0, dir, dentry);
}
/**
* security_inode_mknod() - Check if creating a special file is allowed
* @dir: parent directory
* @dentry: new file
* @mode: new file mode
* @dev: device number
*
* Check permissions when creating a special file (or a socket or a fifo file
* created via the mknod system call). Note that if mknod operation is being
* done for a regular file, then the create hook will be called and not this
* hook.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
{
if (unlikely(IS_PRIVATE(dir)))
return 0;
return call_int_hook(inode_mknod, 0, dir, dentry, mode, dev);
}
/**
* security_inode_rename() - Check if renaming a file is allowed
* @old_dir: parent directory of the old file
* @old_dentry: the old file
* @new_dir: parent directory of the new file
* @new_dentry: the new file
* @flags: flags
*
* Check for permission to rename a file or directory.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags)
{
if (unlikely(IS_PRIVATE(d_backing_inode(old_dentry)) ||
(d_is_positive(new_dentry) && IS_PRIVATE(d_backing_inode(new_dentry)))))
return 0;
if (flags & RENAME_EXCHANGE) {
int err = call_int_hook(inode_rename, 0, new_dir, new_dentry,
old_dir, old_dentry);
if (err)
return err;
}
return call_int_hook(inode_rename, 0, old_dir, old_dentry,
new_dir, new_dentry);
}
/**
* security_inode_readlink() - Check if reading a symbolic link is allowed
* @dentry: link
*
* Check the permission to read the symbolic link.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_readlink(struct dentry *dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
return call_int_hook(inode_readlink, 0, dentry);
}
/**
* security_inode_follow_link() - Check if following a symbolic link is allowed
* @dentry: link dentry
* @inode: link inode
* @rcu: true if in RCU-walk mode
*
* Check permission to follow a symbolic link when looking up a pathname. If
* @rcu is true, @inode is not stable.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_follow_link(struct dentry *dentry, struct inode *inode,
bool rcu)
{
if (unlikely(IS_PRIVATE(inode)))
return 0;
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "The main change in this kernel is Casey's generalized LSM stacking work, which removes the hard-coding of Capabilities and Yama stacking, allowing multiple arbitrary "small" LSMs to be stacked with a default monolithic module (e.g. SELinux, Smack, AppArmor). See https://lwn.net/Articles/636056/ This will allow smaller, simpler LSMs to be incorporated into the mainline kernel and arbitrarily stacked by users. Also, this is a useful cleanup of the LSM code in its own right" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits) tpm, tpm_crb: fix le64_to_cpu conversions in crb_acpi_add() vTPM: set virtual device before passing to ibmvtpm_reset_crq tpm_ibmvtpm: remove unneccessary message level. ima: update builtin policies ima: extend "mask" policy matching support ima: add support for new "euid" policy condition ima: fix ima_show_template_data_ascii() Smack: freeing an error pointer in smk_write_revoke_subj() selinux: fix setting of security labels on NFS selinux: Remove unused permission definitions selinux: enable genfscon labeling for sysfs and pstore files selinux: enable per-file labeling for debugfs files. selinux: update netlink socket classes signals: don't abuse __flush_signals() in selinux_bprm_committed_creds() selinux: Print 'sclass' as string when unrecognized netlink message occurs Smack: allow multiple labels in onlycap Smack: fix seq operations in smackfs ima: pass iint to ima_add_violation() ima: wrap event related data to the new ima_event_data structure integrity: add validity checks for 'path' parameter ...
2015-06-27 20:26:03 +00:00
return call_int_hook(inode_follow_link, 0, dentry, inode, rcu);
}
/**
* security_inode_permission() - Check if accessing an inode is allowed
* @inode: inode
* @mask: access mask
*
* Check permission before accessing an inode. This hook is called by the
* existing Linux permission function, so a security module can use it to
* provide additional checking for existing Linux permission checks. Notice
* that this hook is called when a file is opened (as well as many other
* operations), whereas the file_security_ops permission hook is called when
* the actual read/write operations are performed.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_permission(struct inode *inode, int mask)
{
if (unlikely(IS_PRIVATE(inode)))
return 0;
return call_int_hook(inode_permission, 0, inode, mask);
}
/**
* security_inode_setattr() - Check if setting file attributes is allowed
* @idmap: idmap of the mount
* @dentry: file
* @attr: new attributes
*
* Check permission before setting file attributes. Note that the kernel call
* to notify_change is performed from several locations, whenever file
* attributes change (such as when a file is truncated, chown/chmod operations,
* transferring disk quotas, etc).
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_setattr(struct mnt_idmap *idmap,
struct dentry *dentry, struct iattr *attr)
{
int ret;
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
ret = call_int_hook(inode_setattr, 0, dentry, attr);
if (ret)
return ret;
return evm_inode_setattr(idmap, dentry, attr);
}
EXPORT_SYMBOL_GPL(security_inode_setattr);
/**
* security_inode_getattr() - Check if getting file attributes is allowed
* @path: file
*
* Check permission before obtaining file attributes.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_getattr(const struct path *path)
{
if (unlikely(IS_PRIVATE(d_backing_inode(path->dentry))))
return 0;
return call_int_hook(inode_getattr, 0, path);
}
/**
* security_inode_setxattr() - Check if setting file xattrs is allowed
* @idmap: idmap of the mount
* @dentry: file
* @name: xattr name
* @value: xattr value
* @flags: flags
*
* Check permission before setting the extended attributes.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_setxattr(struct mnt_idmap *idmap,
commoncap: handle idmapped mounts When interacting with user namespace and non-user namespace aware filesystem capabilities the vfs will perform various security checks to determine whether or not the filesystem capabilities can be used by the caller, whether they need to be removed and so on. The main infrastructure for this resides in the capability codepaths but they are called through the LSM security infrastructure even though they are not technically an LSM or optional. This extends the existing security hooks security_inode_removexattr(), security_inode_killpriv(), security_inode_getsecurity() to pass down the mount's user namespace and makes them aware of idmapped mounts. In order to actually get filesystem capabilities from disk the capability infrastructure exposes the get_vfs_caps_from_disk() helper. For user namespace aware filesystem capabilities a root uid is stored alongside the capabilities. In order to determine whether the caller can make use of the filesystem capability or whether it needs to be ignored it is translated according to the superblock's user namespace. If it can be translated to uid 0 according to that id mapping the caller can use the filesystem capabilities stored on disk. If we are accessing the inode that holds the filesystem capabilities through an idmapped mount we map the root uid according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts: reading filesystem caps from disk enforces that the root uid associated with the filesystem capability must have a mapping in the superblock's user namespace and that the caller is either in the same user namespace or is a descendant of the superblock's user namespace. For filesystems that are mountable inside user namespace the caller can just mount the filesystem and won't usually need to idmap it. If they do want to idmap it they can create an idmapped mount and mark it with a user namespace they created and which is thus a descendant of s_user_ns. For filesystems that are not mountable inside user namespaces the descendant rule is trivially true because the s_user_ns will be the initial user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-21 13:19:29 +00:00
struct dentry *dentry, const char *name,
const void *value, size_t size, int flags)
{
int ret;
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
/*
* SELinux and Smack integrate the cap call,
* so assume that all LSMs supplying this call do so.
*/
ret = call_int_hook(inode_setxattr, 1, idmap, dentry, name, value,
commoncap: handle idmapped mounts When interacting with user namespace and non-user namespace aware filesystem capabilities the vfs will perform various security checks to determine whether or not the filesystem capabilities can be used by the caller, whether they need to be removed and so on. The main infrastructure for this resides in the capability codepaths but they are called through the LSM security infrastructure even though they are not technically an LSM or optional. This extends the existing security hooks security_inode_removexattr(), security_inode_killpriv(), security_inode_getsecurity() to pass down the mount's user namespace and makes them aware of idmapped mounts. In order to actually get filesystem capabilities from disk the capability infrastructure exposes the get_vfs_caps_from_disk() helper. For user namespace aware filesystem capabilities a root uid is stored alongside the capabilities. In order to determine whether the caller can make use of the filesystem capability or whether it needs to be ignored it is translated according to the superblock's user namespace. If it can be translated to uid 0 according to that id mapping the caller can use the filesystem capabilities stored on disk. If we are accessing the inode that holds the filesystem capabilities through an idmapped mount we map the root uid according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts: reading filesystem caps from disk enforces that the root uid associated with the filesystem capability must have a mapping in the superblock's user namespace and that the caller is either in the same user namespace or is a descendant of the superblock's user namespace. For filesystems that are mountable inside user namespace the caller can just mount the filesystem and won't usually need to idmap it. If they do want to idmap it they can create an idmapped mount and mark it with a user namespace they created and which is thus a descendant of s_user_ns. For filesystems that are not mountable inside user namespaces the descendant rule is trivially true because the s_user_ns will be the initial user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-21 13:19:29 +00:00
size, flags);
if (ret == 1)
ret = cap_inode_setxattr(dentry, name, value, size, flags);
if (ret)
return ret;
ret = ima_inode_setxattr(dentry, name, value, size);
if (ret)
return ret;
return evm_inode_setxattr(idmap, dentry, name, value, size);
}
/**
* security_inode_set_acl() - Check if setting posix acls is allowed
* @idmap: idmap of the mount
* @dentry: file
* @acl_name: acl name
* @kacl: acl struct
*
* Check permission before setting posix acls, the posix acls in @kacl are
* identified by @acl_name.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_set_acl(struct mnt_idmap *idmap,
security: add get, remove and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. In the next patches we implement the hooks for the few security modules that do actually have restrictions on posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:07 +00:00
struct dentry *dentry, const char *acl_name,
struct posix_acl *kacl)
{
integrity: implement get and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. I spent considerate time in the security module and integrity infrastructure and audited all codepaths. EVM is the only part that really has restrictions based on the actual posix acl values passed through it (e.g., i_mode). Before this dedicated hook EVM used to translate from the uapi posix acl format sent to it in the form of a void pointer into the vfs format. This is not a good thing. Instead of hacking around in the uapi struct give EVM the posix acls in the appropriate vfs format and perform sane permissions checks that mirror what it used to to in the generic xattr hook. IMA doesn't have any restrictions on posix acls. When posix acls are changed it just wants to update its appraisal status to trigger an EVM revalidation. The removal of posix acls is equivalent to passing NULL to the posix set acl hooks. This is the same as before through the generic xattr api. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:10 +00:00
int ret;
security: add get, remove and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. In the next patches we implement the hooks for the few security modules that do actually have restrictions on posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:07 +00:00
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
ret = call_int_hook(inode_set_acl, 0, idmap, dentry, acl_name,
integrity: implement get and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. I spent considerate time in the security module and integrity infrastructure and audited all codepaths. EVM is the only part that really has restrictions based on the actual posix acl values passed through it (e.g., i_mode). Before this dedicated hook EVM used to translate from the uapi posix acl format sent to it in the form of a void pointer into the vfs format. This is not a good thing. Instead of hacking around in the uapi struct give EVM the posix acls in the appropriate vfs format and perform sane permissions checks that mirror what it used to to in the generic xattr hook. IMA doesn't have any restrictions on posix acls. When posix acls are changed it just wants to update its appraisal status to trigger an EVM revalidation. The removal of posix acls is equivalent to passing NULL to the posix set acl hooks. This is the same as before through the generic xattr api. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:10 +00:00
kacl);
if (ret)
return ret;
ret = ima_inode_set_acl(idmap, dentry, acl_name, kacl);
integrity: implement get and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. I spent considerate time in the security module and integrity infrastructure and audited all codepaths. EVM is the only part that really has restrictions based on the actual posix acl values passed through it (e.g., i_mode). Before this dedicated hook EVM used to translate from the uapi posix acl format sent to it in the form of a void pointer into the vfs format. This is not a good thing. Instead of hacking around in the uapi struct give EVM the posix acls in the appropriate vfs format and perform sane permissions checks that mirror what it used to to in the generic xattr hook. IMA doesn't have any restrictions on posix acls. When posix acls are changed it just wants to update its appraisal status to trigger an EVM revalidation. The removal of posix acls is equivalent to passing NULL to the posix set acl hooks. This is the same as before through the generic xattr api. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:10 +00:00
if (ret)
return ret;
return evm_inode_set_acl(idmap, dentry, acl_name, kacl);
security: add get, remove and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. In the next patches we implement the hooks for the few security modules that do actually have restrictions on posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:07 +00:00
}
/**
* security_inode_get_acl() - Check if reading posix acls is allowed
* @idmap: idmap of the mount
* @dentry: file
* @acl_name: acl name
*
* Check permission before getting osix acls, the posix acls are identified by
* @acl_name.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_get_acl(struct mnt_idmap *idmap,
security: add get, remove and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. In the next patches we implement the hooks for the few security modules that do actually have restrictions on posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:07 +00:00
struct dentry *dentry, const char *acl_name)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
return call_int_hook(inode_get_acl, 0, idmap, dentry, acl_name);
security: add get, remove and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. In the next patches we implement the hooks for the few security modules that do actually have restrictions on posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:07 +00:00
}
/**
* security_inode_remove_acl() - Check if removing a posix acl is allowed
* @idmap: idmap of the mount
* @dentry: file
* @acl_name: acl name
*
* Check permission before removing posix acls, the posix acls are identified
* by @acl_name.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_remove_acl(struct mnt_idmap *idmap,
security: add get, remove and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. In the next patches we implement the hooks for the few security modules that do actually have restrictions on posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:07 +00:00
struct dentry *dentry, const char *acl_name)
{
integrity: implement get and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. I spent considerate time in the security module and integrity infrastructure and audited all codepaths. EVM is the only part that really has restrictions based on the actual posix acl values passed through it (e.g., i_mode). Before this dedicated hook EVM used to translate from the uapi posix acl format sent to it in the form of a void pointer into the vfs format. This is not a good thing. Instead of hacking around in the uapi struct give EVM the posix acls in the appropriate vfs format and perform sane permissions checks that mirror what it used to to in the generic xattr hook. IMA doesn't have any restrictions on posix acls. When posix acls are changed it just wants to update its appraisal status to trigger an EVM revalidation. The removal of posix acls is equivalent to passing NULL to the posix set acl hooks. This is the same as before through the generic xattr api. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:10 +00:00
int ret;
security: add get, remove and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. In the next patches we implement the hooks for the few security modules that do actually have restrictions on posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:07 +00:00
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
ret = call_int_hook(inode_remove_acl, 0, idmap, dentry, acl_name);
integrity: implement get and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. I spent considerate time in the security module and integrity infrastructure and audited all codepaths. EVM is the only part that really has restrictions based on the actual posix acl values passed through it (e.g., i_mode). Before this dedicated hook EVM used to translate from the uapi posix acl format sent to it in the form of a void pointer into the vfs format. This is not a good thing. Instead of hacking around in the uapi struct give EVM the posix acls in the appropriate vfs format and perform sane permissions checks that mirror what it used to to in the generic xattr hook. IMA doesn't have any restrictions on posix acls. When posix acls are changed it just wants to update its appraisal status to trigger an EVM revalidation. The removal of posix acls is equivalent to passing NULL to the posix set acl hooks. This is the same as before through the generic xattr api. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:10 +00:00
if (ret)
return ret;
ret = ima_inode_remove_acl(idmap, dentry, acl_name);
integrity: implement get and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. I spent considerate time in the security module and integrity infrastructure and audited all codepaths. EVM is the only part that really has restrictions based on the actual posix acl values passed through it (e.g., i_mode). Before this dedicated hook EVM used to translate from the uapi posix acl format sent to it in the form of a void pointer into the vfs format. This is not a good thing. Instead of hacking around in the uapi struct give EVM the posix acls in the appropriate vfs format and perform sane permissions checks that mirror what it used to to in the generic xattr hook. IMA doesn't have any restrictions on posix acls. When posix acls are changed it just wants to update its appraisal status to trigger an EVM revalidation. The removal of posix acls is equivalent to passing NULL to the posix set acl hooks. This is the same as before through the generic xattr api. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:10 +00:00
if (ret)
return ret;
return evm_inode_remove_acl(idmap, dentry, acl_name);
security: add get, remove and set acl hook The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. In the next patches we implement the hooks for the few security modules that do actually have restrictions on posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-22 15:17:07 +00:00
}
/**
* security_inode_post_setxattr() - Update the inode after a setxattr operation
* @dentry: file
* @name: xattr name
* @value: xattr value
* @size: xattr value size
* @flags: flags
*
* Update inode security field after successful setxattr operation.
*/
void security_inode_post_setxattr(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return;
call_void_hook(inode_post_setxattr, dentry, name, value, size, flags);
evm_inode_post_setxattr(dentry, name, value, size);
}
/**
* security_inode_getxattr() - Check if xattr access is allowed
* @dentry: file
* @name: xattr name
*
* Check permission before obtaining the extended attributes identified by
* @name for @dentry.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_getxattr(struct dentry *dentry, const char *name)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
return call_int_hook(inode_getxattr, 0, dentry, name);
}
/**
* security_inode_listxattr() - Check if listing xattrs is allowed
* @dentry: file
*
* Check permission before obtaining the list of extended attribute names for
* @dentry.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_listxattr(struct dentry *dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
return call_int_hook(inode_listxattr, 0, dentry);
}
/**
* security_inode_removexattr() - Check if removing an xattr is allowed
* @idmap: idmap of the mount
* @dentry: file
* @name: xattr name
*
* Check permission before removing the extended attribute identified by @name
* for @dentry.
*
* Return: Returns 0 if permission is granted.
*/
int security_inode_removexattr(struct mnt_idmap *idmap,
commoncap: handle idmapped mounts When interacting with user namespace and non-user namespace aware filesystem capabilities the vfs will perform various security checks to determine whether or not the filesystem capabilities can be used by the caller, whether they need to be removed and so on. The main infrastructure for this resides in the capability codepaths but they are called through the LSM security infrastructure even though they are not technically an LSM or optional. This extends the existing security hooks security_inode_removexattr(), security_inode_killpriv(), security_inode_getsecurity() to pass down the mount's user namespace and makes them aware of idmapped mounts. In order to actually get filesystem capabilities from disk the capability infrastructure exposes the get_vfs_caps_from_disk() helper. For user namespace aware filesystem capabilities a root uid is stored alongside the capabilities. In order to determine whether the caller can make use of the filesystem capability or whether it needs to be ignored it is translated according to the superblock's user namespace. If it can be translated to uid 0 according to that id mapping the caller can use the filesystem capabilities stored on disk. If we are accessing the inode that holds the filesystem capabilities through an idmapped mount we map the root uid according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts: reading filesystem caps from disk enforces that the root uid associated with the filesystem capability must have a mapping in the superblock's user namespace and that the caller is either in the same user namespace or is a descendant of the superblock's user namespace. For filesystems that are mountable inside user namespace the caller can just mount the filesystem and won't usually need to idmap it. If they do want to idmap it they can create an idmapped mount and mark it with a user namespace they created and which is thus a descendant of s_user_ns. For filesystems that are not mountable inside user namespaces the descendant rule is trivially true because the s_user_ns will be the initial user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-21 13:19:29 +00:00
struct dentry *dentry, const char *name)
{
int ret;
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
/*
* SELinux and Smack integrate the cap call,
* so assume that all LSMs supplying this call do so.
*/
ret = call_int_hook(inode_removexattr, 1, idmap, dentry, name);
if (ret == 1)
ret = cap_inode_removexattr(idmap, dentry, name);
if (ret)
return ret;
ret = ima_inode_removexattr(dentry, name);
if (ret)
return ret;
return evm_inode_removexattr(idmap, dentry, name);
}
/**
* security_inode_need_killpriv() - Check if security_inode_killpriv() required
* @dentry: associated dentry
*
* Called when an inode has been changed to determine if
* security_inode_killpriv() should be called.
*
* Return: Return <0 on error to abort the inode change operation, return 0 if
* security_inode_killpriv() does not need to be called, return >0 if
* security_inode_killpriv() does need to be called.
*/
Implement file posix capabilities Implement file posix capabilities. This allows programs to be given a subset of root's powers regardless of who runs them, without having to use setuid and giving the binary all of root's powers. This version works with Kaigai Kohei's userspace tools, found at http://www.kaigai.gr.jp/index.php. For more information on how to use this patch, Chris Friedhoff has posted a nice page at http://www.friedhoff.org/fscaps.html. Changelog: Nov 27: Incorporate fixes from Andrew Morton (security-introduce-file-caps-tweaks and security-introduce-file-caps-warning-fix) Fix Kconfig dependency. Fix change signaling behavior when file caps are not compiled in. Nov 13: Integrate comments from Alexey: Remove CONFIG_ ifdef from capability.h, and use %zd for printing a size_t. Nov 13: Fix endianness warnings by sparse as suggested by Alexey Dobriyan. Nov 09: Address warnings of unused variables at cap_bprm_set_security when file capabilities are disabled, and simultaneously clean up the code a little, by pulling the new code into a helper function. Nov 08: For pointers to required userspace tools and how to use them, see http://www.friedhoff.org/fscaps.html. Nov 07: Fix the calculation of the highest bit checked in check_cap_sanity(). Nov 07: Allow file caps to be enabled without CONFIG_SECURITY, since capabilities are the default. Hook cap_task_setscheduler when !CONFIG_SECURITY. Move capable(TASK_KILL) to end of cap_task_kill to reduce audit messages. Nov 05: Add secondary calls in selinux/hooks.c to task_setioprio and task_setscheduler so that selinux and capabilities with file cap support can be stacked. Sep 05: As Seth Arnold points out, uid checks are out of place for capability code. Sep 01: Define task_setscheduler, task_setioprio, cap_task_kill, and task_setnice to make sure a user cannot affect a process in which they called a program with some fscaps. One remaining question is the note under task_setscheduler: are we ok with CAP_SYS_NICE being sufficient to confine a process to a cpuset? It is a semantic change, as without fsccaps, attach_task doesn't allow CAP_SYS_NICE to override the uid equivalence check. But since it uses security_task_setscheduler, which elsewhere is used where CAP_SYS_NICE can be used to override the uid equivalence check, fixing it might be tough. task_setscheduler note: this also controls cpuset:attach_task. Are we ok with CAP_SYS_NICE being used to confine to a cpuset? task_setioprio task_setnice sys_setpriority uses this (through set_one_prio) for another process. Need same checks as setrlimit Aug 21: Updated secureexec implementation to reflect the fact that euid and uid might be the same and nonzero, but the process might still have elevated caps. Aug 15: Handle endianness of xattrs. Enforce capability version match between kernel and disk. Enforce that no bits beyond the known max capability are set, else return -EPERM. With this extra processing, it may be worth reconsidering doing all the work at bprm_set_security rather than d_instantiate. Aug 10: Always call getxattr at bprm_set_security, rather than caching it at d_instantiate. [morgan@kernel.org: file-caps clean up for linux/capability.h] [bunk@kernel.org: unexport cap_inode_killpriv] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morgan <morgan@kernel.org> Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 06:31:36 +00:00
int security_inode_need_killpriv(struct dentry *dentry)
{
return call_int_hook(inode_need_killpriv, 0, dentry);
Implement file posix capabilities Implement file posix capabilities. This allows programs to be given a subset of root's powers regardless of who runs them, without having to use setuid and giving the binary all of root's powers. This version works with Kaigai Kohei's userspace tools, found at http://www.kaigai.gr.jp/index.php. For more information on how to use this patch, Chris Friedhoff has posted a nice page at http://www.friedhoff.org/fscaps.html. Changelog: Nov 27: Incorporate fixes from Andrew Morton (security-introduce-file-caps-tweaks and security-introduce-file-caps-warning-fix) Fix Kconfig dependency. Fix change signaling behavior when file caps are not compiled in. Nov 13: Integrate comments from Alexey: Remove CONFIG_ ifdef from capability.h, and use %zd for printing a size_t. Nov 13: Fix endianness warnings by sparse as suggested by Alexey Dobriyan. Nov 09: Address warnings of unused variables at cap_bprm_set_security when file capabilities are disabled, and simultaneously clean up the code a little, by pulling the new code into a helper function. Nov 08: For pointers to required userspace tools and how to use them, see http://www.friedhoff.org/fscaps.html. Nov 07: Fix the calculation of the highest bit checked in check_cap_sanity(). Nov 07: Allow file caps to be enabled without CONFIG_SECURITY, since capabilities are the default. Hook cap_task_setscheduler when !CONFIG_SECURITY. Move capable(TASK_KILL) to end of cap_task_kill to reduce audit messages. Nov 05: Add secondary calls in selinux/hooks.c to task_setioprio and task_setscheduler so that selinux and capabilities with file cap support can be stacked. Sep 05: As Seth Arnold points out, uid checks are out of place for capability code. Sep 01: Define task_setscheduler, task_setioprio, cap_task_kill, and task_setnice to make sure a user cannot affect a process in which they called a program with some fscaps. One remaining question is the note under task_setscheduler: are we ok with CAP_SYS_NICE being sufficient to confine a process to a cpuset? It is a semantic change, as without fsccaps, attach_task doesn't allow CAP_SYS_NICE to override the uid equivalence check. But since it uses security_task_setscheduler, which elsewhere is used where CAP_SYS_NICE can be used to override the uid equivalence check, fixing it might be tough. task_setscheduler note: this also controls cpuset:attach_task. Are we ok with CAP_SYS_NICE being used to confine to a cpuset? task_setioprio task_setnice sys_setpriority uses this (through set_one_prio) for another process. Need same checks as setrlimit Aug 21: Updated secureexec implementation to reflect the fact that euid and uid might be the same and nonzero, but the process might still have elevated caps. Aug 15: Handle endianness of xattrs. Enforce capability version match between kernel and disk. Enforce that no bits beyond the known max capability are set, else return -EPERM. With this extra processing, it may be worth reconsidering doing all the work at bprm_set_security rather than d_instantiate. Aug 10: Always call getxattr at bprm_set_security, rather than caching it at d_instantiate. [morgan@kernel.org: file-caps clean up for linux/capability.h] [bunk@kernel.org: unexport cap_inode_killpriv] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morgan <morgan@kernel.org> Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 06:31:36 +00:00
}
/**
* security_inode_killpriv() - The setuid bit is removed, update LSM state
* @idmap: idmap of the mount
* @dentry: associated dentry
*
* The @dentry's setuid bit is being removed. Remove similar security labels.
* Called with the dentry->d_inode->i_mutex held.
*
* Return: Return 0 on success. If error is returned, then the operation
* causing setuid bit removal is failed.
*/
int security_inode_killpriv(struct mnt_idmap *idmap,
commoncap: handle idmapped mounts When interacting with user namespace and non-user namespace aware filesystem capabilities the vfs will perform various security checks to determine whether or not the filesystem capabilities can be used by the caller, whether they need to be removed and so on. The main infrastructure for this resides in the capability codepaths but they are called through the LSM security infrastructure even though they are not technically an LSM or optional. This extends the existing security hooks security_inode_removexattr(), security_inode_killpriv(), security_inode_getsecurity() to pass down the mount's user namespace and makes them aware of idmapped mounts. In order to actually get filesystem capabilities from disk the capability infrastructure exposes the get_vfs_caps_from_disk() helper. For user namespace aware filesystem capabilities a root uid is stored alongside the capabilities. In order to determine whether the caller can make use of the filesystem capability or whether it needs to be ignored it is translated according to the superblock's user namespace. If it can be translated to uid 0 according to that id mapping the caller can use the filesystem capabilities stored on disk. If we are accessing the inode that holds the filesystem capabilities through an idmapped mount we map the root uid according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts: reading filesystem caps from disk enforces that the root uid associated with the filesystem capability must have a mapping in the superblock's user namespace and that the caller is either in the same user namespace or is a descendant of the superblock's user namespace. For filesystems that are mountable inside user namespace the caller can just mount the filesystem and won't usually need to idmap it. If they do want to idmap it they can create an idmapped mount and mark it with a user namespace they created and which is thus a descendant of s_user_ns. For filesystems that are not mountable inside user namespaces the descendant rule is trivially true because the s_user_ns will be the initial user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-21 13:19:29 +00:00
struct dentry *dentry)
Implement file posix capabilities Implement file posix capabilities. This allows programs to be given a subset of root's powers regardless of who runs them, without having to use setuid and giving the binary all of root's powers. This version works with Kaigai Kohei's userspace tools, found at http://www.kaigai.gr.jp/index.php. For more information on how to use this patch, Chris Friedhoff has posted a nice page at http://www.friedhoff.org/fscaps.html. Changelog: Nov 27: Incorporate fixes from Andrew Morton (security-introduce-file-caps-tweaks and security-introduce-file-caps-warning-fix) Fix Kconfig dependency. Fix change signaling behavior when file caps are not compiled in. Nov 13: Integrate comments from Alexey: Remove CONFIG_ ifdef from capability.h, and use %zd for printing a size_t. Nov 13: Fix endianness warnings by sparse as suggested by Alexey Dobriyan. Nov 09: Address warnings of unused variables at cap_bprm_set_security when file capabilities are disabled, and simultaneously clean up the code a little, by pulling the new code into a helper function. Nov 08: For pointers to required userspace tools and how to use them, see http://www.friedhoff.org/fscaps.html. Nov 07: Fix the calculation of the highest bit checked in check_cap_sanity(). Nov 07: Allow file caps to be enabled without CONFIG_SECURITY, since capabilities are the default. Hook cap_task_setscheduler when !CONFIG_SECURITY. Move capable(TASK_KILL) to end of cap_task_kill to reduce audit messages. Nov 05: Add secondary calls in selinux/hooks.c to task_setioprio and task_setscheduler so that selinux and capabilities with file cap support can be stacked. Sep 05: As Seth Arnold points out, uid checks are out of place for capability code. Sep 01: Define task_setscheduler, task_setioprio, cap_task_kill, and task_setnice to make sure a user cannot affect a process in which they called a program with some fscaps. One remaining question is the note under task_setscheduler: are we ok with CAP_SYS_NICE being sufficient to confine a process to a cpuset? It is a semantic change, as without fsccaps, attach_task doesn't allow CAP_SYS_NICE to override the uid equivalence check. But since it uses security_task_setscheduler, which elsewhere is used where CAP_SYS_NICE can be used to override the uid equivalence check, fixing it might be tough. task_setscheduler note: this also controls cpuset:attach_task. Are we ok with CAP_SYS_NICE being used to confine to a cpuset? task_setioprio task_setnice sys_setpriority uses this (through set_one_prio) for another process. Need same checks as setrlimit Aug 21: Updated secureexec implementation to reflect the fact that euid and uid might be the same and nonzero, but the process might still have elevated caps. Aug 15: Handle endianness of xattrs. Enforce capability version match between kernel and disk. Enforce that no bits beyond the known max capability are set, else return -EPERM. With this extra processing, it may be worth reconsidering doing all the work at bprm_set_security rather than d_instantiate. Aug 10: Always call getxattr at bprm_set_security, rather than caching it at d_instantiate. [morgan@kernel.org: file-caps clean up for linux/capability.h] [bunk@kernel.org: unexport cap_inode_killpriv] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morgan <morgan@kernel.org> Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 06:31:36 +00:00
{
return call_int_hook(inode_killpriv, 0, idmap, dentry);
Implement file posix capabilities Implement file posix capabilities. This allows programs to be given a subset of root's powers regardless of who runs them, without having to use setuid and giving the binary all of root's powers. This version works with Kaigai Kohei's userspace tools, found at http://www.kaigai.gr.jp/index.php. For more information on how to use this patch, Chris Friedhoff has posted a nice page at http://www.friedhoff.org/fscaps.html. Changelog: Nov 27: Incorporate fixes from Andrew Morton (security-introduce-file-caps-tweaks and security-introduce-file-caps-warning-fix) Fix Kconfig dependency. Fix change signaling behavior when file caps are not compiled in. Nov 13: Integrate comments from Alexey: Remove CONFIG_ ifdef from capability.h, and use %zd for printing a size_t. Nov 13: Fix endianness warnings by sparse as suggested by Alexey Dobriyan. Nov 09: Address warnings of unused variables at cap_bprm_set_security when file capabilities are disabled, and simultaneously clean up the code a little, by pulling the new code into a helper function. Nov 08: For pointers to required userspace tools and how to use them, see http://www.friedhoff.org/fscaps.html. Nov 07: Fix the calculation of the highest bit checked in check_cap_sanity(). Nov 07: Allow file caps to be enabled without CONFIG_SECURITY, since capabilities are the default. Hook cap_task_setscheduler when !CONFIG_SECURITY. Move capable(TASK_KILL) to end of cap_task_kill to reduce audit messages. Nov 05: Add secondary calls in selinux/hooks.c to task_setioprio and task_setscheduler so that selinux and capabilities with file cap support can be stacked. Sep 05: As Seth Arnold points out, uid checks are out of place for capability code. Sep 01: Define task_setscheduler, task_setioprio, cap_task_kill, and task_setnice to make sure a user cannot affect a process in which they called a program with some fscaps. One remaining question is the note under task_setscheduler: are we ok with CAP_SYS_NICE being sufficient to confine a process to a cpuset? It is a semantic change, as without fsccaps, attach_task doesn't allow CAP_SYS_NICE to override the uid equivalence check. But since it uses security_task_setscheduler, which elsewhere is used where CAP_SYS_NICE can be used to override the uid equivalence check, fixing it might be tough. task_setscheduler note: this also controls cpuset:attach_task. Are we ok with CAP_SYS_NICE being used to confine to a cpuset? task_setioprio task_setnice sys_setpriority uses this (through set_one_prio) for another process. Need same checks as setrlimit Aug 21: Updated secureexec implementation to reflect the fact that euid and uid might be the same and nonzero, but the process might still have elevated caps. Aug 15: Handle endianness of xattrs. Enforce capability version match between kernel and disk. Enforce that no bits beyond the known max capability are set, else return -EPERM. With this extra processing, it may be worth reconsidering doing all the work at bprm_set_security rather than d_instantiate. Aug 10: Always call getxattr at bprm_set_security, rather than caching it at d_instantiate. [morgan@kernel.org: file-caps clean up for linux/capability.h] [bunk@kernel.org: unexport cap_inode_killpriv] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morgan <morgan@kernel.org> Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 06:31:36 +00:00
}
/**
* security_inode_getsecurity() - Get the xattr security label of an inode
* @idmap: idmap of the mount
* @inode: inode
* @name: xattr name
* @buffer: security label buffer
* @alloc: allocation flag
*
* Retrieve a copy of the extended attribute representation of the security
* label associated with @name for @inode via @buffer. Note that @name is the
* remainder of the attribute name after the security prefix has been removed.
* @alloc is used to specify if the call should return a value via the buffer
* or just the value length.
*
* Return: Returns size of buffer on success.
*/
int security_inode_getsecurity(struct mnt_idmap *idmap,
commoncap: handle idmapped mounts When interacting with user namespace and non-user namespace aware filesystem capabilities the vfs will perform various security checks to determine whether or not the filesystem capabilities can be used by the caller, whether they need to be removed and so on. The main infrastructure for this resides in the capability codepaths but they are called through the LSM security infrastructure even though they are not technically an LSM or optional. This extends the existing security hooks security_inode_removexattr(), security_inode_killpriv(), security_inode_getsecurity() to pass down the mount's user namespace and makes them aware of idmapped mounts. In order to actually get filesystem capabilities from disk the capability infrastructure exposes the get_vfs_caps_from_disk() helper. For user namespace aware filesystem capabilities a root uid is stored alongside the capabilities. In order to determine whether the caller can make use of the filesystem capability or whether it needs to be ignored it is translated according to the superblock's user namespace. If it can be translated to uid 0 according to that id mapping the caller can use the filesystem capabilities stored on disk. If we are accessing the inode that holds the filesystem capabilities through an idmapped mount we map the root uid according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts: reading filesystem caps from disk enforces that the root uid associated with the filesystem capability must have a mapping in the superblock's user namespace and that the caller is either in the same user namespace or is a descendant of the superblock's user namespace. For filesystems that are mountable inside user namespace the caller can just mount the filesystem and won't usually need to idmap it. If they do want to idmap it they can create an idmapped mount and mark it with a user namespace they created and which is thus a descendant of s_user_ns. For filesystems that are not mountable inside user namespaces the descendant rule is trivially true because the s_user_ns will be the initial user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-21 13:19:29 +00:00
struct inode *inode, const char *name,
void **buffer, bool alloc)
{
struct security_hook_list *hp;
int rc;
if (unlikely(IS_PRIVATE(inode)))
return LSM_RET_DEFAULT(inode_getsecurity);
/*
* Only one module will provide an attribute with a given name.
*/
hlist_for_each_entry(hp, &security_hook_heads.inode_getsecurity, list) {
rc = hp->hook.inode_getsecurity(idmap, inode, name, buffer, alloc);
if (rc != LSM_RET_DEFAULT(inode_getsecurity))
return rc;
}
return LSM_RET_DEFAULT(inode_getsecurity);
}
/**
* security_inode_setsecurity() - Set the xattr security label of an inode
* @inode: inode
* @name: xattr name
* @value: security label
* @size: length of security label
* @flags: flags
*
* Set the security label associated with @name for @inode from the extended
* attribute value @value. @size indicates the size of the @value in bytes.
* @flags may be XATTR_CREATE, XATTR_REPLACE, or 0. Note that @name is the
* remainder of the attribute name after the security. prefix has been removed.
*
* Return: Returns 0 on success.
*/
int security_inode_setsecurity(struct inode *inode, const char *name, const void *value, size_t size, int flags)
{
struct security_hook_list *hp;
int rc;
if (unlikely(IS_PRIVATE(inode)))
return LSM_RET_DEFAULT(inode_setsecurity);
/*
* Only one module will provide an attribute with a given name.
*/
hlist_for_each_entry(hp, &security_hook_heads.inode_setsecurity, list) {
rc = hp->hook.inode_setsecurity(inode, name, value, size,
flags);
if (rc != LSM_RET_DEFAULT(inode_setsecurity))
return rc;
}
return LSM_RET_DEFAULT(inode_setsecurity);
}
/**
* security_inode_listsecurity() - List the xattr security label names
* @inode: inode
* @buffer: buffer
* @buffer_size: size of buffer
*
* Copy the extended attribute names for the security labels associated with
* @inode into @buffer. The maximum size of @buffer is specified by
* @buffer_size. @buffer may be NULL to request the size of the buffer
* required.
*
* Return: Returns number of bytes used/required on success.
*/
int security_inode_listsecurity(struct inode *inode, char *buffer, size_t buffer_size)
{
if (unlikely(IS_PRIVATE(inode)))
return 0;
return call_int_hook(inode_listsecurity, 0, inode, buffer, buffer_size);
}
EXPORT_SYMBOL(security_inode_listsecurity);
/**
* security_inode_getsecid() - Get an inode's secid
* @inode: inode
* @secid: secid to return
*
* Get the secid associated with the node. In case of failure, @secid will be
* set to zero.
*/
void security_inode_getsecid(struct inode *inode, u32 *secid)
{
call_void_hook(inode_getsecid, inode, secid);
}
/**
* security_inode_copy_up() - Create new creds for an overlayfs copy-up op
* @src: union dentry of copy-up file
* @new: newly created creds
*
* A file is about to be copied up from lower layer to upper layer of overlay
* filesystem. Security module can prepare a set of new creds and modify as
* need be and return new creds. Caller will switch to new creds temporarily to
* create new file and release newly allocated creds.
*
* Return: Returns 0 on success or a negative error code on error.
*/
int security_inode_copy_up(struct dentry *src, struct cred **new)
{
return call_int_hook(inode_copy_up, 0, src, new);
}
EXPORT_SYMBOL(security_inode_copy_up);
/**
* security_inode_copy_up_xattr() - Filter xattrs in an overlayfs copy-up op
* @name: xattr name
*
* Filter the xattrs being copied up when a unioned file is copied up from a
* lower layer to the union/overlay layer. The caller is responsible for
* reading and writing the xattrs, this hook is merely a filter.
*
* Return: Returns 0 to accept the xattr, 1 to discard the xattr, -EOPNOTSUPP
* if the security module does not know about attribute, or a negative
* error code to abort the copy up.
*/
int security_inode_copy_up_xattr(const char *name)
{
struct security_hook_list *hp;
int rc;
/*
* The implementation can return 0 (accept the xattr), 1 (discard the
* xattr), -EOPNOTSUPP if it does not know anything about the xattr or
* any other error code incase of an error.
*/
hlist_for_each_entry(hp,
&security_hook_heads.inode_copy_up_xattr, list) {
rc = hp->hook.inode_copy_up_xattr(name);
if (rc != LSM_RET_DEFAULT(inode_copy_up_xattr))
return rc;
}
return LSM_RET_DEFAULT(inode_copy_up_xattr);
}
EXPORT_SYMBOL(security_inode_copy_up_xattr);
/**
* security_kernfs_init_security() - Init LSM context for a kernfs node
* @kn_dir: parent kernfs node
* @kn: the kernfs node to initialize
*
* Initialize the security context of a newly created kernfs node based on its
* own and its parent's attributes.
*
* Return: Returns 0 if permission is granted.
*/
int security_kernfs_init_security(struct kernfs_node *kn_dir,
struct kernfs_node *kn)
{
return call_int_hook(kernfs_init_security, 0, kn_dir, kn);
}
/**
* security_file_permission() - Check file permissions
* @file: file
* @mask: requested permissions
*
* Check file permissions before accessing an open file. This hook is called
* by various operations that read or write files. A security module can use
* this hook to perform additional checking on these operations, e.g. to
* revalidate permissions on use to support privilege bracketing or policy
* changes. Notice that this hook is used when the actual read/write
* operations are performed, whereas the inode_security_ops hook is called when
* a file is opened (as well as many other operations). Although this hook can
* be used to revalidate permissions for various system call operations that
* read or write files, it does not address the revalidation of permissions for
* memory-mapped files. Security modules must handle this separately if they
* need such revalidation.
*
* Return: Returns 0 if permission is granted.
*/
int security_file_permission(struct file *file, int mask)
{
int ret;
ret = call_int_hook(file_permission, 0, file, mask);
if (ret)
return ret;
return fsnotify_perm(file, mask);
}
/**
* security_file_alloc() - Allocate and init a file's LSM blob
* @file: the file
*
* Allocate and attach a security structure to the file->f_security field. The
* security field is initialized to NULL when the structure is first created.
*
* Return: Return 0 if the hook is successful and permission is granted.
*/
int security_file_alloc(struct file *file)
{
int rc = lsm_file_alloc(file);
if (rc)
return rc;
rc = call_int_hook(file_alloc_security, 0, file);
if (unlikely(rc))
security_file_free(file);
return rc;
}
/**
* security_file_free() - Free a file's LSM blob
* @file: the file
*
* Deallocate and free any security structures stored in file->f_security.
*/
void security_file_free(struct file *file)
{
void *blob;
call_void_hook(file_free_security, file);
blob = file->f_security;
if (blob) {
file->f_security = NULL;
kmem_cache_free(lsm_file_cache, blob);
}
}
/**
* security_file_ioctl() - Check if an ioctl is allowed
* @file: associated file
* @cmd: ioctl cmd
* @arg: ioctl arguments
*
* Check permission for an ioctl operation on @file. Note that @arg sometimes
* represents a user space pointer; in other cases, it may be a simple integer
* value. When @arg represents a user space pointer, it should never be used
* by the security module.
*
* Return: Returns 0 if permission is granted.
*/
int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
return call_int_hook(file_ioctl, 0, file, cmd, arg);
}
EXPORT_SYMBOL_GPL(security_file_ioctl);
static inline unsigned long mmap_prot(struct file *file, unsigned long prot)
{
/*
* Does we have PROT_READ and does the application expect
* it to imply PROT_EXEC? If not, nothing to talk about...
*/
if ((prot & (PROT_READ | PROT_EXEC)) != PROT_READ)
return prot;
if (!(current->personality & READ_IMPLIES_EXEC))
return prot;
/*
* if that's an anonymous mapping, let it.
*/
if (!file)
return prot | PROT_EXEC;
/*
* ditto if it's not on noexec mount, except that on !MMU we need
* NOMMU_MAP_EXEC (== VM_MAYEXEC) in this case
*/
vfs: Commit to never having exectuables on proc and sysfs. Today proc and sysfs do not contain any executable files. Several applications today mount proc or sysfs without noexec and nosuid and then depend on there being no exectuables files on proc or sysfs. Having any executable files show on proc or sysfs would cause a user space visible regression, and most likely security problems. Therefore commit to never allowing executables on proc and sysfs by adding a new flag to mark them as filesystems without executables and enforce that flag. Test the flag where MNT_NOEXEC is tested today, so that the only user visible effect will be that exectuables will be treated as if the execute bit is cleared. The filesystems proc and sysfs do not currently incoporate any executable files so this does not result in any user visible effects. This makes it unnecessary to vet changes to proc and sysfs tightly for adding exectuable files or changes to chattr that would modify existing files, as no matter what the individual file say they will not be treated as exectuable files by the vfs. Not having to vet changes to closely is important as without this we are only one proc_create call (or another goof up in the implementation of notify_change) from having problematic executables on proc. Those mistakes are all too easy to make and would create a situation where there are security issues or the assumptions of some program having to be broken (and cause userspace regressions). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-06-29 19:42:03 +00:00
if (!path_noexec(&file->f_path)) {
#ifndef CONFIG_MMU
if (file->f_op->mmap_capabilities) {
unsigned caps = file->f_op->mmap_capabilities(file);
if (!(caps & NOMMU_MAP_EXEC))
return prot;
}
#endif
return prot | PROT_EXEC;
}
/* anything on noexec mount won't get PROT_EXEC */
return prot;
}
/**
* security_mmap_file() - Check if mmap'ing a file is allowed
* @file: file
* @prot: protection applied by the kernel
* @flags: flags
*
* Check permissions for a mmap operation. The @file may be NULL, e.g. if
* mapping anonymous memory.
*
* Return: Returns 0 if permission is granted.
*/
int security_mmap_file(struct file *file, unsigned long prot,
unsigned long flags)
{
ima: Align ima_file_mmap() parameters with mmap_file LSM hook Commit 98de59bfe4b2f ("take calculation of final prot in security_mmap_file() into a helper") moved the code to update prot, to be the actual protections applied to the kernel, to a new helper called mmap_prot(). However, while without the helper ima_file_mmap() was getting the updated prot, with the helper ima_file_mmap() gets the original prot, which contains the protections requested by the application. A possible consequence of this change is that, if an application calls mmap() with only PROT_READ, and the kernel applies PROT_EXEC in addition, that application would have access to executable memory without having this event recorded in the IMA measurement list. This situation would occur for example if the application, before mmap(), calls the personality() system call with READ_IMPLIES_EXEC as the first argument. Align ima_file_mmap() parameters with those of the mmap_file LSM hook, so that IMA can receive both the requested prot and the final prot. Since the requested protections are stored in a new variable, and the final protections are stored in the existing variable, this effectively restores the original behavior of the MMAP_CHECK hook. Cc: stable@vger.kernel.org Fixes: 98de59bfe4b2 ("take calculation of final prot in security_mmap_file() into a helper") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2023-01-31 17:42:43 +00:00
unsigned long prot_adj = mmap_prot(file, prot);
int ret;
ima: Align ima_file_mmap() parameters with mmap_file LSM hook Commit 98de59bfe4b2f ("take calculation of final prot in security_mmap_file() into a helper") moved the code to update prot, to be the actual protections applied to the kernel, to a new helper called mmap_prot(). However, while without the helper ima_file_mmap() was getting the updated prot, with the helper ima_file_mmap() gets the original prot, which contains the protections requested by the application. A possible consequence of this change is that, if an application calls mmap() with only PROT_READ, and the kernel applies PROT_EXEC in addition, that application would have access to executable memory without having this event recorded in the IMA measurement list. This situation would occur for example if the application, before mmap(), calls the personality() system call with READ_IMPLIES_EXEC as the first argument. Align ima_file_mmap() parameters with those of the mmap_file LSM hook, so that IMA can receive both the requested prot and the final prot. Since the requested protections are stored in a new variable, and the final protections are stored in the existing variable, this effectively restores the original behavior of the MMAP_CHECK hook. Cc: stable@vger.kernel.org Fixes: 98de59bfe4b2 ("take calculation of final prot in security_mmap_file() into a helper") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2023-01-31 17:42:43 +00:00
ret = call_int_hook(mmap_file, 0, file, prot, prot_adj, flags);
if (ret)
return ret;
ima: Align ima_file_mmap() parameters with mmap_file LSM hook Commit 98de59bfe4b2f ("take calculation of final prot in security_mmap_file() into a helper") moved the code to update prot, to be the actual protections applied to the kernel, to a new helper called mmap_prot(). However, while without the helper ima_file_mmap() was getting the updated prot, with the helper ima_file_mmap() gets the original prot, which contains the protections requested by the application. A possible consequence of this change is that, if an application calls mmap() with only PROT_READ, and the kernel applies PROT_EXEC in addition, that application would have access to executable memory without having this event recorded in the IMA measurement list. This situation would occur for example if the application, before mmap(), calls the personality() system call with READ_IMPLIES_EXEC as the first argument. Align ima_file_mmap() parameters with those of the mmap_file LSM hook, so that IMA can receive both the requested prot and the final prot. Since the requested protections are stored in a new variable, and the final protections are stored in the existing variable, this effectively restores the original behavior of the MMAP_CHECK hook. Cc: stable@vger.kernel.org Fixes: 98de59bfe4b2 ("take calculation of final prot in security_mmap_file() into a helper") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2023-01-31 17:42:43 +00:00
return ima_file_mmap(file, prot, prot_adj, flags);
}
/**
* security_mmap_addr() - Check if mmap'ing an address is allowed
* @addr: address
*
* Check permissions for a mmap operation at @addr.
*
* Return: Returns 0 if permission is granted.
*/
int security_mmap_addr(unsigned long addr)
{
return call_int_hook(mmap_addr, 0, addr);
}
/**
* security_file_mprotect() - Check if changing memory protections is allowed
* @vma: memory region
* @reqprot: application requested protection
* @prog: protection applied by the kernel
*
* Check permissions before changing memory access permissions.
*
* Return: Returns 0 if permission is granted.
*/
int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot,
unsigned long prot)
{
int ret;
ret = call_int_hook(file_mprotect, 0, vma, reqprot, prot);
if (ret)
return ret;
return ima_file_mprotect(vma, prot);
}
/**
* security_file_lock() - Check if a file lock is allowed
* @file: file
* @cmd: lock operation (e.g. F_RDLCK, F_WRLCK)
*
* Check permission before performing file locking operations. Note the hook
* mediates both flock and fcntl style locks.
*
* Return: Returns 0 if permission is granted.
*/
int security_file_lock(struct file *file, unsigned int cmd)
{
return call_int_hook(file_lock, 0, file, cmd);
}
/**
* security_file_fcntl() - Check if fcntl() op is allowed
* @file: file
* @cmd: fnctl command
* @arg: command argument
*
* Check permission before allowing the file operation specified by @cmd from
* being performed on the file @file. Note that @arg sometimes represents a
* user space pointer; in other cases, it may be a simple integer value. When
* @arg represents a user space pointer, it should never be used by the
* security module.
*
* Return: Returns 0 if permission is granted.
*/
int security_file_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
{
return call_int_hook(file_fcntl, 0, file, cmd, arg);
}
/**
* security_file_set_fowner() - Set the file owner info in the LSM blob
* @file: the file
*
* Save owner security information (typically from current->security) in
* file->f_security for later use by the send_sigiotask hook.
*
* Return: Returns 0 on success.
*/
void security_file_set_fowner(struct file *file)
{
call_void_hook(file_set_fowner, file);
}
/**
* security_file_send_sigiotask() - Check if sending SIGIO/SIGURG is allowed
* @tsk: target task
* @fown: signal sender
* @sig: signal to be sent, SIGIO is sent if 0
*
* Check permission for the file owner @fown to send SIGIO or SIGURG to the
* process @tsk. Note that this hook is sometimes called from interrupt. Note
* that the fown_struct, @fown, is never outside the context of a struct file,
* so the file structure (and associated security information) can always be
* obtained: container_of(fown, struct file, f_owner).
*
* Return: Returns 0 if permission is granted.
*/
int security_file_send_sigiotask(struct task_struct *tsk,
struct fown_struct *fown, int sig)
{
return call_int_hook(file_send_sigiotask, 0, tsk, fown, sig);
}
/**
* security_file_receive() - Check is receiving a file via IPC is allowed
* @file: file being received
*
* This hook allows security modules to control the ability of a process to
* receive an open file descriptor via socket IPC.
*
* Return: Returns 0 if permission is granted.
*/
int security_file_receive(struct file *file)
{
return call_int_hook(file_receive, 0, file);
}
/**
* security_file_open() - Save open() time state for late use by the LSM
* @file:
*
* Save open-time permission checking state for later use upon file_permission,
* and recheck access if anything has changed since inode_permission.
*
* Return: Returns 0 if permission is granted.
*/
int security_file_open(struct file *file)
{
int ret;
ret = call_int_hook(file_open, 0, file);
if (ret)
return ret;
return fsnotify_perm(file, MAY_OPEN);
}
/**
* security_file_truncate() - Check if truncating a file is allowed
* @file: file
*
* Check permission before truncating a file, i.e. using ftruncate. Note that
* truncation permission may also be checked based on the path, using the
* @path_truncate hook.
*
* Return: Returns 0 if permission is granted.
*/
int security_file_truncate(struct file *file)
{
return call_int_hook(file_truncate, 0, file);
}
/**
* security_task_alloc() - Allocate a task's LSM blob
* @task: the task
* @clone_flags: flags indicating what is being shared
*
* Handle allocation of task-related resources.
*
* Return: Returns a zero on success, negative values on failure.
*/
LSM: Revive security_task_alloc() hook and per "struct task_struct" security blob. We switched from "struct task_struct"->security to "struct cred"->security in Linux 2.6.29. But not all LSM modules were happy with that change. TOMOYO LSM module is an example which want to use per "struct task_struct" security blob, for TOMOYO's security context is defined based on "struct task_struct" rather than "struct cred". AppArmor LSM module is another example which want to use it, for AppArmor is currently abusing the cred a little bit to store the change_hat and setexeccon info. Although security_task_free() hook was revived in Linux 3.4 because Yama LSM module wanted to release per "struct task_struct" security blob, security_task_alloc() hook and "struct task_struct"->security field were not revived. Nowadays, we are getting proposals of lightweight LSM modules which want to use per "struct task_struct" security blob. We are already allowing multiple concurrent LSM modules (up to one fully armored module which uses "struct cred"->security field or exclusive hooks like security_xfrm_state_pol_flow_match(), plus unlimited number of lightweight modules which do not use "struct cred"->security nor exclusive hooks) as long as they are built into the kernel. But this patch does not implement variable length "struct task_struct"->security field which will become needed when multiple LSM modules want to use "struct task_struct"-> security field. Although it won't be difficult to implement variable length "struct task_struct"->security field, let's think about it after we merged this patch. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: John Johansen <john.johansen@canonical.com> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Tested-by: Djalal Harouni <tixxdz@gmail.com> Acked-by: José Bollo <jobol@nonadev.net> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@parisplace.org> Cc: Kees Cook <keescook@chromium.org> Cc: James Morris <james.l.morris@oracle.com> Cc: José Bollo <jobol@nonadev.net> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-24 11:46:33 +00:00
int security_task_alloc(struct task_struct *task, unsigned long clone_flags)
{
int rc = lsm_task_alloc(task);
if (rc)
return rc;
rc = call_int_hook(task_alloc, 0, task, clone_flags);
if (unlikely(rc))
security_task_free(task);
return rc;
LSM: Revive security_task_alloc() hook and per "struct task_struct" security blob. We switched from "struct task_struct"->security to "struct cred"->security in Linux 2.6.29. But not all LSM modules were happy with that change. TOMOYO LSM module is an example which want to use per "struct task_struct" security blob, for TOMOYO's security context is defined based on "struct task_struct" rather than "struct cred". AppArmor LSM module is another example which want to use it, for AppArmor is currently abusing the cred a little bit to store the change_hat and setexeccon info. Although security_task_free() hook was revived in Linux 3.4 because Yama LSM module wanted to release per "struct task_struct" security blob, security_task_alloc() hook and "struct task_struct"->security field were not revived. Nowadays, we are getting proposals of lightweight LSM modules which want to use per "struct task_struct" security blob. We are already allowing multiple concurrent LSM modules (up to one fully armored module which uses "struct cred"->security field or exclusive hooks like security_xfrm_state_pol_flow_match(), plus unlimited number of lightweight modules which do not use "struct cred"->security nor exclusive hooks) as long as they are built into the kernel. But this patch does not implement variable length "struct task_struct"->security field which will become needed when multiple LSM modules want to use "struct task_struct"-> security field. Although it won't be difficult to implement variable length "struct task_struct"->security field, let's think about it after we merged this patch. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: John Johansen <john.johansen@canonical.com> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Tested-by: Djalal Harouni <tixxdz@gmail.com> Acked-by: José Bollo <jobol@nonadev.net> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@parisplace.org> Cc: Kees Cook <keescook@chromium.org> Cc: James Morris <james.l.morris@oracle.com> Cc: José Bollo <jobol@nonadev.net> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-24 11:46:33 +00:00
}
/**
* security_task_free() - Free a task's LSM blob and related resources
* @task: task
*
* Handle release of task-related resources. Note that this can be called from
* interrupt context.
*/
void security_task_free(struct task_struct *task)
{
call_void_hook(task_free, task);
kfree(task->security);
task->security = NULL;
}
/**
* security_cred_alloc_blank() - Allocate the min memory to allow cred_transfer
* @cred: credentials
* @gfp: gfp flags
*
* Only allocate sufficient memory and attach to @cred such that
* cred_transfer() will not get ENOMEM.
*
* Return: Returns 0 on success, negative values on failure.
*/
KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] Add a keyctl to install a process's session keyring onto its parent. This replaces the parent's session keyring. Because the COW credential code does not permit one process to change another process's credentials directly, the change is deferred until userspace next starts executing again. Normally this will be after a wait*() syscall. To support this, three new security hooks have been provided: cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in the blank security creds and key_session_to_parent() - which asks the LSM if the process may replace its parent's session keyring. The replacement may only happen if the process has the same ownership details as its parent, and the process has LINK permission on the session keyring, and the session keyring is owned by the process, and the LSM permits it. Note that this requires alteration to each architecture's notify_resume path. This has been done for all arches barring blackfin, m68k* and xtensa, all of which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the replacement to be performed at the point the parent process resumes userspace execution. This allows the userspace AFS pioctl emulation to fully emulate newpag() and the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to alter the parent process's PAG membership. However, since kAFS doesn't use PAGs per se, but rather dumps the keys into the session keyring, the session keyring of the parent must be replaced if, for example, VIOCSETTOK is passed the newpag flag. This can be tested with the following program: #include <stdio.h> #include <stdlib.h> #include <keyutils.h> #define KEYCTL_SESSION_TO_PARENT 18 #define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0) int main(int argc, char **argv) { key_serial_t keyring, key; long ret; keyring = keyctl_join_session_keyring(argv[1]); OSERROR(keyring, "keyctl_join_session_keyring"); key = add_key("user", "a", "b", 1, keyring); OSERROR(key, "add_key"); ret = keyctl(KEYCTL_SESSION_TO_PARENT); OSERROR(ret, "KEYCTL_SESSION_TO_PARENT"); return 0; } Compiled and linked with -lkeyutils, you should see something like: [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 355907932 --alswrv 4043 -1 \_ keyring: _uid.4043 [dhowells@andromeda ~]$ /tmp/newpag [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 1055658746 --alswrv 4043 4043 \_ user: a [dhowells@andromeda ~]$ /tmp/newpag hello [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: hello 340417692 --alswrv 4043 4043 \_ user: a Where the test program creates a new session keyring, sticks a user key named 'a' into it and then installs it on its parent. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02 08:14:21 +00:00
int security_cred_alloc_blank(struct cred *cred, gfp_t gfp)
{
int rc = lsm_cred_alloc(cred, gfp);
if (rc)
return rc;
rc = call_int_hook(cred_alloc_blank, 0, cred, gfp);
if (unlikely(rc))
security_cred_free(cred);
return rc;
KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] Add a keyctl to install a process's session keyring onto its parent. This replaces the parent's session keyring. Because the COW credential code does not permit one process to change another process's credentials directly, the change is deferred until userspace next starts executing again. Normally this will be after a wait*() syscall. To support this, three new security hooks have been provided: cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in the blank security creds and key_session_to_parent() - which asks the LSM if the process may replace its parent's session keyring. The replacement may only happen if the process has the same ownership details as its parent, and the process has LINK permission on the session keyring, and the session keyring is owned by the process, and the LSM permits it. Note that this requires alteration to each architecture's notify_resume path. This has been done for all arches barring blackfin, m68k* and xtensa, all of which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the replacement to be performed at the point the parent process resumes userspace execution. This allows the userspace AFS pioctl emulation to fully emulate newpag() and the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to alter the parent process's PAG membership. However, since kAFS doesn't use PAGs per se, but rather dumps the keys into the session keyring, the session keyring of the parent must be replaced if, for example, VIOCSETTOK is passed the newpag flag. This can be tested with the following program: #include <stdio.h> #include <stdlib.h> #include <keyutils.h> #define KEYCTL_SESSION_TO_PARENT 18 #define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0) int main(int argc, char **argv) { key_serial_t keyring, key; long ret; keyring = keyctl_join_session_keyring(argv[1]); OSERROR(keyring, "keyctl_join_session_keyring"); key = add_key("user", "a", "b", 1, keyring); OSERROR(key, "add_key"); ret = keyctl(KEYCTL_SESSION_TO_PARENT); OSERROR(ret, "KEYCTL_SESSION_TO_PARENT"); return 0; } Compiled and linked with -lkeyutils, you should see something like: [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 355907932 --alswrv 4043 -1 \_ keyring: _uid.4043 [dhowells@andromeda ~]$ /tmp/newpag [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 1055658746 --alswrv 4043 4043 \_ user: a [dhowells@andromeda ~]$ /tmp/newpag hello [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: hello 340417692 --alswrv 4043 4043 \_ user: a Where the test program creates a new session keyring, sticks a user key named 'a' into it and then installs it on its parent. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02 08:14:21 +00:00
}
/**
* security_cred_free() - Free the cred's LSM blob and associated resources
* @cred: credentials
*
* Deallocate and clear the cred->security field in a set of credentials.
*/
CRED: Inaugurate COW credentials Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:23 +00:00
void security_cred_free(struct cred *cred)
{
/*
* There is a failure case in prepare_creds() that
* may result in a call here with ->security being NULL.
*/
if (unlikely(cred->security == NULL))
return;
call_void_hook(cred_free, cred);
kfree(cred->security);
cred->security = NULL;
}
/**
* security_prepare_creds() - Prepare a new set of credentials
* @new: new credentials
* @old: original credentials
* @gfp: gfp flags
*
* Prepare a new set of credentials by copying the data from the old set.
*
* Return: Returns 0 on success, negative values on failure.
*/
CRED: Inaugurate COW credentials Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:23 +00:00
int security_prepare_creds(struct cred *new, const struct cred *old, gfp_t gfp)
{
int rc = lsm_cred_alloc(new, gfp);
if (rc)
return rc;
rc = call_int_hook(cred_prepare, 0, new, old, gfp);
if (unlikely(rc))
security_cred_free(new);
return rc;
CRED: Inaugurate COW credentials Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:23 +00:00
}
/**
* security_transfer_creds() - Transfer creds
* @new: target credentials
* @old: original credentials
*
* Transfer data from original creds to new creds.
*/
KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] Add a keyctl to install a process's session keyring onto its parent. This replaces the parent's session keyring. Because the COW credential code does not permit one process to change another process's credentials directly, the change is deferred until userspace next starts executing again. Normally this will be after a wait*() syscall. To support this, three new security hooks have been provided: cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in the blank security creds and key_session_to_parent() - which asks the LSM if the process may replace its parent's session keyring. The replacement may only happen if the process has the same ownership details as its parent, and the process has LINK permission on the session keyring, and the session keyring is owned by the process, and the LSM permits it. Note that this requires alteration to each architecture's notify_resume path. This has been done for all arches barring blackfin, m68k* and xtensa, all of which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the replacement to be performed at the point the parent process resumes userspace execution. This allows the userspace AFS pioctl emulation to fully emulate newpag() and the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to alter the parent process's PAG membership. However, since kAFS doesn't use PAGs per se, but rather dumps the keys into the session keyring, the session keyring of the parent must be replaced if, for example, VIOCSETTOK is passed the newpag flag. This can be tested with the following program: #include <stdio.h> #include <stdlib.h> #include <keyutils.h> #define KEYCTL_SESSION_TO_PARENT 18 #define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0) int main(int argc, char **argv) { key_serial_t keyring, key; long ret; keyring = keyctl_join_session_keyring(argv[1]); OSERROR(keyring, "keyctl_join_session_keyring"); key = add_key("user", "a", "b", 1, keyring); OSERROR(key, "add_key"); ret = keyctl(KEYCTL_SESSION_TO_PARENT); OSERROR(ret, "KEYCTL_SESSION_TO_PARENT"); return 0; } Compiled and linked with -lkeyutils, you should see something like: [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 355907932 --alswrv 4043 -1 \_ keyring: _uid.4043 [dhowells@andromeda ~]$ /tmp/newpag [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 1055658746 --alswrv 4043 4043 \_ user: a [dhowells@andromeda ~]$ /tmp/newpag hello [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: hello 340417692 --alswrv 4043 4043 \_ user: a Where the test program creates a new session keyring, sticks a user key named 'a' into it and then installs it on its parent. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02 08:14:21 +00:00
void security_transfer_creds(struct cred *new, const struct cred *old)
{
call_void_hook(cred_transfer, new, old);
KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] Add a keyctl to install a process's session keyring onto its parent. This replaces the parent's session keyring. Because the COW credential code does not permit one process to change another process's credentials directly, the change is deferred until userspace next starts executing again. Normally this will be after a wait*() syscall. To support this, three new security hooks have been provided: cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in the blank security creds and key_session_to_parent() - which asks the LSM if the process may replace its parent's session keyring. The replacement may only happen if the process has the same ownership details as its parent, and the process has LINK permission on the session keyring, and the session keyring is owned by the process, and the LSM permits it. Note that this requires alteration to each architecture's notify_resume path. This has been done for all arches barring blackfin, m68k* and xtensa, all of which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the replacement to be performed at the point the parent process resumes userspace execution. This allows the userspace AFS pioctl emulation to fully emulate newpag() and the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to alter the parent process's PAG membership. However, since kAFS doesn't use PAGs per se, but rather dumps the keys into the session keyring, the session keyring of the parent must be replaced if, for example, VIOCSETTOK is passed the newpag flag. This can be tested with the following program: #include <stdio.h> #include <stdlib.h> #include <keyutils.h> #define KEYCTL_SESSION_TO_PARENT 18 #define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0) int main(int argc, char **argv) { key_serial_t keyring, key; long ret; keyring = keyctl_join_session_keyring(argv[1]); OSERROR(keyring, "keyctl_join_session_keyring"); key = add_key("user", "a", "b", 1, keyring); OSERROR(key, "add_key"); ret = keyctl(KEYCTL_SESSION_TO_PARENT); OSERROR(ret, "KEYCTL_SESSION_TO_PARENT"); return 0; } Compiled and linked with -lkeyutils, you should see something like: [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 355907932 --alswrv 4043 -1 \_ keyring: _uid.4043 [dhowells@andromeda ~]$ /tmp/newpag [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 1055658746 --alswrv 4043 4043 \_ user: a [dhowells@andromeda ~]$ /tmp/newpag hello [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: hello 340417692 --alswrv 4043 4043 \_ user: a Where the test program creates a new session keyring, sticks a user key named 'a' into it and then installs it on its parent. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02 08:14:21 +00:00
}
/**
* security_cred_getsecid() - Get the secid from a set of credentials
* @c: credentials
* @secid: secid value
*
* Retrieve the security identifier of the cred structure @c. In case of
* failure, @secid will be set to zero.
*/
void security_cred_getsecid(const struct cred *c, u32 *secid)
{
*secid = 0;
call_void_hook(cred_getsecid, c, secid);
}
EXPORT_SYMBOL(security_cred_getsecid);
/**
* security_kernel_act_as() - Set the kernel credentials to act as secid
* @new: credentials
* @secid: secid
*
* Set the credentials for a kernel service to act as (subjective context).
* The current task must be the one that nominated @secid.
*
* Return: Returns 0 if successful.
*/
CRED: Allow kernel services to override LSM settings for task actions Allow kernel services to override LSM settings appropriate to the actions performed by a task by duplicating a set of credentials, modifying it and then using task_struct::cred to point to it when performing operations on behalf of a task. This is used, for example, by CacheFiles which has to transparently access the cache on behalf of a process that thinks it is doing, say, NFS accesses with a potentially inappropriate (with respect to accessing the cache) set of credentials. This patch provides two LSM hooks for modifying a task security record: (*) security_kernel_act_as() which allows modification of the security datum with which a task acts on other objects (most notably files). (*) security_kernel_create_files_as() which allows modification of the security datum that is used to initialise the security data on a file that a task creates. The patch also provides four new credentials handling functions, which wrap the LSM functions: (1) prepare_kernel_cred() Prepare a set of credentials for a kernel service to use, based either on a daemon's credentials or on init_cred. All the keyrings are cleared. (2) set_security_override() Set the LSM security ID in a set of credentials to a specific security context, assuming permission from the LSM policy. (3) set_security_override_from_ctx() As (2), but takes the security context as a string. (4) set_create_files_as() Set the file creation LSM security ID in a set of credentials to be the same as that on a particular inode. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [Smack changes] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:28 +00:00
int security_kernel_act_as(struct cred *new, u32 secid)
{
return call_int_hook(kernel_act_as, 0, new, secid);
CRED: Allow kernel services to override LSM settings for task actions Allow kernel services to override LSM settings appropriate to the actions performed by a task by duplicating a set of credentials, modifying it and then using task_struct::cred to point to it when performing operations on behalf of a task. This is used, for example, by CacheFiles which has to transparently access the cache on behalf of a process that thinks it is doing, say, NFS accesses with a potentially inappropriate (with respect to accessing the cache) set of credentials. This patch provides two LSM hooks for modifying a task security record: (*) security_kernel_act_as() which allows modification of the security datum with which a task acts on other objects (most notably files). (*) security_kernel_create_files_as() which allows modification of the security datum that is used to initialise the security data on a file that a task creates. The patch also provides four new credentials handling functions, which wrap the LSM functions: (1) prepare_kernel_cred() Prepare a set of credentials for a kernel service to use, based either on a daemon's credentials or on init_cred. All the keyrings are cleared. (2) set_security_override() Set the LSM security ID in a set of credentials to a specific security context, assuming permission from the LSM policy. (3) set_security_override_from_ctx() As (2), but takes the security context as a string. (4) set_create_files_as() Set the file creation LSM security ID in a set of credentials to be the same as that on a particular inode. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [Smack changes] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:28 +00:00
}
/**
* security_kernel_create_files_as() - Set file creation context using an inode
* @new: target credentials
* @inode: reference inode
*
* Set the file creation context in a set of credentials to be the same as the
* objective context of the specified inode. The current task must be the one
* that nominated @inode.
*
* Return: Returns 0 if successful.
*/
CRED: Allow kernel services to override LSM settings for task actions Allow kernel services to override LSM settings appropriate to the actions performed by a task by duplicating a set of credentials, modifying it and then using task_struct::cred to point to it when performing operations on behalf of a task. This is used, for example, by CacheFiles which has to transparently access the cache on behalf of a process that thinks it is doing, say, NFS accesses with a potentially inappropriate (with respect to accessing the cache) set of credentials. This patch provides two LSM hooks for modifying a task security record: (*) security_kernel_act_as() which allows modification of the security datum with which a task acts on other objects (most notably files). (*) security_kernel_create_files_as() which allows modification of the security datum that is used to initialise the security data on a file that a task creates. The patch also provides four new credentials handling functions, which wrap the LSM functions: (1) prepare_kernel_cred() Prepare a set of credentials for a kernel service to use, based either on a daemon's credentials or on init_cred. All the keyrings are cleared. (2) set_security_override() Set the LSM security ID in a set of credentials to a specific security context, assuming permission from the LSM policy. (3) set_security_override_from_ctx() As (2), but takes the security context as a string. (4) set_create_files_as() Set the file creation LSM security ID in a set of credentials to be the same as that on a particular inode. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [Smack changes] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:28 +00:00
int security_kernel_create_files_as(struct cred *new, struct inode *inode)
{
return call_int_hook(kernel_create_files_as, 0, new, inode);
CRED: Allow kernel services to override LSM settings for task actions Allow kernel services to override LSM settings appropriate to the actions performed by a task by duplicating a set of credentials, modifying it and then using task_struct::cred to point to it when performing operations on behalf of a task. This is used, for example, by CacheFiles which has to transparently access the cache on behalf of a process that thinks it is doing, say, NFS accesses with a potentially inappropriate (with respect to accessing the cache) set of credentials. This patch provides two LSM hooks for modifying a task security record: (*) security_kernel_act_as() which allows modification of the security datum with which a task acts on other objects (most notably files). (*) security_kernel_create_files_as() which allows modification of the security datum that is used to initialise the security data on a file that a task creates. The patch also provides four new credentials handling functions, which wrap the LSM functions: (1) prepare_kernel_cred() Prepare a set of credentials for a kernel service to use, based either on a daemon's credentials or on init_cred. All the keyrings are cleared. (2) set_security_override() Set the LSM security ID in a set of credentials to a specific security context, assuming permission from the LSM policy. (3) set_security_override_from_ctx() As (2), but takes the security context as a string. (4) set_create_files_as() Set the file creation LSM security ID in a set of credentials to be the same as that on a particular inode. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [Smack changes] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:28 +00:00
}
/**
* security_kernel_module_request() - Check is loading a module is allowed
* @kmod_name: module name
*
* Ability to trigger the kernel to automatically upcall to userspace for
* userspace to load a kernel module with the given name.
*
* Return: Returns 0 if successful.
*/
int security_kernel_module_request(char *kmod_name)
{
integrity: prevent deadlock during digsig verification. This patch aimed to prevent deadlock during digsig verification.The point of issue - user space utility modprobe and/or it's dependencies (ld-*.so, libz.so.*, libc-*.so and /lib/modules/ files) that could be used for kernel modules load during digsig verification and could be signed by digsig in the same time. First at all, look at crypto_alloc_tfm() work algorithm: crypto_alloc_tfm() will first attempt to locate an already loaded algorithm. If that fails and the kernel supports dynamically loadable modules, it will then attempt to load a module of the same name or alias. If that fails it will send a query to any loaded crypto manager to construct an algorithm on the fly. We have situation, when public_key_verify_signature() in case of RSA algorithm use alg_name to store internal information in order to construct an algorithm on the fly, but crypto_larval_lookup() will try to use alg_name in order to load kernel module with same name. 1) we can't do anything with crypto module work, since it designed to work exactly in this way; 2) we can't globally filter module requests for modprobe, since it designed to work with any requests. In this patch, I propose add an exception for "crypto-pkcs1pad(rsa,*)" module requests only in case of enabled integrity asymmetric keys support. Since we don't have any real "crypto-pkcs1pad(rsa,*)" kernel modules for sure, we are safe to fail such module request from crypto_larval_lookup(). In this way we prevent modprobe execution during digsig verification and avoid possible deadlock if modprobe and/or it's dependencies also signed with digsig. Requested "crypto-pkcs1pad(rsa,*)" kernel module name formed by: 1) "pkcs1pad(rsa,%s)" in public_key_verify_signature(); 2) "crypto-%s" / "crypto-%s-all" in crypto_larval_lookup(). "crypto-pkcs1pad(rsa," part of request is a constant and unique and could be used as filter. Signed-off-by: Mikhail Kurinnoi <viewizard@viewizard.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> include/linux/integrity.h | 13 +++++++++++++ security/integrity/digsig_asymmetric.c | 23 +++++++++++++++++++++++ security/security.c | 7 ++++++- 3 files changed, 42 insertions(+), 1 deletion(-)
2018-06-27 13:33:42 +00:00
int ret;
ret = call_int_hook(kernel_module_request, 0, kmod_name);
if (ret)
return ret;
return integrity_kernel_module_request(kmod_name);
}
/**
* security_kernel_read_file() - Read a file specified by userspace
* @file: file
* @id: file identifier
* @contents: trust if security_kernel_post_read_file() will be called
*
* Read a file specified by userspace.
*
* Return: Returns 0 if permission is granted.
*/
int security_kernel_read_file(struct file *file, enum kernel_read_file_id id,
bool contents)
{
int ret;
ret = call_int_hook(kernel_read_file, 0, file, id, contents);
if (ret)
return ret;
return ima_read_file(file, id, contents);
}
EXPORT_SYMBOL_GPL(security_kernel_read_file);
/**
* security_kernel_post_read_file() - Read a file specified by userspace
* @file: file
* @buf: file contents
* @size: size of file contents
* @id: file identifier
*
* Read a file specified by userspace. This must be paired with a prior call
* to security_kernel_read_file() call that indicated this hook would also be
* called, see security_kernel_read_file() for more information.
*
* Return: Returns 0 if permission is granted.
*/
int security_kernel_post_read_file(struct file *file, char *buf, loff_t size,
enum kernel_read_file_id id)
{
int ret;
ret = call_int_hook(kernel_post_read_file, 0, file, buf, size, id);
if (ret)
return ret;
return ima_post_read_file(file, buf, size, id);
}
EXPORT_SYMBOL_GPL(security_kernel_post_read_file);
/**
* security_kernel_load_data() - Load data provided by userspace
* @id: data identifier
* @contents: true if security_kernel_post_load_data() will be called
*
* Load data provided by userspace.
*
* Return: Returns 0 if permission is granted.
*/
LSM: Introduce kernel_post_load_data() hook There are a few places in the kernel where LSMs would like to have visibility into the contents of a kernel buffer that has been loaded or read. While security_kernel_post_read_file() (which includes the buffer) exists as a pairing for security_kernel_read_file(), no such hook exists to pair with security_kernel_load_data(). Earlier proposals for just using security_kernel_post_read_file() with a NULL file argument were rejected (i.e. "file" should always be valid for the security_..._file hooks, but it appears at least one case was left in the kernel during earlier refactoring. (This will be fixed in a subsequent patch.) Since not all cases of security_kernel_load_data() can have a single contiguous buffer made available to the LSM hook (e.g. kexec image segments are separately loaded), there needs to be a way for the LSM to reason about its expectations of the hook coverage. In order to handle this, add a "contents" argument to the "kernel_load_data" hook that indicates if the newly added "kernel_post_load_data" hook will be called with the full contents once loaded. That way, LSMs requiring full contents can choose to unilaterally reject "kernel_load_data" with contents=false (which is effectively the existing hook coverage), but when contents=true they can allow it and later evaluate the "kernel_post_load_data" hook once the buffer is loaded. With this change, LSMs can gain coverage over non-file-backed data loads (e.g. init_module(2) and firmware userspace helper), which will happen in subsequent patches. Additionally prepare IMA to start processing these cases. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02 17:38:20 +00:00
int security_kernel_load_data(enum kernel_load_data_id id, bool contents)
{
int ret;
LSM: Introduce kernel_post_load_data() hook There are a few places in the kernel where LSMs would like to have visibility into the contents of a kernel buffer that has been loaded or read. While security_kernel_post_read_file() (which includes the buffer) exists as a pairing for security_kernel_read_file(), no such hook exists to pair with security_kernel_load_data(). Earlier proposals for just using security_kernel_post_read_file() with a NULL file argument were rejected (i.e. "file" should always be valid for the security_..._file hooks, but it appears at least one case was left in the kernel during earlier refactoring. (This will be fixed in a subsequent patch.) Since not all cases of security_kernel_load_data() can have a single contiguous buffer made available to the LSM hook (e.g. kexec image segments are separately loaded), there needs to be a way for the LSM to reason about its expectations of the hook coverage. In order to handle this, add a "contents" argument to the "kernel_load_data" hook that indicates if the newly added "kernel_post_load_data" hook will be called with the full contents once loaded. That way, LSMs requiring full contents can choose to unilaterally reject "kernel_load_data" with contents=false (which is effectively the existing hook coverage), but when contents=true they can allow it and later evaluate the "kernel_post_load_data" hook once the buffer is loaded. With this change, LSMs can gain coverage over non-file-backed data loads (e.g. init_module(2) and firmware userspace helper), which will happen in subsequent patches. Additionally prepare IMA to start processing these cases. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02 17:38:20 +00:00
ret = call_int_hook(kernel_load_data, 0, id, contents);
if (ret)
return ret;
LSM: Introduce kernel_post_load_data() hook There are a few places in the kernel where LSMs would like to have visibility into the contents of a kernel buffer that has been loaded or read. While security_kernel_post_read_file() (which includes the buffer) exists as a pairing for security_kernel_read_file(), no such hook exists to pair with security_kernel_load_data(). Earlier proposals for just using security_kernel_post_read_file() with a NULL file argument were rejected (i.e. "file" should always be valid for the security_..._file hooks, but it appears at least one case was left in the kernel during earlier refactoring. (This will be fixed in a subsequent patch.) Since not all cases of security_kernel_load_data() can have a single contiguous buffer made available to the LSM hook (e.g. kexec image segments are separately loaded), there needs to be a way for the LSM to reason about its expectations of the hook coverage. In order to handle this, add a "contents" argument to the "kernel_load_data" hook that indicates if the newly added "kernel_post_load_data" hook will be called with the full contents once loaded. That way, LSMs requiring full contents can choose to unilaterally reject "kernel_load_data" with contents=false (which is effectively the existing hook coverage), but when contents=true they can allow it and later evaluate the "kernel_post_load_data" hook once the buffer is loaded. With this change, LSMs can gain coverage over non-file-backed data loads (e.g. init_module(2) and firmware userspace helper), which will happen in subsequent patches. Additionally prepare IMA to start processing these cases. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02 17:38:20 +00:00
return ima_load_data(id, contents);
}
EXPORT_SYMBOL_GPL(security_kernel_load_data);
/**
* security_kernel_post_load_data() - Load userspace data from a non-file source
* @buf: data
* @size: size of data
* @id: data identifier
* @description: text description of data, specific to the id value
*
* Load data provided by a non-file source (usually userspace buffer). This
* must be paired with a prior security_kernel_load_data() call that indicated
* this hook would also be called, see security_kernel_load_data() for more
* information.
*
* Return: Returns 0 if permission is granted.
*/
LSM: Introduce kernel_post_load_data() hook There are a few places in the kernel where LSMs would like to have visibility into the contents of a kernel buffer that has been loaded or read. While security_kernel_post_read_file() (which includes the buffer) exists as a pairing for security_kernel_read_file(), no such hook exists to pair with security_kernel_load_data(). Earlier proposals for just using security_kernel_post_read_file() with a NULL file argument were rejected (i.e. "file" should always be valid for the security_..._file hooks, but it appears at least one case was left in the kernel during earlier refactoring. (This will be fixed in a subsequent patch.) Since not all cases of security_kernel_load_data() can have a single contiguous buffer made available to the LSM hook (e.g. kexec image segments are separately loaded), there needs to be a way for the LSM to reason about its expectations of the hook coverage. In order to handle this, add a "contents" argument to the "kernel_load_data" hook that indicates if the newly added "kernel_post_load_data" hook will be called with the full contents once loaded. That way, LSMs requiring full contents can choose to unilaterally reject "kernel_load_data" with contents=false (which is effectively the existing hook coverage), but when contents=true they can allow it and later evaluate the "kernel_post_load_data" hook once the buffer is loaded. With this change, LSMs can gain coverage over non-file-backed data loads (e.g. init_module(2) and firmware userspace helper), which will happen in subsequent patches. Additionally prepare IMA to start processing these cases. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02 17:38:20 +00:00
int security_kernel_post_load_data(char *buf, loff_t size,
enum kernel_load_data_id id,
char *description)
{
int ret;
ret = call_int_hook(kernel_post_load_data, 0, buf, size, id,
description);
if (ret)
return ret;
return ima_post_load_data(buf, size, id, description);
}
EXPORT_SYMBOL_GPL(security_kernel_post_load_data);
/**
* security_task_fix_setuid() - Update LSM with new user id attributes
* @new: updated credentials
* @old: credentials being replaced
* @flags: LSM_SETID_* flag values
*
* Update the module's state after setting one or more of the user identity
* attributes of the current process. The @flags parameter indicates which of
* the set*uid system calls invoked this hook. If @new is the set of
* credentials that will be installed. Modifications should be made to this
* rather than to @current->cred.
*
* Return: Returns 0 on success.
*/
CRED: Inaugurate COW credentials Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:23 +00:00
int security_task_fix_setuid(struct cred *new, const struct cred *old,
int flags)
{
return call_int_hook(task_fix_setuid, 0, new, old, flags);
}
/**
* security_task_fix_setgid() - Update LSM with new group id attributes
* @new: updated credentials
* @old: credentials being replaced
* @flags: LSM_SETID_* flag value
*
* Update the module's state after setting one or more of the group identity
* attributes of the current process. The @flags parameter indicates which of
* the set*gid system calls invoked this hook. @new is the set of credentials
* that will be installed. Modifications should be made to this rather than to
* @current->cred.
*
* Return: Returns 0 on success.
*/
int security_task_fix_setgid(struct cred *new, const struct cred *old,
int flags)
{
return call_int_hook(task_fix_setgid, 0, new, old, flags);
}
/**
* security_task_fix_setgroups() - Update LSM with new supplementary groups
* @new: updated credentials
* @old: credentials being replaced
*
* Update the module's state after setting the supplementary group identity
* attributes of the current process. @new is the set of credentials that will
* be installed. Modifications should be made to this rather than to
* @current->cred.
*
* Return: Returns 0 on success.
*/
int security_task_fix_setgroups(struct cred *new, const struct cred *old)
{
return call_int_hook(task_fix_setgroups, 0, new, old);
}
/**
* security_task_setpgid() - Check if setting the pgid is allowed
* @p: task being modified
* @pgid: new pgid
*
* Check permission before setting the process group identifier of the process
* @p to @pgid.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_setpgid(struct task_struct *p, pid_t pgid)
{
return call_int_hook(task_setpgid, 0, p, pgid);
}
/**
* security_task_getpgid() - Check if getting the pgid is allowed
* @p: task
*
* Check permission before getting the process group identifier of the process
* @p.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_getpgid(struct task_struct *p)
{
return call_int_hook(task_getpgid, 0, p);
}
/**
* security_task_getsid() - Check if getting the session id is allowed
* @p: task
*
* Check permission before getting the session identifier of the process @p.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_getsid(struct task_struct *p)
{
return call_int_hook(task_getsid, 0, p);
}
/**
* security_current_getsecid_subj() - Get the current task's subjective secid
* @secid: secid value
*
* Retrieve the subjective security identifier of the current task and return
* it in @secid. In case of failure, @secid will be set to zero.
*/
void security_current_getsecid_subj(u32 *secid)
{
*secid = 0;
call_void_hook(current_getsecid_subj, secid);
}
EXPORT_SYMBOL(security_current_getsecid_subj);
/**
* security_task_getsecid_obj() - Get a task's objective secid
* @p: target task
* @secid: secid value
*
* Retrieve the objective security identifier of the task_struct in @p and
* return it in @secid. In case of failure, @secid will be set to zero.
*/
void security_task_getsecid_obj(struct task_struct *p, u32 *secid)
{
*secid = 0;
call_void_hook(task_getsecid_obj, p, secid);
}
EXPORT_SYMBOL(security_task_getsecid_obj);
/**
* security_task_setnice() - Check if setting a task's nice value is allowed
* @p: target task
* @nice: nice value
*
* Check permission before setting the nice value of @p to @nice.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_setnice(struct task_struct *p, int nice)
{
return call_int_hook(task_setnice, 0, p, nice);
}
/**
* security_task_setioprio() - Check if setting a task's ioprio is allowed
* @p: target task
* @ioprio: ioprio value
*
* Check permission before setting the ioprio value of @p to @ioprio.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_setioprio(struct task_struct *p, int ioprio)
{
return call_int_hook(task_setioprio, 0, p, ioprio);
}
/**
* security_task_getioprio() - Check if getting a task's ioprio is allowed
* @p: task
*
* Check permission before getting the ioprio value of @p.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_getioprio(struct task_struct *p)
{
return call_int_hook(task_getioprio, 0, p);
}
/**
* security_task_prlimit() - Check if get/setting resources limits is allowed
* @cred: current task credentials
* @tcred: target task credentials
* @flags: LSM_PRLIMIT_* flag bits indicating a get/set/both
*
* Check permission before getting and/or setting the resource limits of
* another task.
*
* Return: Returns 0 if permission is granted.
*/
prlimit,security,selinux: add a security hook for prlimit When SELinux was first added to the kernel, a process could only get and set its own resource limits via getrlimit(2) and setrlimit(2), so no MAC checks were required for those operations, and thus no security hooks were defined for them. Later, SELinux introduced a hook for setlimit(2) with a check if the hard limit was being changed in order to be able to rely on the hard limit value as a safe reset point upon context transitions. Later on, when prlimit(2) was added to the kernel with the ability to get or set resource limits (hard or soft) of another process, LSM/SELinux was not updated other than to pass the target process to the setrlimit hook. This resulted in incomplete control over both getting and setting the resource limits of another process. Add a new security_task_prlimit() hook to the check_prlimit_permission() function to provide complete mediation. The hook is only called when acting on another task, and only if the existing DAC/capability checks would allow access. Pass flags down to the hook to indicate whether the prlimit(2) call will read, write, or both read and write the resource limits of the target process. The existing security_task_setrlimit() hook is left alone; it continues to serve a purpose in supporting the ability to make decisions based on the old and/or new resource limit values when setting limits. This is consistent with the DAC/capability logic, where check_prlimit_permission() performs generic DAC/capability checks for acting on another task, while do_prlimit() performs a capability check based on a comparison of the old and new resource limits. Fix the inline documentation for the hook to match the code. Implement the new hook for SELinux. For setting resource limits, we reuse the existing setrlimit permission. Note that this does overload the setrlimit permission to mean the ability to set the resource limit (soft or hard) of another process or the ability to change one's own hard limit. For getting resource limits, a new getrlimit permission is defined. This was not originally defined since getrlimit(2) could only be used to obtain a process' own limits. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-02-17 12:57:00 +00:00
int security_task_prlimit(const struct cred *cred, const struct cred *tcred,
unsigned int flags)
{
return call_int_hook(task_prlimit, 0, cred, tcred, flags);
}
/**
* security_task_setrlimit() - Check if setting a new rlimit value is allowed
* @p: target task's group leader
* @resource: resource whose limit is being set
* @new_rlim: new resource limit
*
* Check permission before setting the resource limits of process @p for
* @resource to @new_rlim. The old resource limit values can be examined by
* dereferencing (p->signal->rlim + resource).
*
* Return: Returns 0 if permission is granted.
*/
int security_task_setrlimit(struct task_struct *p, unsigned int resource,
struct rlimit *new_rlim)
{
return call_int_hook(task_setrlimit, 0, p, resource, new_rlim);
}
/**
* security_task_setscheduler() - Check if setting sched policy/param is allowed
* @p: target task
*
* Check permission before setting scheduling policy and/or parameters of
* process @p.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_setscheduler(struct task_struct *p)
{
return call_int_hook(task_setscheduler, 0, p);
}
/**
* security_task_getscheduler() - Check if getting scheduling info is allowed
* @p: target task
*
* Check permission before obtaining scheduling information for process @p.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_getscheduler(struct task_struct *p)
{
return call_int_hook(task_getscheduler, 0, p);
}
/**
* security_task_movememory() - Check if moving memory is allowed
* @p: task
*
* Check permission before moving memory owned by process @p.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_movememory(struct task_struct *p)
{
return call_int_hook(task_movememory, 0, p);
}
/**
* security_task_kill() - Check if sending a signal is allowed
* @p: target process
* @info: signal information
* @sig: signal value
* @cred: credentials of the signal sender, NULL if @current
*
* Check permission before sending signal @sig to @p. @info can be NULL, the
* constant 1, or a pointer to a kernel_siginfo structure. If @info is 1 or
* SI_FROMKERNEL(info) is true, then the signal should be viewed as coming from
* the kernel and should typically be permitted. SIGIO signals are handled
* separately by the send_sigiotask hook in file_security_ops.
*
* Return: Returns 0 if permission is granted.
*/
int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
int sig, const struct cred *cred)
{
return call_int_hook(task_kill, 0, p, info, sig, cred);
}
/**
* security_task_prctl() - Check if a prctl op is allowed
* @option: operation
* @arg2: argument
* @arg3: argument
* @arg4: argument
* @arg5: argument
*
* Check permission before performing a process control operation on the
* current process.
*
* Return: Return -ENOSYS if no-one wanted to handle this op, any other value
* to cause prctl() to return immediately with that value.
*/
int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
CRED: Inaugurate COW credentials Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:23 +00:00
unsigned long arg4, unsigned long arg5)
{
int thisrc;
int rc = LSM_RET_DEFAULT(task_prctl);
struct security_hook_list *hp;
hlist_for_each_entry(hp, &security_hook_heads.task_prctl, list) {
thisrc = hp->hook.task_prctl(option, arg2, arg3, arg4, arg5);
if (thisrc != LSM_RET_DEFAULT(task_prctl)) {
rc = thisrc;
if (thisrc != 0)
break;
}
}
return rc;
}
/**
* security_task_to_inode() - Set the security attributes of a task's inode
* @p: task
* @inode: inode
*
* Set the security attributes for an inode based on an associated task's
* security attributes, e.g. for /proc/pid inodes.
*/
void security_task_to_inode(struct task_struct *p, struct inode *inode)
{
call_void_hook(task_to_inode, p, inode);
}
/**
* security_create_user_ns() - Check if creating a new userns is allowed
* @cred: prepared creds
*
* Check permission prior to creating a new user namespace.
*
* Return: Returns 0 if successful, otherwise < 0 error code.
*/
security, lsm: Introduce security_create_user_ns() User namespaces are an effective tool to allow programs to run with permission without requiring the need for a program to run as root. User namespaces may also be used as a sandboxing technique. However, attackers sometimes leverage user namespaces as an initial attack vector to perform some exploit. [1,2,3] While it is not the unprivileged user namespace functionality, which causes the kernel to be exploitable, users/administrators might want to more granularly limit or at least monitor how various processes use this functionality, while vulnerable kernel subsystems are being patched. Preventing user namespace already creation comes in a few of forms in order of granularity: 1. /proc/sys/user/max_user_namespaces sysctl 2. Distro specific patch(es) 3. CONFIG_USER_NS To block a task based on its attributes, the LSM hook cred_prepare is a decent candidate for use because it provides more granular control, and it is called before create_user_ns(): cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred) Since security_prepare_creds() is meant for LSMs to copy and prepare credentials, access control is an unintended use of the hook. [4] Further, security_prepare_creds() will always return a ENOMEM if the hook returns any non-zero error code. This hook also does not handle the clone3 case which requires us to access a user space pointer to know if we're in the CLONE_NEW_USER call path which may be subject to a TOCTTOU attack. Lastly, cred_prepare is called in many call paths, and a targeted hook further limits the frequency of calls which is a beneficial outcome. Therefore introduce a new function security_create_user_ns() with an accompanying userns_create LSM hook. With the new userns_create hook, users will have more control over the observability and access control over user namespace creation. Users should expect that normal operation of user namespaces will behave as usual, and only be impacted when controls are implemented by users or administrators. This hook takes the prepared creds for LSM authors to write policy against. On success, the new namespace is applied to credentials, otherwise an error is returned. Links: 1. https://nvd.nist.gov/vuln/detail/CVE-2022-0492 2. https://nvd.nist.gov/vuln/detail/CVE-2022-25636 3. https://nvd.nist.gov/vuln/detail/CVE-2022-34918 4. https://lore.kernel.org/all/1c4b1c0d-12f6-6e9e-a6a3-cdce7418110c@schaufler-ca.com/ Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Frederick Lawler <fred@cloudflare.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-15 16:20:25 +00:00
int security_create_user_ns(const struct cred *cred)
{
return call_int_hook(userns_create, 0, cred);
}
int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag)
{
return call_int_hook(ipc_permission, 0, ipcp, flag);
}
void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid)
{
*secid = 0;
call_void_hook(ipc_getsecid, ipcp, secid);
}
int security_msg_msg_alloc(struct msg_msg *msg)
{
int rc = lsm_msg_msg_alloc(msg);
if (unlikely(rc))
return rc;
rc = call_int_hook(msg_msg_alloc_security, 0, msg);
if (unlikely(rc))
security_msg_msg_free(msg);
return rc;
}
void security_msg_msg_free(struct msg_msg *msg)
{
call_void_hook(msg_msg_free_security, msg);
kfree(msg->security);
msg->security = NULL;
}
int security_msg_queue_alloc(struct kern_ipc_perm *msq)
{
int rc = lsm_ipc_alloc(msq);
if (unlikely(rc))
return rc;
rc = call_int_hook(msg_queue_alloc_security, 0, msq);
if (unlikely(rc))
security_msg_queue_free(msq);
return rc;
}
void security_msg_queue_free(struct kern_ipc_perm *msq)
{
call_void_hook(msg_queue_free_security, msq);
kfree(msq->security);
msq->security = NULL;
}
int security_msg_queue_associate(struct kern_ipc_perm *msq, int msqflg)
{
return call_int_hook(msg_queue_associate, 0, msq, msqflg);
}
int security_msg_queue_msgctl(struct kern_ipc_perm *msq, int cmd)
{
return call_int_hook(msg_queue_msgctl, 0, msq, cmd);
}
int security_msg_queue_msgsnd(struct kern_ipc_perm *msq,
struct msg_msg *msg, int msqflg)
{
return call_int_hook(msg_queue_msgsnd, 0, msq, msg, msqflg);
}
int security_msg_queue_msgrcv(struct kern_ipc_perm *msq, struct msg_msg *msg,
struct task_struct *target, long type, int mode)
{
return call_int_hook(msg_queue_msgrcv, 0, msq, msg, target, type, mode);
}
int security_shm_alloc(struct kern_ipc_perm *shp)
{
int rc = lsm_ipc_alloc(shp);
if (unlikely(rc))
return rc;
rc = call_int_hook(shm_alloc_security, 0, shp);
if (unlikely(rc))
security_shm_free(shp);
return rc;
}
void security_shm_free(struct kern_ipc_perm *shp)
{
call_void_hook(shm_free_security, shp);
kfree(shp->security);
shp->security = NULL;
}
int security_shm_associate(struct kern_ipc_perm *shp, int shmflg)
{
return call_int_hook(shm_associate, 0, shp, shmflg);
}
int security_shm_shmctl(struct kern_ipc_perm *shp, int cmd)
{
return call_int_hook(shm_shmctl, 0, shp, cmd);
}
int security_shm_shmat(struct kern_ipc_perm *shp, char __user *shmaddr, int shmflg)
{
return call_int_hook(shm_shmat, 0, shp, shmaddr, shmflg);
}
int security_sem_alloc(struct kern_ipc_perm *sma)
{
int rc = lsm_ipc_alloc(sma);
if (unlikely(rc))
return rc;
rc = call_int_hook(sem_alloc_security, 0, sma);
if (unlikely(rc))
security_sem_free(sma);
return rc;
}
void security_sem_free(struct kern_ipc_perm *sma)
{
call_void_hook(sem_free_security, sma);
kfree(sma->security);
sma->security = NULL;
}
int security_sem_associate(struct kern_ipc_perm *sma, int semflg)
{
return call_int_hook(sem_associate, 0, sma, semflg);
}
int security_sem_semctl(struct kern_ipc_perm *sma, int cmd)
{
return call_int_hook(sem_semctl, 0, sma, cmd);
}
int security_sem_semop(struct kern_ipc_perm *sma, struct sembuf *sops,
unsigned nsops, int alter)
{
return call_int_hook(sem_semop, 0, sma, sops, nsops, alter);
}
/**
* security_d_instantiate() - Populate an inode's LSM state based on a dentry
* @dentry: dentry
* @inode: inode
*
* Fill in @inode security information for a @dentry if allowed.
*/
void security_d_instantiate(struct dentry *dentry, struct inode *inode)
{
if (unlikely(inode && IS_PRIVATE(inode)))
return;
call_void_hook(d_instantiate, dentry, inode);
}
EXPORT_SYMBOL(security_d_instantiate);
/**
* security_getprocattr() - Read an attribute for a task
* @p: the task
* @lsm: LSM name
* @name: attribute name
* @value: attribute value
*
* Read attribute @name for task @p and store it into @value if allowed.
*
* Return: Returns the length of @value on success, a negative value otherwise.
*/
int security_getprocattr(struct task_struct *p, const char *lsm,
const char *name, char **value)
{
struct security_hook_list *hp;
hlist_for_each_entry(hp, &security_hook_heads.getprocattr, list) {
if (lsm != NULL && strcmp(lsm, hp->lsm))
continue;
return hp->hook.getprocattr(p, name, value);
}
return LSM_RET_DEFAULT(getprocattr);
}
/**
* security_setprocattr() - Set an attribute for a task
* @lsm: LSM name
* @name: attribute name
* @value: attribute value
* @size: attribute value size
*
* Write (set) the current task's attribute @name to @value, size @size if
* allowed.
*
* Return: Returns bytes written on success, a negative value otherwise.
*/
int security_setprocattr(const char *lsm, const char *name, void *value,
size_t size)
{
struct security_hook_list *hp;
hlist_for_each_entry(hp, &security_hook_heads.setprocattr, list) {
if (lsm != NULL && strcmp(lsm, hp->lsm))
continue;
return hp->hook.setprocattr(name, value, size);
}
return LSM_RET_DEFAULT(setprocattr);
}
/**
* security_netlink_send() - Save info and check if netlink sending is allowed
* @sk: sending socket
* @skb: netlink message
*
* Save security information for a netlink message so that permission checking
* can be performed when the message is processed. The security information
* can be saved using the eff_cap field of the netlink_skb_parms structure.
* Also may be used to provide fine grained control over message transmission.
*
* Return: Returns 0 if the information was successfully saved and message is
* allowed to be transmitted.
*/
int security_netlink_send(struct sock *sk, struct sk_buff *skb)
{
return call_int_hook(netlink_send, 0, sk, skb);
}
int security_ismaclabel(const char *name)
{
return call_int_hook(ismaclabel, 0, name);
}
EXPORT_SYMBOL(security_ismaclabel);
int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
{
struct security_hook_list *hp;
int rc;
/*
* Currently, only one LSM can implement secid_to_secctx (i.e this
* LSM hook is not "stackable").
*/
hlist_for_each_entry(hp, &security_hook_heads.secid_to_secctx, list) {
rc = hp->hook.secid_to_secctx(secid, secdata, seclen);
if (rc != LSM_RET_DEFAULT(secid_to_secctx))
return rc;
}
return LSM_RET_DEFAULT(secid_to_secctx);
}
EXPORT_SYMBOL(security_secid_to_secctx);
int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid)
{
*secid = 0;
return call_int_hook(secctx_to_secid, 0, secdata, seclen, secid);
}
EXPORT_SYMBOL(security_secctx_to_secid);
void security_release_secctx(char *secdata, u32 seclen)
{
call_void_hook(release_secctx, secdata, seclen);
}
EXPORT_SYMBOL(security_release_secctx);
void security_inode_invalidate_secctx(struct inode *inode)
{
call_void_hook(inode_invalidate_secctx, inode);
}
EXPORT_SYMBOL(security_inode_invalidate_secctx);
LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information. This patch introduces three new hooks. The inode_getsecctx hook is used to get all relevant information from an LSM about an inode. The inode_setsecctx is used to set both the in-core and on-disk state for the inode based on a context derived from inode_getsecctx.The final hook inode_notifysecctx will notify the LSM of a change for the in-core state of the inode in question. These hooks are for use in the labeled NFS code and addresses concerns of how to set security on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's explanation of the reason for these hooks is pasted below. Quote Stephen Smalley inode_setsecctx: Change the security context of an inode. Updates the in core security context managed by the security module and invokes the fs code as needed (via __vfs_setxattr_noperm) to update any backing xattrs that represent the context. Example usage: NFS server invokes this hook to change the security context in its incore inode and on the backing file system to a value provided by the client on a SETATTR operation. inode_notifysecctx: Notify the security module of what the security context of an inode should be. Initializes the incore security context managed by the security module for this inode. Example usage: NFS client invokes this hook to initialize the security context in its incore inode to the value provided by the server for the file when the server returned the file's attributes to the client. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-03 18:25:57 +00:00
int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen)
{
return call_int_hook(inode_notifysecctx, 0, inode, ctx, ctxlen);
LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information. This patch introduces three new hooks. The inode_getsecctx hook is used to get all relevant information from an LSM about an inode. The inode_setsecctx is used to set both the in-core and on-disk state for the inode based on a context derived from inode_getsecctx.The final hook inode_notifysecctx will notify the LSM of a change for the in-core state of the inode in question. These hooks are for use in the labeled NFS code and addresses concerns of how to set security on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's explanation of the reason for these hooks is pasted below. Quote Stephen Smalley inode_setsecctx: Change the security context of an inode. Updates the in core security context managed by the security module and invokes the fs code as needed (via __vfs_setxattr_noperm) to update any backing xattrs that represent the context. Example usage: NFS server invokes this hook to change the security context in its incore inode and on the backing file system to a value provided by the client on a SETATTR operation. inode_notifysecctx: Notify the security module of what the security context of an inode should be. Initializes the incore security context managed by the security module for this inode. Example usage: NFS client invokes this hook to initialize the security context in its incore inode to the value provided by the server for the file when the server returned the file's attributes to the client. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-03 18:25:57 +00:00
}
EXPORT_SYMBOL(security_inode_notifysecctx);
int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
{
return call_int_hook(inode_setsecctx, 0, dentry, ctx, ctxlen);
LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information. This patch introduces three new hooks. The inode_getsecctx hook is used to get all relevant information from an LSM about an inode. The inode_setsecctx is used to set both the in-core and on-disk state for the inode based on a context derived from inode_getsecctx.The final hook inode_notifysecctx will notify the LSM of a change for the in-core state of the inode in question. These hooks are for use in the labeled NFS code and addresses concerns of how to set security on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's explanation of the reason for these hooks is pasted below. Quote Stephen Smalley inode_setsecctx: Change the security context of an inode. Updates the in core security context managed by the security module and invokes the fs code as needed (via __vfs_setxattr_noperm) to update any backing xattrs that represent the context. Example usage: NFS server invokes this hook to change the security context in its incore inode and on the backing file system to a value provided by the client on a SETATTR operation. inode_notifysecctx: Notify the security module of what the security context of an inode should be. Initializes the incore security context managed by the security module for this inode. Example usage: NFS client invokes this hook to initialize the security context in its incore inode to the value provided by the server for the file when the server returned the file's attributes to the client. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-03 18:25:57 +00:00
}
EXPORT_SYMBOL(security_inode_setsecctx);
int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
{
return call_int_hook(inode_getsecctx, -EOPNOTSUPP, inode, ctx, ctxlen);
LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information. This patch introduces three new hooks. The inode_getsecctx hook is used to get all relevant information from an LSM about an inode. The inode_setsecctx is used to set both the in-core and on-disk state for the inode based on a context derived from inode_getsecctx.The final hook inode_notifysecctx will notify the LSM of a change for the in-core state of the inode in question. These hooks are for use in the labeled NFS code and addresses concerns of how to set security on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's explanation of the reason for these hooks is pasted below. Quote Stephen Smalley inode_setsecctx: Change the security context of an inode. Updates the in core security context managed by the security module and invokes the fs code as needed (via __vfs_setxattr_noperm) to update any backing xattrs that represent the context. Example usage: NFS server invokes this hook to change the security context in its incore inode and on the backing file system to a value provided by the client on a SETATTR operation. inode_notifysecctx: Notify the security module of what the security context of an inode should be. Initializes the incore security context managed by the security module for this inode. Example usage: NFS client invokes this hook to initialize the security context in its incore inode to the value provided by the server for the file when the server returned the file's attributes to the client. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-03 18:25:57 +00:00
}
EXPORT_SYMBOL(security_inode_getsecctx);
#ifdef CONFIG_WATCH_QUEUE
int security_post_notification(const struct cred *w_cred,
const struct cred *cred,
struct watch_notification *n)
{
return call_int_hook(post_notification, 0, w_cred, cred, n);
}
#endif /* CONFIG_WATCH_QUEUE */
#ifdef CONFIG_KEY_NOTIFICATIONS
int security_watch_key(struct key *key)
{
return call_int_hook(watch_key, 0, key);
}
#endif
#ifdef CONFIG_SECURITY_NETWORK
/**
* security_unix_stream_connect() - Check if a AF_UNIX stream is allowed
* @sock: originating sock
* @other: peer sock
* @newsk: new sock
*
* Check permissions before establishing a Unix domain stream connection
* between @sock and @other.
*
* The @unix_stream_connect and @unix_may_send hooks were necessary because
* Linux provides an alternative to the conventional file name space for Unix
* domain sockets. Whereas binding and connecting to sockets in the file name
* space is mediated by the typical file permissions (and caught by the mknod
* and permission hooks in inode_security_ops), binding and connecting to
* sockets in the abstract name space is completely unmediated. Sufficient
* control of Unix domain sockets in the abstract name space isn't possible
* using only the socket layer hooks, since we need to know the actual target
* socket, which is not looked up until we are inside the af_unix code.
*
* Return: Returns 0 if permission is granted.
*/
int security_unix_stream_connect(struct sock *sock, struct sock *other, struct sock *newsk)
{
return call_int_hook(unix_stream_connect, 0, sock, other, newsk);
}
EXPORT_SYMBOL(security_unix_stream_connect);
/**
* security_unix_may_send() - Check if AF_UNIX socket can send datagrams
* @sock: originating sock
* @other: peer sock
*
* Check permissions before connecting or sending datagrams from @sock to
* @other.
*
* The @unix_stream_connect and @unix_may_send hooks were necessary because
* Linux provides an alternative to the conventional file name space for Unix
* domain sockets. Whereas binding and connecting to sockets in the file name
* space is mediated by the typical file permissions (and caught by the mknod
* and permission hooks in inode_security_ops), binding and connecting to
* sockets in the abstract name space is completely unmediated. Sufficient
* control of Unix domain sockets in the abstract name space isn't possible
* using only the socket layer hooks, since we need to know the actual target
* socket, which is not looked up until we are inside the af_unix code.
*
* Return: Returns 0 if permission is granted.
*/
int security_unix_may_send(struct socket *sock, struct socket *other)
{
return call_int_hook(unix_may_send, 0, sock, other);
}
EXPORT_SYMBOL(security_unix_may_send);
int security_socket_create(int family, int type, int protocol, int kern)
{
return call_int_hook(socket_create, 0, family, type, protocol, kern);
}
int security_socket_post_create(struct socket *sock, int family,
int type, int protocol, int kern)
{
return call_int_hook(socket_post_create, 0, sock, family, type,
protocol, kern);
}
int security_socket_socketpair(struct socket *socka, struct socket *sockb)
{
return call_int_hook(socket_socketpair, 0, socka, sockb);
}
EXPORT_SYMBOL(security_socket_socketpair);
int security_socket_bind(struct socket *sock, struct sockaddr *address, int addrlen)
{
return call_int_hook(socket_bind, 0, sock, address, addrlen);
}
int security_socket_connect(struct socket *sock, struct sockaddr *address, int addrlen)
{
return call_int_hook(socket_connect, 0, sock, address, addrlen);
}
int security_socket_listen(struct socket *sock, int backlog)
{
return call_int_hook(socket_listen, 0, sock, backlog);
}
int security_socket_accept(struct socket *sock, struct socket *newsock)
{
return call_int_hook(socket_accept, 0, sock, newsock);
}
int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size)
{
return call_int_hook(socket_sendmsg, 0, sock, msg, size);
}
int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
int size, int flags)
{
return call_int_hook(socket_recvmsg, 0, sock, msg, size, flags);
}
int security_socket_getsockname(struct socket *sock)
{
return call_int_hook(socket_getsockname, 0, sock);
}
int security_socket_getpeername(struct socket *sock)
{
return call_int_hook(socket_getpeername, 0, sock);
}
int security_socket_getsockopt(struct socket *sock, int level, int optname)
{
return call_int_hook(socket_getsockopt, 0, sock, level, optname);
}
int security_socket_setsockopt(struct socket *sock, int level, int optname)
{
return call_int_hook(socket_setsockopt, 0, sock, level, optname);
}
int security_socket_shutdown(struct socket *sock, int how)
{
return call_int_hook(socket_shutdown, 0, sock, how);
}
int security_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
return call_int_hook(socket_sock_rcv_skb, 0, sk, skb);
}
EXPORT_SYMBOL(security_sock_rcv_skb);
int security_socket_getpeersec_stream(struct socket *sock, sockptr_t optval,
sockptr_t optlen, unsigned int len)
{
return call_int_hook(socket_getpeersec_stream, -ENOPROTOOPT, sock,
optval, optlen, len);
}
int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u32 *secid)
{
return call_int_hook(socket_getpeersec_dgram, -ENOPROTOOPT, sock,
skb, secid);
}
EXPORT_SYMBOL(security_socket_getpeersec_dgram);
int security_sk_alloc(struct sock *sk, int family, gfp_t priority)
{
return call_int_hook(sk_alloc_security, 0, sk, family, priority);
}
void security_sk_free(struct sock *sk)
{
call_void_hook(sk_free_security, sk);
}
void security_sk_clone(const struct sock *sk, struct sock *newsk)
{
call_void_hook(sk_clone_security, sk, newsk);
}
EXPORT_SYMBOL(security_sk_clone);
void security_sk_classify_flow(struct sock *sk, struct flowi_common *flic)
{
call_void_hook(sk_getsecid, sk, &flic->flowic_secid);
}
EXPORT_SYMBOL(security_sk_classify_flow);
void security_req_classify_flow(const struct request_sock *req,
struct flowi_common *flic)
{
call_void_hook(req_classify_flow, req, flic);
}
EXPORT_SYMBOL(security_req_classify_flow);
void security_sock_graft(struct sock *sk, struct socket *parent)
{
call_void_hook(sock_graft, sk, parent);
}
EXPORT_SYMBOL(security_sock_graft);
int security_inet_conn_request(const struct sock *sk,
struct sk_buff *skb, struct request_sock *req)
{
return call_int_hook(inet_conn_request, 0, sk, skb, req);
}
EXPORT_SYMBOL(security_inet_conn_request);
void security_inet_csk_clone(struct sock *newsk,
const struct request_sock *req)
{
call_void_hook(inet_csk_clone, newsk, req);
}
void security_inet_conn_established(struct sock *sk,
struct sk_buff *skb)
{
call_void_hook(inet_conn_established, sk, skb);
}
EXPORT_SYMBOL(security_inet_conn_established);
int security_secmark_relabel_packet(u32 secid)
{
return call_int_hook(secmark_relabel_packet, 0, secid);
}
EXPORT_SYMBOL(security_secmark_relabel_packet);
void security_secmark_refcount_inc(void)
{
call_void_hook(secmark_refcount_inc);
}
EXPORT_SYMBOL(security_secmark_refcount_inc);
void security_secmark_refcount_dec(void)
{
call_void_hook(secmark_refcount_dec);
}
EXPORT_SYMBOL(security_secmark_refcount_dec);
tun: fix LSM/SELinux labeling of tun/tap devices This patch corrects some problems with LSM/SELinux that were introduced with the multiqueue patchset. The problem stems from the fact that the multiqueue work changed the relationship between the tun device and its associated socket; before the socket persisted for the life of the device, however after the multiqueue changes the socket only persisted for the life of the userspace connection (fd open). For non-persistent devices this is not an issue, but for persistent devices this can cause the tun device to lose its SELinux label. We correct this problem by adding an opaque LSM security blob to the tun device struct which allows us to have the LSM security state, e.g. SELinux labeling information, persist for the lifetime of the tun device. In the process we tweak the LSM hooks to work with this new approach to TUN device/socket labeling and introduce a new LSM hook, security_tun_dev_attach_queue(), to approve requests to attach to a TUN queue via TUNSETQUEUE. The SELinux code has been adjusted to match the new LSM hooks, the other LSMs do not make use of the LSM TUN controls. This patch makes use of the recently added "tun_socket:attach_queue" permission to restrict access to the TUNSETQUEUE operation. On older SELinux policies which do not define the "tun_socket:attach_queue" permission the access control decision for TUNSETQUEUE will be handled according to the SELinux policy's unknown permission setting. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@parisplace.org> Tested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 07:12:19 +00:00
int security_tun_dev_alloc_security(void **security)
{
return call_int_hook(tun_dev_alloc_security, 0, security);
tun: fix LSM/SELinux labeling of tun/tap devices This patch corrects some problems with LSM/SELinux that were introduced with the multiqueue patchset. The problem stems from the fact that the multiqueue work changed the relationship between the tun device and its associated socket; before the socket persisted for the life of the device, however after the multiqueue changes the socket only persisted for the life of the userspace connection (fd open). For non-persistent devices this is not an issue, but for persistent devices this can cause the tun device to lose its SELinux label. We correct this problem by adding an opaque LSM security blob to the tun device struct which allows us to have the LSM security state, e.g. SELinux labeling information, persist for the lifetime of the tun device. In the process we tweak the LSM hooks to work with this new approach to TUN device/socket labeling and introduce a new LSM hook, security_tun_dev_attach_queue(), to approve requests to attach to a TUN queue via TUNSETQUEUE. The SELinux code has been adjusted to match the new LSM hooks, the other LSMs do not make use of the LSM TUN controls. This patch makes use of the recently added "tun_socket:attach_queue" permission to restrict access to the TUNSETQUEUE operation. On older SELinux policies which do not define the "tun_socket:attach_queue" permission the access control decision for TUNSETQUEUE will be handled according to the SELinux policy's unknown permission setting. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@parisplace.org> Tested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 07:12:19 +00:00
}
EXPORT_SYMBOL(security_tun_dev_alloc_security);
void security_tun_dev_free_security(void *security)
{
call_void_hook(tun_dev_free_security, security);
tun: fix LSM/SELinux labeling of tun/tap devices This patch corrects some problems with LSM/SELinux that were introduced with the multiqueue patchset. The problem stems from the fact that the multiqueue work changed the relationship between the tun device and its associated socket; before the socket persisted for the life of the device, however after the multiqueue changes the socket only persisted for the life of the userspace connection (fd open). For non-persistent devices this is not an issue, but for persistent devices this can cause the tun device to lose its SELinux label. We correct this problem by adding an opaque LSM security blob to the tun device struct which allows us to have the LSM security state, e.g. SELinux labeling information, persist for the lifetime of the tun device. In the process we tweak the LSM hooks to work with this new approach to TUN device/socket labeling and introduce a new LSM hook, security_tun_dev_attach_queue(), to approve requests to attach to a TUN queue via TUNSETQUEUE. The SELinux code has been adjusted to match the new LSM hooks, the other LSMs do not make use of the LSM TUN controls. This patch makes use of the recently added "tun_socket:attach_queue" permission to restrict access to the TUNSETQUEUE operation. On older SELinux policies which do not define the "tun_socket:attach_queue" permission the access control decision for TUNSETQUEUE will be handled according to the SELinux policy's unknown permission setting. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@parisplace.org> Tested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 07:12:19 +00:00
}
EXPORT_SYMBOL(security_tun_dev_free_security);
int security_tun_dev_create(void)
{
return call_int_hook(tun_dev_create, 0);
}
EXPORT_SYMBOL(security_tun_dev_create);
tun: fix LSM/SELinux labeling of tun/tap devices This patch corrects some problems with LSM/SELinux that were introduced with the multiqueue patchset. The problem stems from the fact that the multiqueue work changed the relationship between the tun device and its associated socket; before the socket persisted for the life of the device, however after the multiqueue changes the socket only persisted for the life of the userspace connection (fd open). For non-persistent devices this is not an issue, but for persistent devices this can cause the tun device to lose its SELinux label. We correct this problem by adding an opaque LSM security blob to the tun device struct which allows us to have the LSM security state, e.g. SELinux labeling information, persist for the lifetime of the tun device. In the process we tweak the LSM hooks to work with this new approach to TUN device/socket labeling and introduce a new LSM hook, security_tun_dev_attach_queue(), to approve requests to attach to a TUN queue via TUNSETQUEUE. The SELinux code has been adjusted to match the new LSM hooks, the other LSMs do not make use of the LSM TUN controls. This patch makes use of the recently added "tun_socket:attach_queue" permission to restrict access to the TUNSETQUEUE operation. On older SELinux policies which do not define the "tun_socket:attach_queue" permission the access control decision for TUNSETQUEUE will be handled according to the SELinux policy's unknown permission setting. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@parisplace.org> Tested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 07:12:19 +00:00
int security_tun_dev_attach_queue(void *security)
{
return call_int_hook(tun_dev_attach_queue, 0, security);
}
tun: fix LSM/SELinux labeling of tun/tap devices This patch corrects some problems with LSM/SELinux that were introduced with the multiqueue patchset. The problem stems from the fact that the multiqueue work changed the relationship between the tun device and its associated socket; before the socket persisted for the life of the device, however after the multiqueue changes the socket only persisted for the life of the userspace connection (fd open). For non-persistent devices this is not an issue, but for persistent devices this can cause the tun device to lose its SELinux label. We correct this problem by adding an opaque LSM security blob to the tun device struct which allows us to have the LSM security state, e.g. SELinux labeling information, persist for the lifetime of the tun device. In the process we tweak the LSM hooks to work with this new approach to TUN device/socket labeling and introduce a new LSM hook, security_tun_dev_attach_queue(), to approve requests to attach to a TUN queue via TUNSETQUEUE. The SELinux code has been adjusted to match the new LSM hooks, the other LSMs do not make use of the LSM TUN controls. This patch makes use of the recently added "tun_socket:attach_queue" permission to restrict access to the TUNSETQUEUE operation. On older SELinux policies which do not define the "tun_socket:attach_queue" permission the access control decision for TUNSETQUEUE will be handled according to the SELinux policy's unknown permission setting. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@parisplace.org> Tested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 07:12:19 +00:00
EXPORT_SYMBOL(security_tun_dev_attach_queue);
tun: fix LSM/SELinux labeling of tun/tap devices This patch corrects some problems with LSM/SELinux that were introduced with the multiqueue patchset. The problem stems from the fact that the multiqueue work changed the relationship between the tun device and its associated socket; before the socket persisted for the life of the device, however after the multiqueue changes the socket only persisted for the life of the userspace connection (fd open). For non-persistent devices this is not an issue, but for persistent devices this can cause the tun device to lose its SELinux label. We correct this problem by adding an opaque LSM security blob to the tun device struct which allows us to have the LSM security state, e.g. SELinux labeling information, persist for the lifetime of the tun device. In the process we tweak the LSM hooks to work with this new approach to TUN device/socket labeling and introduce a new LSM hook, security_tun_dev_attach_queue(), to approve requests to attach to a TUN queue via TUNSETQUEUE. The SELinux code has been adjusted to match the new LSM hooks, the other LSMs do not make use of the LSM TUN controls. This patch makes use of the recently added "tun_socket:attach_queue" permission to restrict access to the TUNSETQUEUE operation. On older SELinux policies which do not define the "tun_socket:attach_queue" permission the access control decision for TUNSETQUEUE will be handled according to the SELinux policy's unknown permission setting. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@parisplace.org> Tested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 07:12:19 +00:00
int security_tun_dev_attach(struct sock *sk, void *security)
{
return call_int_hook(tun_dev_attach, 0, sk, security);
}
EXPORT_SYMBOL(security_tun_dev_attach);
tun: fix LSM/SELinux labeling of tun/tap devices This patch corrects some problems with LSM/SELinux that were introduced with the multiqueue patchset. The problem stems from the fact that the multiqueue work changed the relationship between the tun device and its associated socket; before the socket persisted for the life of the device, however after the multiqueue changes the socket only persisted for the life of the userspace connection (fd open). For non-persistent devices this is not an issue, but for persistent devices this can cause the tun device to lose its SELinux label. We correct this problem by adding an opaque LSM security blob to the tun device struct which allows us to have the LSM security state, e.g. SELinux labeling information, persist for the lifetime of the tun device. In the process we tweak the LSM hooks to work with this new approach to TUN device/socket labeling and introduce a new LSM hook, security_tun_dev_attach_queue(), to approve requests to attach to a TUN queue via TUNSETQUEUE. The SELinux code has been adjusted to match the new LSM hooks, the other LSMs do not make use of the LSM TUN controls. This patch makes use of the recently added "tun_socket:attach_queue" permission to restrict access to the TUNSETQUEUE operation. On older SELinux policies which do not define the "tun_socket:attach_queue" permission the access control decision for TUNSETQUEUE will be handled according to the SELinux policy's unknown permission setting. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@parisplace.org> Tested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 07:12:19 +00:00
int security_tun_dev_open(void *security)
{
return call_int_hook(tun_dev_open, 0, security);
tun: fix LSM/SELinux labeling of tun/tap devices This patch corrects some problems with LSM/SELinux that were introduced with the multiqueue patchset. The problem stems from the fact that the multiqueue work changed the relationship between the tun device and its associated socket; before the socket persisted for the life of the device, however after the multiqueue changes the socket only persisted for the life of the userspace connection (fd open). For non-persistent devices this is not an issue, but for persistent devices this can cause the tun device to lose its SELinux label. We correct this problem by adding an opaque LSM security blob to the tun device struct which allows us to have the LSM security state, e.g. SELinux labeling information, persist for the lifetime of the tun device. In the process we tweak the LSM hooks to work with this new approach to TUN device/socket labeling and introduce a new LSM hook, security_tun_dev_attach_queue(), to approve requests to attach to a TUN queue via TUNSETQUEUE. The SELinux code has been adjusted to match the new LSM hooks, the other LSMs do not make use of the LSM TUN controls. This patch makes use of the recently added "tun_socket:attach_queue" permission to restrict access to the TUNSETQUEUE operation. On older SELinux policies which do not define the "tun_socket:attach_queue" permission the access control decision for TUNSETQUEUE will be handled according to the SELinux policy's unknown permission setting. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@parisplace.org> Tested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 07:12:19 +00:00
}
EXPORT_SYMBOL(security_tun_dev_open);
int security_sctp_assoc_request(struct sctp_association *asoc, struct sk_buff *skb)
{
return call_int_hook(sctp_assoc_request, 0, asoc, skb);
}
EXPORT_SYMBOL(security_sctp_assoc_request);
int security_sctp_bind_connect(struct sock *sk, int optname,
struct sockaddr *address, int addrlen)
{
return call_int_hook(sctp_bind_connect, 0, sk, optname,
address, addrlen);
}
EXPORT_SYMBOL(security_sctp_bind_connect);
void security_sctp_sk_clone(struct sctp_association *asoc, struct sock *sk,
struct sock *newsk)
{
call_void_hook(sctp_sk_clone, asoc, sk, newsk);
}
EXPORT_SYMBOL(security_sctp_sk_clone);
int security_sctp_assoc_established(struct sctp_association *asoc,
struct sk_buff *skb)
{
return call_int_hook(sctp_assoc_established, 0, asoc, skb);
}
EXPORT_SYMBOL(security_sctp_assoc_established);
#endif /* CONFIG_SECURITY_NETWORK */
IB/core: Enforce PKey security on QPs Add new LSM hooks to allocate and free security contexts and check for permission to access a PKey. Allocate and free a security context when creating and destroying a QP. This context is used for controlling access to PKeys. When a request is made to modify a QP that changes the port, PKey index, or alternate path, check that the QP has permission for the PKey in the PKey table index on the subnet prefix of the port. If the QP is shared make sure all handles to the QP also have access. Store which port and PKey index a QP is using. After the reset to init transition the user can modify the port, PKey index and alternate path independently. So port and PKey settings changes can be a merge of the previous settings and the new ones. In order to maintain access control if there are PKey table or subnet prefix change keep a list of all QPs are using each PKey index on each port. If a change occurs all QPs using that device and port must have access enforced for the new cache settings. These changes add a transaction to the QP modify process. Association with the old port and PKey index must be maintained if the modify fails, and must be removed if it succeeds. Association with the new port and PKey index must be established prior to the modify and removed if the modify fails. 1. When a QP is modified to a particular Port, PKey index or alternate path insert that QP into the appropriate lists. 2. Check permission to access the new settings. 3. If step 2 grants access attempt to modify the QP. 4a. If steps 2 and 3 succeed remove any prior associations. 4b. If ether fails remove the new setting associations. If a PKey table or subnet prefix changes walk the list of QPs and check that they have permission. If not send the QP to the error state and raise a fatal error event. If it's a shared QP make sure all the QPs that share the real_qp have permission as well. If the QP that owns a security structure is denied access the security structure is marked as such and the QP is added to an error_list. Once the moving the QP to error is complete the security structure mark is cleared. Maintaining the lists correctly turns QP destroy into a transaction. The hardware driver for the device frees the ib_qp structure, so while the destroy is in progress the ib_qp pointer in the ib_qp_security struct is undefined. When the destroy process begins the ib_qp_security structure is marked as destroying. This prevents any action from being taken on the QP pointer. After the QP is destroyed successfully it could still listed on an error_list wait for it to be processed by that flow before cleaning up the structure. If the destroy fails the QPs port and PKey settings are reinserted into the appropriate lists, the destroying flag is cleared, and access control is enforced, in case there were any cache changes during the destroy flow. To keep the security changes isolated a new file is used to hold security related functionality. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> [PM: merge fixup in ib_verbs.h and uverbs_cmd.c] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-19 12:48:52 +00:00
#ifdef CONFIG_SECURITY_INFINIBAND
int security_ib_pkey_access(void *sec, u64 subnet_prefix, u16 pkey)
{
return call_int_hook(ib_pkey_access, 0, sec, subnet_prefix, pkey);
}
EXPORT_SYMBOL(security_ib_pkey_access);
int security_ib_endport_manage_subnet(void *sec, const char *dev_name, u8 port_num)
{
return call_int_hook(ib_endport_manage_subnet, 0, sec, dev_name, port_num);
}
EXPORT_SYMBOL(security_ib_endport_manage_subnet);
IB/core: Enforce PKey security on QPs Add new LSM hooks to allocate and free security contexts and check for permission to access a PKey. Allocate and free a security context when creating and destroying a QP. This context is used for controlling access to PKeys. When a request is made to modify a QP that changes the port, PKey index, or alternate path, check that the QP has permission for the PKey in the PKey table index on the subnet prefix of the port. If the QP is shared make sure all handles to the QP also have access. Store which port and PKey index a QP is using. After the reset to init transition the user can modify the port, PKey index and alternate path independently. So port and PKey settings changes can be a merge of the previous settings and the new ones. In order to maintain access control if there are PKey table or subnet prefix change keep a list of all QPs are using each PKey index on each port. If a change occurs all QPs using that device and port must have access enforced for the new cache settings. These changes add a transaction to the QP modify process. Association with the old port and PKey index must be maintained if the modify fails, and must be removed if it succeeds. Association with the new port and PKey index must be established prior to the modify and removed if the modify fails. 1. When a QP is modified to a particular Port, PKey index or alternate path insert that QP into the appropriate lists. 2. Check permission to access the new settings. 3. If step 2 grants access attempt to modify the QP. 4a. If steps 2 and 3 succeed remove any prior associations. 4b. If ether fails remove the new setting associations. If a PKey table or subnet prefix changes walk the list of QPs and check that they have permission. If not send the QP to the error state and raise a fatal error event. If it's a shared QP make sure all the QPs that share the real_qp have permission as well. If the QP that owns a security structure is denied access the security structure is marked as such and the QP is added to an error_list. Once the moving the QP to error is complete the security structure mark is cleared. Maintaining the lists correctly turns QP destroy into a transaction. The hardware driver for the device frees the ib_qp structure, so while the destroy is in progress the ib_qp pointer in the ib_qp_security struct is undefined. When the destroy process begins the ib_qp_security structure is marked as destroying. This prevents any action from being taken on the QP pointer. After the QP is destroyed successfully it could still listed on an error_list wait for it to be processed by that flow before cleaning up the structure. If the destroy fails the QPs port and PKey settings are reinserted into the appropriate lists, the destroying flag is cleared, and access control is enforced, in case there were any cache changes during the destroy flow. To keep the security changes isolated a new file is used to hold security related functionality. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> [PM: merge fixup in ib_verbs.h and uverbs_cmd.c] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-19 12:48:52 +00:00
int security_ib_alloc_security(void **sec)
{
return call_int_hook(ib_alloc_security, 0, sec);
}
EXPORT_SYMBOL(security_ib_alloc_security);
void security_ib_free_security(void *sec)
{
call_void_hook(ib_free_security, sec);
}
EXPORT_SYMBOL(security_ib_free_security);
#endif /* CONFIG_SECURITY_INFINIBAND */
#ifdef CONFIG_SECURITY_NETWORK_XFRM
selinux: add gfp argument to security_xfrm_policy_alloc and fix callers security_xfrm_policy_alloc can be called in atomic context so the allocation should be done with GFP_ATOMIC. Add an argument to let the callers choose the appropriate way. In order to do so a gfp argument needs to be added to the method xfrm_policy_alloc_security in struct security_operations and to the internal function selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic callers and leave GFP_KERNEL as before for the rest. The path that needed the gfp argument addition is: security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security -> all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) -> selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only) Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also add it to security_context_to_sid which is used inside and prior to this patch did only GFP_KERNEL allocation. So add gfp argument to security_context_to_sid and adjust all of its callers as well. CC: Paul Moore <paul@paul-moore.com> CC: Dave Jones <davej@redhat.com> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Fan Du <fan.du@windriver.com> CC: David S. Miller <davem@davemloft.net> CC: LSM list <linux-security-module@vger.kernel.org> CC: SELinux list <selinux@tycho.nsa.gov> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-03-07 11:44:19 +00:00
int security_xfrm_policy_alloc(struct xfrm_sec_ctx **ctxp,
struct xfrm_user_sec_ctx *sec_ctx,
gfp_t gfp)
{
return call_int_hook(xfrm_policy_alloc_security, 0, ctxp, sec_ctx, gfp);
}
EXPORT_SYMBOL(security_xfrm_policy_alloc);
int security_xfrm_policy_clone(struct xfrm_sec_ctx *old_ctx,
struct xfrm_sec_ctx **new_ctxp)
{
return call_int_hook(xfrm_policy_clone_security, 0, old_ctx, new_ctxp);
}
void security_xfrm_policy_free(struct xfrm_sec_ctx *ctx)
{
call_void_hook(xfrm_policy_free_security, ctx);
}
EXPORT_SYMBOL(security_xfrm_policy_free);
int security_xfrm_policy_delete(struct xfrm_sec_ctx *ctx)
{
return call_int_hook(xfrm_policy_delete_security, 0, ctx);
}
int security_xfrm_state_alloc(struct xfrm_state *x,
struct xfrm_user_sec_ctx *sec_ctx)
{
return call_int_hook(xfrm_state_alloc, 0, x, sec_ctx);
}
EXPORT_SYMBOL(security_xfrm_state_alloc);
int security_xfrm_state_alloc_acquire(struct xfrm_state *x,
struct xfrm_sec_ctx *polsec, u32 secid)
{
return call_int_hook(xfrm_state_alloc_acquire, 0, x, polsec, secid);
}
int security_xfrm_state_delete(struct xfrm_state *x)
{
return call_int_hook(xfrm_state_delete_security, 0, x);
}
EXPORT_SYMBOL(security_xfrm_state_delete);
void security_xfrm_state_free(struct xfrm_state *x)
{
call_void_hook(xfrm_state_free_security, x);
}
int security_xfrm_policy_lookup(struct xfrm_sec_ctx *ctx, u32 fl_secid)
{
return call_int_hook(xfrm_policy_lookup, 0, ctx, fl_secid);
}
int security_xfrm_state_pol_flow_match(struct xfrm_state *x,
struct xfrm_policy *xp,
const struct flowi_common *flic)
{
struct security_hook_list *hp;
int rc = LSM_RET_DEFAULT(xfrm_state_pol_flow_match);
/*
* Since this function is expected to return 0 or 1, the judgment
* becomes difficult if multiple LSMs supply this call. Fortunately,
* we can use the first LSM's judgment because currently only SELinux
* supplies this call.
*
* For speed optimization, we explicitly break the loop rather than
* using the macro
*/
hlist_for_each_entry(hp, &security_hook_heads.xfrm_state_pol_flow_match,
list) {
rc = hp->hook.xfrm_state_pol_flow_match(x, xp, flic);
break;
}
return rc;
}
int security_xfrm_decode_session(struct sk_buff *skb, u32 *secid)
{
return call_int_hook(xfrm_decode_session, 0, skb, secid, 1);
}
void security_skb_classify_flow(struct sk_buff *skb, struct flowi_common *flic)
{
int rc = call_int_hook(xfrm_decode_session, 0, skb, &flic->flowic_secid,
0);
BUG_ON(rc);
}
EXPORT_SYMBOL(security_skb_classify_flow);
#endif /* CONFIG_SECURITY_NETWORK_XFRM */
#ifdef CONFIG_KEYS
CRED: Inaugurate COW credentials Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-13 23:39:23 +00:00
int security_key_alloc(struct key *key, const struct cred *cred,
unsigned long flags)
{
return call_int_hook(key_alloc, 0, key, cred, flags);
}
void security_key_free(struct key *key)
{
call_void_hook(key_free, key);
}
int security_key_permission(key_ref_t key_ref, const struct cred *cred,
enum key_need_perm need_perm)
{
return call_int_hook(key_permission, 0, key_ref, cred, need_perm);
}
int security_key_getsecurity(struct key *key, char **_buffer)
{
*_buffer = NULL;
return call_int_hook(key_getsecurity, 0, key, _buffer);
}
#endif /* CONFIG_KEYS */
#ifdef CONFIG_AUDIT
int security_audit_rule_init(u32 field, u32 op, char *rulestr, void **lsmrule)
{
return call_int_hook(audit_rule_init, 0, field, op, rulestr, lsmrule);
}
int security_audit_rule_known(struct audit_krule *krule)
{
return call_int_hook(audit_rule_known, 0, krule);
}
void security_audit_rule_free(void *lsmrule)
{
call_void_hook(audit_rule_free, lsmrule);
}
int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule)
{
return call_int_hook(audit_rule_match, 0, secid, field, op, lsmrule);
}
#endif /* CONFIG_AUDIT */
#ifdef CONFIG_BPF_SYSCALL
int security_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
return call_int_hook(bpf, 0, cmd, attr, size);
}
int security_bpf_map(struct bpf_map *map, fmode_t fmode)
{
return call_int_hook(bpf_map, 0, map, fmode);
}
int security_bpf_prog(struct bpf_prog *prog)
{
return call_int_hook(bpf_prog, 0, prog);
}
int security_bpf_map_alloc(struct bpf_map *map)
{
return call_int_hook(bpf_map_alloc_security, 0, map);
}
int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
{
return call_int_hook(bpf_prog_alloc_security, 0, aux);
}
void security_bpf_map_free(struct bpf_map *map)
{
call_void_hook(bpf_map_free_security, map);
}
void security_bpf_prog_free(struct bpf_prog_aux *aux)
{
call_void_hook(bpf_prog_free_security, aux);
}
#endif /* CONFIG_BPF_SYSCALL */
int security_locked_down(enum lockdown_reason what)
{
return call_int_hook(locked_down, 0, what);
}
EXPORT_SYMBOL(security_locked_down);
perf_event: Add support for LSM and SELinux checks In current mainline, the degree of access to perf_event_open(2) system call depends on the perf_event_paranoid sysctl. This has a number of limitations: 1. The sysctl is only a single value. Many types of accesses are controlled based on the single value thus making the control very limited and coarse grained. 2. The sysctl is global, so if the sysctl is changed, then that means all processes get access to perf_event_open(2) opening the door to security issues. This patch adds LSM and SELinux access checking which will be used in Android to access perf_event_open(2) for the purposes of attaching BPF programs to tracepoints, perf profiling and other operations from userspace. These operations are intended for production systems. 5 new LSM hooks are added: 1. perf_event_open: This controls access during the perf_event_open(2) syscall itself. The hook is called from all the places that the perf_event_paranoid sysctl is checked to keep it consistent with the systctl. The hook gets passed a 'type' argument which controls CPU, kernel and tracepoint accesses (in this context, CPU, kernel and tracepoint have the same semantics as the perf_event_paranoid sysctl). Additionally, I added an 'open' type which is similar to perf_event_paranoid sysctl == 3 patch carried in Android and several other distros but was rejected in mainline [1] in 2016. 2. perf_event_alloc: This allocates a new security object for the event which stores the current SID within the event. It will be useful when the perf event's FD is passed through IPC to another process which may try to read the FD. Appropriate security checks will limit access. 3. perf_event_free: Called when the event is closed. 4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event. 5. perf_event_write: Called from the ioctl(2) syscalls for the event. [1] https://lwn.net/Articles/696240/ Since Peter had suggest LSM hooks in 2016 [1], I am adding his Suggested-by tag below. To use this patch, we set the perf_event_paranoid sysctl to -1 and then apply selinux checking as appropriate (default deny everything, and then add policy rules to give access to domains that need it). In the future we can remove the perf_event_paranoid sysctl altogether. Suggested-by: Peter Zijlstra <peterz@infradead.org> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: James Morris <jmorris@namei.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: rostedt@goodmis.org Cc: Yonghong Song <yhs@fb.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: jeffv@google.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: primiano@google.com Cc: Song Liu <songliubraving@fb.com> Cc: rsavitski@google.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Matthew Garrett <matthewgarrett@google.com> Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org
2019-10-14 17:03:08 +00:00
#ifdef CONFIG_PERF_EVENTS
int security_perf_event_open(struct perf_event_attr *attr, int type)
{
return call_int_hook(perf_event_open, 0, attr, type);
}
int security_perf_event_alloc(struct perf_event *event)
{
return call_int_hook(perf_event_alloc, 0, event);
}
void security_perf_event_free(struct perf_event *event)
{
call_void_hook(perf_event_free, event);
}
int security_perf_event_read(struct perf_event *event)
{
return call_int_hook(perf_event_read, 0, event);
}
int security_perf_event_write(struct perf_event *event)
{
return call_int_hook(perf_event_write, 0, event);
}
#endif /* CONFIG_PERF_EVENTS */
lsm,io_uring: add LSM hooks to io_uring A full expalantion of io_uring is beyond the scope of this commit description, but in summary it is an asynchronous I/O mechanism which allows for I/O requests and the resulting data to be queued in memory mapped "rings" which are shared between the kernel and userspace. Optionally, io_uring offers the ability for applications to spawn kernel threads to dequeue I/O requests from the ring and submit the requests in the kernel, helping to minimize the syscall overhead. Rings are accessed in userspace by memory mapping a file descriptor provided by the io_uring_setup(2), and can be shared between applications as one might do with any open file descriptor. Finally, process credentials can be registered with a given ring and any process with access to that ring can submit I/O requests using any of the registered credentials. While the io_uring functionality is widely recognized as offering a vastly improved, and high performing asynchronous I/O mechanism, its ability to allow processes to submit I/O requests with credentials other than its own presents a challenge to LSMs. When a process creates a new io_uring ring the ring's credentials are inhertied from the calling process; if this ring is shared with another process operating with different credentials there is the potential to bypass the LSMs security policy. Similarly, registering credentials with a given ring allows any process with access to that ring to submit I/O requests with those credentials. In an effort to allow LSMs to apply security policy to io_uring I/O operations, this patch adds two new LSM hooks. These hooks, in conjunction with the LSM anonymous inode support previously submitted, allow an LSM to apply access control policy to the sharing of io_uring rings as well as any io_uring credential changes requested by a process. The new LSM hooks are described below: * int security_uring_override_creds(cred) Controls if the current task, executing an io_uring operation, is allowed to override it's credentials with @cred. In cases where the current task is a user application, the current credentials will be those of the user application. In cases where the current task is a kernel thread servicing io_uring requests the current credentials will be those of the io_uring ring (inherited from the process that created the ring). * int security_uring_sqpoll(void) Controls if the current task is allowed to create an io_uring polling thread (IORING_SETUP_SQPOLL). Without a SQPOLL thread in the kernel processes must submit I/O requests via io_uring_enter(2) which allows us to compare any requested credential changes against the application making the request. With a SQPOLL thread, we can no longer compare requested credential changes against the application making the request, the comparison is made against the ring's credentials. Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-02-02 00:56:49 +00:00
#ifdef CONFIG_IO_URING
int security_uring_override_creds(const struct cred *new)
{
return call_int_hook(uring_override_creds, 0, new);
}
int security_uring_sqpoll(void)
{
return call_int_hook(uring_sqpoll, 0);
}
int security_uring_cmd(struct io_uring_cmd *ioucmd)
{
return call_int_hook(uring_cmd, 0, ioucmd);
}
lsm,io_uring: add LSM hooks to io_uring A full expalantion of io_uring is beyond the scope of this commit description, but in summary it is an asynchronous I/O mechanism which allows for I/O requests and the resulting data to be queued in memory mapped "rings" which are shared between the kernel and userspace. Optionally, io_uring offers the ability for applications to spawn kernel threads to dequeue I/O requests from the ring and submit the requests in the kernel, helping to minimize the syscall overhead. Rings are accessed in userspace by memory mapping a file descriptor provided by the io_uring_setup(2), and can be shared between applications as one might do with any open file descriptor. Finally, process credentials can be registered with a given ring and any process with access to that ring can submit I/O requests using any of the registered credentials. While the io_uring functionality is widely recognized as offering a vastly improved, and high performing asynchronous I/O mechanism, its ability to allow processes to submit I/O requests with credentials other than its own presents a challenge to LSMs. When a process creates a new io_uring ring the ring's credentials are inhertied from the calling process; if this ring is shared with another process operating with different credentials there is the potential to bypass the LSMs security policy. Similarly, registering credentials with a given ring allows any process with access to that ring to submit I/O requests with those credentials. In an effort to allow LSMs to apply security policy to io_uring I/O operations, this patch adds two new LSM hooks. These hooks, in conjunction with the LSM anonymous inode support previously submitted, allow an LSM to apply access control policy to the sharing of io_uring rings as well as any io_uring credential changes requested by a process. The new LSM hooks are described below: * int security_uring_override_creds(cred) Controls if the current task, executing an io_uring operation, is allowed to override it's credentials with @cred. In cases where the current task is a user application, the current credentials will be those of the user application. In cases where the current task is a kernel thread servicing io_uring requests the current credentials will be those of the io_uring ring (inherited from the process that created the ring). * int security_uring_sqpoll(void) Controls if the current task is allowed to create an io_uring polling thread (IORING_SETUP_SQPOLL). Without a SQPOLL thread in the kernel processes must submit I/O requests via io_uring_enter(2) which allows us to compare any requested credential changes against the application making the request. With a SQPOLL thread, we can no longer compare requested credential changes against the application making the request, the comparison is made against the ring's credentials. Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-02-02 00:56:49 +00:00
#endif /* CONFIG_IO_URING */