mainlining shenanigans
Go to file
Christian Brauner 99cdb8b9a5 seccomp: notify about unused filter
We've been making heavy use of the seccomp notifier to intercept and
handle certain syscalls for containers. This patch allows a syscall
supervisor listening on a given notifier to be notified when a seccomp
filter has become unused.

A container is often managed by a singleton supervisor process the
so-called "monitor". This monitor process has an event loop which has
various event handlers registered. If the user specified a seccomp
profile that included a notifier for various syscalls then we also
register a seccomp notify even handler. For any container using a
separate pid namespace the lifecycle of the seccomp notifier is bound to
the init process of the pid namespace, i.e. when the init process exits
the filter must be unused.

If a new process attaches to a container we force it to assume a seccomp
profile. This can either be the same seccomp profile as the container
was started with or a modified one. If the attaching process makes use
of the seccomp notifier we will register a new seccomp notifier handler
in the monitor's event loop. However, when the attaching process exits
we can't simply delete the handler since other child processes could've
been created (daemons spawned etc.) that have inherited the seccomp
filter and so we need to keep the seccomp notifier fd alive in the event
loop. But this is problematic since we don't get a notification when the
seccomp filter has become unused and so we currently never remove the
seccomp notifier fd from the event loop and just keep accumulating fds
in the event loop. We've had this issue for a while but it has recently
become more pressing as more and larger users make use of this.

To fix this, we introduce a new "users" reference counter that tracks any
tasks and dependent filters making use of a filter. When a notifier is
registered waiting tasks will be notified that the filter is now empty
by receiving a (E)POLLHUP event.

The concept in this patch introduces is the same as for signal_struct,
i.e. reference counting for life-cycle management is decoupled from
reference counting taks using the object. There's probably some trickery
possible but the second counter is just the correct way of doing this
IMHO and has precedence.

Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matt Denton <mpdenton@google.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Jann Horn <jannh@google.com>
Cc: Chris Palmer <palmer@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Robert Sesek <rsesek@google.com>
Cc: Jeffrey Vander Stoep <jeffv@google.com>
Cc: Linux Containers <containers@lists.linux-foundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200531115031.391515-3-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:51 -07:00
arch Kbuild updates for v5.8 (2nd) 2020-06-13 13:29:16 -07:00
block Kbuild updates for v5.8 (2nd) 2020-06-13 13:29:16 -07:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto Merge branch 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux 2020-06-10 14:46:54 -07:00
Documentation Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-06-13 16:27:13 -07:00
drivers Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-06-13 16:27:13 -07:00
fs seccomp: Report number of loaded filters in /proc/$pid/status 2020-07-10 16:01:51 -07:00
include seccomp: release filter after task is fully dead 2020-07-10 16:01:51 -07:00
init seccomp: Report number of loaded filters in /proc/$pid/status 2020-07-10 16:01:51 -07:00
ipc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
kernel seccomp: notify about unused filter 2020-07-10 16:01:51 -07:00
lib Kbuild updates for v5.8 (2nd) 2020-06-13 13:29:16 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm Kbuild updates for v5.8 (2nd) 2020-06-13 13:29:16 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-06-13 16:27:13 -07:00
samples binderfs: add gitignore for generated sample program 2020-06-13 13:41:24 -07:00
scripts Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-06-13 16:27:13 -07:00
security Add additional LSM hooks for SafeSetID 2020-06-14 11:39:31 -07:00
sound Kbuild updates for v5.8 (2nd) 2020-06-13 13:29:16 -07:00
tools selftests/seccomp: Set NNP for TSYNC ESRCH flag test 2020-07-10 16:01:46 -07:00
usr bpfilter: match bit size of bpfilter_umh to that of the kernel 2020-05-17 18:52:01 +09:00
virt MIPS: 2020-06-12 11:05:52 -07:00
.clang-format block: add bio_for_each_bvec_all() 2020-05-25 11:25:24 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore modpost: generate vmlinux.symvers and reuse it for the second modpost 2020-06-06 23:38:12 +09:00
.mailmap A fair amount of stuff this time around, dominated by yet another massive 2020-06-01 15:45:27 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS mailmap: change email for Ricardo Ribalda 2020-05-25 18:59:59 -06:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Merge branch 'i2c/for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2020-06-13 13:12:38 -07:00
Makefile Linux 5.8-rc1 2020-06-14 12:45:04 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.