Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Note sysctl_hugetlb_shm_group can only be written in the root user
in the initial user namespace, so we can assume sysctl_hugetlb_shm_group
is in the initial user namespace.
Cc: William Irwin <wli@holomorphy.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Now that the type changes are done, here is the final set of
changes to make the quota code work when user namespaces are enabled.
Small cleanups and fixes to make the code build when user namespaces
are enabled.
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Modify quota_send_warning to take struct kqid instead a type and
identifier pair.
When sending netlink broadcasts always convert uids and quota
identifiers into the intial user namespace. There is as yet no way to
send a netlink broadcast message with different contents to receivers
in different namespaces, so for the time being just map all of the
identifiers into the initial user namespace which preserves the
current behavior.
Change the callers of quota_send_warning in gfs2, xfs and dquot
to generate a struct kqid to pass to quota send warning. When
all of the user namespaces convesions are complete a struct kqid
values will be availbe without need for conversion, but a conversion
is needed now to avoid needing to convert everything at once.
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Update the quotactl user space interface to successfull compile with
user namespaces support enabled and to hand off quota identifiers to
lower layers of the kernel in struct kqid instead of type and qid
pairs.
The quota on function is not converted because while it takes a quota
type and an id. The id is the on disk quota format to use, which
is something completely different.
The signature of two struct quotactl_ops methods were changed to take
struct kqid argumetns get_dqblk and set_dqblk.
The dquot, xfs, and ocfs2 implementations of get_dqblk and set_dqblk
are minimally changed so that the code continues to work with
the change in parameter type.
This is the first in a series of changes to always store quota
identifiers in the kernel in struct kqid and only use raw type and qid
values when interacting with on disk structures or userspace. Always
using struct kqid internally makes it hard to miss places that need
conversion to or from the kernel internal values.
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Convert ext2, ext3, and ext4 to fully support the posix acl changes,
using e_uid e_gid instead e_id.
Enabled building with posix acls enabled, all filesystems supporting
user namespaces, now also support posix acls when user namespaces are enabled.
Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- When tracing capture the kuid.
- When displaying the data to user space convert the kuid into the
user namespace of the process that opened the report file.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
BSD process accounting conveniently passes the file the accounting
records will be written into to do_acct_process. The file credentials
captured the user namespace of the opener of the file. Use the file
credentials to format the uid and the gid of the current process into
the user namespace of the user that started the bsd process
accounting.
Cc: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Explicitly limit exit task stat broadcast to the initial user and
pid namespaces, as it is already limited to the initial network
namespace.
- For broadcast task stats explicitly generate all of the idenitiers
in terms of the initial user namespace and the initial pid
namespace.
- For request stats report them in terms of the current user namespace
and the current pid namespace. Netlink messages are delivered
syncrhonously to the kernel allowing us to get the user namespace
and the pid namespace from the current task.
- Pass the namespaces for representing pids and uids and gids
into bacct_add_task.
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Explicitly format uids gids in audit messges in the initial user
namespace. This is safe because auditd is restrected to be in
the initial user namespace.
- Convert audit_sig_uid into a kuid_t.
- Enable building the audit code and user namespaces at the same time.
The net result is that the audit subsystem now uses kuid_t and kgid_t whenever
possible making it almost impossible to confuse a raw uid_t with a kuid_t
preventing bugs.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
In net/dns_resolver/dns_key.c and net/rxrpc/ar-key.c make them
work with user namespaces enabled where key_alloc takes kuids and kgids.
Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID instead of bare 0's.
Cc: Sage Weil <sage@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: linux-afs@lists.infradead.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Replace key_user ->user_ns equality checks with kuid_has_mapping checks.
- Use from_kuid to generate key descriptions
- Use kuid_t and kgid_t and the associated helpers instead of uid_t and gid_t
- Avoid potential problems with file descriptor passing by displaying
keys in the user namespace of the opener of key status proc files.
Cc: linux-security-module@vger.kernel.org
Cc: keyrings@linux-nfs.org
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Blink Blink this had not been converted to use struct pid ages ago?
- On drm open capture the openers kuid and struct pid.
- On drm close release the kuid and struct pid
- When reporting the uid and pid convert the kuid and struct pid
into values in the appropriate namespace.
Cc: dri-devel@lists.freedesktop.org
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Store the ipc owner and creator with a kuid
- Store the ipc group and the crators group with a kgid.
- Add error handling to ipc_update_perms, allowing it to
fail if the uids and gids can not be converted to kuids
or kgids.
- Modify the proc files to display the ipc creator and
owner in the user namespace of the opener of the proc file.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Only allow asking for events from the initial user and pid namespace,
where we generate the events in.
- Convert kuids and kgids into the initial user namespace to report
them via the process event connector.
Cc: David Miller <davem@davemloft.net>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Enable building of pf_key sockets and user namespace support at the
same time. This combination builds successfully so there is no reason
to forbid it.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Maxim Krasnyansky <maxk@qualcomm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: John W. Linville <linville@tuxdriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Only allow adding matches from the initial user namespace
- Add the appropriate conversion functions to handle matches
against sockets in other user namespaces.
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
xt_recent creates a bunch of proc files and initializes their uid
and gids to the values of ip_list_uid and ip_list_gid. When
initialize those proc files convert those values to kuids so they
can continue to reside on the /proc inode.
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jan Engelhardt <jengelh@medozas.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
xt_LOG always writes messages via sb_add via printk. Therefore when
xt_LOG logs the uid and gid of a socket a packet came from the
values should be converted to be in the initial user namespace.
Thus making xt_LOG as user namespace safe as possible.
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The flow classifier can use uids and gids of the sockets that
are transmitting packets and do insert those uids and gids
into the packet classification calcuation. I don't fully
understand the details but it appears that we can depend
on specific uids and gids when making traffic classification
decisions.
To work with user namespaces enabled map from kuids and kgids
into uids and gids in the initial user namespace giving raw
integer values the code can play with and depend on.
To avoid issues of userspace depending on uids and gids in
packet classifiers installed from other user namespaces
and getting confused deny all packet classifiers that
use uids or gids that are not comming from a netlink socket
in the initial user namespace.
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Changli Gao <xiaosuo@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
At logging instance creation capture the peer netlink socket's user
namespace. Use the captured peer user namespace when reporting socket
uids to the peer.
The peer socket's user namespace is guaranateed to be valid until the user
closes the netlink socket. nfnetlink_log removes instances during the final
close of a socket. __build_packet_message does not get called after an
instance is destroyed. Therefore it is safe to let the peer netlink socket
take care of the user namespace reference counting for us.
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Compute the user namespace of the socket that we are replying to
and translate the kuids of reported sockets into that user namespace.
Cc: Andrew Vagin <avagin@openvz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Correct a long standing omission and use struct pid in the owner
field of struct ip6_flowlabel when the share type is IPV6_FL_S_PROCESS.
This guarantees we don't have issues when pid wraparound occurs.
Use a kuid_t in the owner field of struct ip6_flowlabel when the
share type is IPV6_FL_S_USER to add user namespace support.
In /proc/net/ip6_flowlabel capture the current pid namespace when
opening the file and release the pid namespace when the file is
closed ensuring we print the pid owner value that is meaning to
the reader of the file. Similarly use from_kuid_munged to print
uid values that are meaningful to the reader of the file.
This requires exporting pid_nr_ns so that ipv6 can continue to built
as a module. Yoiks what silliness
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Store sysctl_ping_group_range as a paire of kgid_t values
instead of a pair of gid_t values.
- Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range
- For invalid cases reset to the default disabled state.
With the kgid_t conversion made part of the original value sanitation
from userspace understand how the code will react becomes clearer
and it becomes possible to set the sysctl ping group range from
something other than the initial user namespace.
Cc: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Now that the networking core is user namespace safe allow
networking and user namespaces to be built at the same time.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The user namespace code has an explicit "depends on USB_DEVICEFS = n"
dependency to prevent building code that is not yet user namespace safe. With
the removal of usbfs from the kernel it is now impossible to satisfy the
USB_DEFICEFS = n dependency and thus it is impossible to enable user
namespace support in 3.5-rc1. So remove the now useless depedency.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
When hotadd_new_pgdat() is called to create new pgdat for a new node, a
fallback zonelist should be created for the new node. There's code to try
to achieve that in hotadd_new_pgdat() as below:
/*
* The node we allocated has no zone fallback lists. For avoiding
* to access not-initialized zonelist, build here.
*/
mutex_lock(&zonelists_mutex);
build_all_zonelists(pgdat, NULL);
mutex_unlock(&zonelists_mutex);
But it doesn't work as expected. When hotadd_new_pgdat() is called, the
new node is still in offline state because node_set_online(nid) hasn't
been called yet. And build_all_zonelists() only builds zonelists for
online nodes as:
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
build_zonelists(pgdat);
build_zonelist_cache(pgdat);
}
Though we hope to create zonelist for the new pgdat, but it doesn't. So
add a new parameter "pgdat" the build_all_zonelists() to build pgdat for
the new pgdat too.
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Keping Chen <chenkeping@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement a new controller that allows us to control HugeTLB allocations.
The extension allows to limit the HugeTLB usage per control group and
enforces the controller limit during page fault. Since HugeTLB doesn't
support page reclaim, enforcing the limit at page fault time implies that,
the application will get SIGBUS signal if it tries to access HugeTLB pages
beyond its limit. This requires the application to know beforehand how
much HugeTLB pages it would require for its use.
The charge/uncharge calls will be added to HugeTLB code in later patch.
Support for cgroup removal will be added in later patches.
[akpm@linux-foundation.org: s/CONFIG_CGROUP_HUGETLB_RES_CTLR/CONFIG_MEMCG_HUGETLB/g]
[akpm@linux-foundation.org: s/CONFIG_MEMCG_HUGETLB/CONFIG_CGROUP_HUGETLB/g]
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ARM updates from Russell King:
"First ARM push of this merge window, post me coming back from holiday.
This is what has been in linux-next for the last few weeks. Not much
to say which isn't described by the commit summaries."
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (32 commits)
ARM: 7463/1: topology: Update cpu_power according to DT information
ARM: 7462/1: topology: factorize the update of sibling masks
ARM: 7461/1: topology: Add arch_scale_freq_power function
ARM: 7456/1: ptrace: provide separate functions for tracing syscall {entry,exit}
ARM: 7455/1: audit: move syscall auditing until after ptrace SIGTRAP handling
ARM: 7454/1: entry: don't bother with syscall tracing on ret_from_fork path
ARM: 7453/1: audit: only allow syscall auditing for pure EABI userspace
ARM: 7452/1: delay: allow timer-based delay implementation to be selected
ARM: 7451/1: arch timer: implement read_current_timer and get_cycles
ARM: 7450/1: dcache: select DCACHE_WORD_ACCESS for little-endian ARMv6+ CPUs
ARM: 7449/1: use generic strnlen_user and strncpy_from_user functions
ARM: 7448/1: perf: remove arm_perf_pmu_ids global enumeration
ARM: 7447/1: rwlocks: remove unused branch labels from trylock routines
ARM: 7446/1: spinlock: use ticket algorithm for ARMv6+ locking implementation
ARM: 7445/1: mm: update CONTEXTIDR register to contain PID of current process
ARM: 7444/1: kernel: add arch-timer C3STOP feature
ARM: 7460/1: remove asm/locks.h
ARM: 7439/1: head.S: simplify initial page table mapping
ARM: 7437/1: zImage: Allow DTB command line concatenation with ATAG_CMDLINE
ARM: 7436/1: Do not map the vectors page as write-through on UP systems
...
main.c has initcall_level_names[] for parse_args to print in debug messages,
add comments to keep them in sync with initcalls defined in init.h.
Also add "loadable" into comment re not using *_initcall macros in
modules, to disambiguate from kernel/params.c and other builtins.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>