linux/net
Martin KaFai Lau 46f8bc9275 bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
before accessing the fields in sock.  For example, in __netdev_pick_tx:

static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
			    struct net_device *sb_dev)
{
	/* ... */

	struct sock *sk = skb->sk;

		if (queue_index != new_index && sk &&
		    sk_fullsock(sk) &&
		    rcu_access_pointer(sk->sk_dst_cache))
			sk_tx_queue_set(sk, new_index);

	/* ... */

	return queue_index;
}

This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
where a few of the convert_ctx_access() in filter.c has already been
accessing the skb->sk sock_common's fields,
e.g. sock_ops_convert_ctx_access().

"__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
Some of the fileds in "bpf_sock" will not be directly
accessible through the "__sk_buff->sk" pointer.  It is limited
by the new "bpf_sock_common_is_valid_access()".
e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
     are not allowed.

The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
can be used to get a sk with all accessible fields in "bpf_sock".
This helper is added to both cg_skb and sched_(cls|act).

int cg_skb_foo(struct __sk_buff *skb) {
	struct bpf_sock *sk;

	sk = skb->sk;
	if (!sk)
		return 1;

	sk = bpf_sk_fullsock(sk);
	if (!sk)
		return 1;

	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
		return 1;

	/* some_traffic_shaping(); */

	return 1;
}

(1) The sk is read only

(2) There is no new "struct bpf_sock_common" introduced.

(3) Future kernel sock's members could be added to bpf_sock only
    instead of repeatedly adding at multiple places like currently
    in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.

(4) After "sk = skb->sk", the reg holding sk is in type
    PTR_TO_SOCK_COMMON_OR_NULL.

(5) After bpf_sk_fullsock(), the return type will be in type
    PTR_TO_SOCKET_OR_NULL which is the same as the return type of
    bpf_sk_lookup_xxx().

    However, bpf_sk_fullsock() does not take refcnt.  The
    acquire_reference_state() is only depending on the return type now.
    To avoid it, a new is_acquire_function() is checked before calling
    acquire_reference_state().

(6) The WARN_ON in "release_reference_state()" is no longer an
    internal verifier bug.

    When reg->id is not found in state->refs[], it means the
    bpf_prog does something wrong like
    "bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has
    never been acquired by calling "bpf_sk_fullsock(skb->sk)".

    A -EINVAL and a verbose are done instead of WARN_ON.  A test is
    added to the test_verifier in a later patch.

    Since the WARN_ON in "release_reference_state()" is no longer
    needed, "__release_reference_state()" is folded into
    "release_reference_state()" also.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-10 19:46:17 -08:00
..
6lowpan 6lowpan: convert to DEFINE_SHOW_ATTRIBUTE 2018-12-19 00:28:05 +01:00
9p 9p/net: put a lower bound on msize 2018-12-25 17:07:49 +09:00
802
8021q net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
appletalk
atm Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
ax25 ax25: fix possible use-after-free 2019-01-23 11:18:00 -08:00
batman-adv This feature/cleanup patchset includes the following patches: 2019-02-01 11:04:13 -08:00
bluetooth socket: Use old_timeval types for socket timestamps 2019-02-03 11:17:30 -08:00
bpf bpf: add BPF_PROG_TEST_RUN support for flow dissector 2019-01-29 01:08:29 +01:00
bpfilter bpfilter: remove extra header search paths for bpfilter_umh 2019-02-03 20:11:43 -08:00
bridge net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID 2019-02-06 14:17:03 -08:00
caif Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
can can: bcm: check timer values before ktime conversion 2019-01-22 11:33:46 +01:00
ceph libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive() 2019-01-21 14:53:12 +01:00
core bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper 2019-02-10 19:46:17 -08:00
dcb
dccp tcp: Refactor pingpong code 2019-01-27 13:29:43 -08:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
dns_resolver
dsa net: dsa: Implement ndo_get_port_parent_id() 2019-02-06 14:17:03 -08:00
ethernet net: ethernet: provide nvmem_get_mac_address() 2018-12-03 15:40:30 -08:00
hsr
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-24 16:19:56 -08:00
ife
ipv4 net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID 2019-02-06 14:17:03 -08:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
iucv iucv: Remove SKB list assumptions. 2018-11-10 16:55:11 -08:00
kcm
key af_key: fix indentation on declaration statement 2018-11-15 18:09:32 +01:00
l2tp ppp: Move PFC decompression to PPP generic layer 2018-12-20 16:49:30 -08:00
l3mdev l3mdev: add function to retreive upper master 2018-12-03 14:15:26 -08:00
lapb
llc llc: do not use sk_eat_skb() 2018-10-22 19:59:20 -07:00
mac80211 mac80211: fix missing/malformed documentation 2019-02-01 12:11:13 +01:00
mac802154
mpls net: mpls: netconf: perform strict checks also for doit handlers 2019-01-19 10:09:59 -08:00
ncsi net/ncsi: Add NCSI Mellanox OEM command 2018-11-27 16:37:20 -08:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
netlabel
netlink net: netlink: add helper to retrieve NETLINK_F_STRICT_CHK 2019-01-19 10:09:58 -08:00
netrom netrom: switch to sock timer API 2019-01-27 10:38:04 -08:00
nfc net: Revert recent Spectre-v1 patches. 2018-12-23 16:01:35 -08:00
nsh
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2019-01-28 17:34:38 -08:00
packet af_packet: fix raw sockets over 6in4 tunnel 2019-01-17 15:54:45 -08:00
phonet net: Revert recent Spectre-v1 patches. 2018-12-23 16:01:35 -08:00
psample
qrtr
rds rds: rdma: update rdma transport for tos 2019-02-04 14:59:13 -08:00
rfkill rfkill: gpio: Remove unused include 2018-12-18 13:13:56 +01:00
rose net/rose: fix NULL ax25_cb kernel panic 2019-01-27 10:40:01 -08:00
rxrpc sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD 2019-02-03 11:17:30 -08:00
sched cls_flower: don't expose TC actions to drivers anymore 2019-02-06 10:38:26 -08:00
sctp sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt 2019-01-30 00:44:08 -08:00
smc net/smc: original socket family in inet_sock_diag 2019-02-07 18:06:19 -08:00
strparser
sunrpc SUNRPC: Address Kerberos performance/behavior regression 2019-01-15 15:36:41 -05:00
switchdev net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID 2019-02-06 14:17:03 -08:00
tipc tipc: remove dead code in struct tipc_topsrv 2019-01-24 22:31:23 -08:00
tls net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESS 2019-02-01 15:05:07 -08:00
unix Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
vmw_vsock socket: move compat timeout handling into sock.c 2019-02-03 11:17:30 -08:00
wimax
wireless netlink: reduce NLA_POLICY_NESTED{,_ARRAY} arguments 2019-02-01 11:06:55 +01:00
x25 net/x25: handle call collisions 2018-11-29 14:25:36 -08:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2019-01-28 19:38:33 -08:00
xfrm xfrm: Make set-mark default behavior backward compatible 2019-01-16 13:10:55 +01:00
compat.c socket: Use old_timeval types for socket timestamps 2019-02-03 11:17:30 -08:00
Kconfig net: convert bridge_nf to use skb extension infrastructure 2018-12-19 11:21:37 -08:00
Makefile
socket.c socket: Add SO_TIMESTAMPING_NEW 2019-02-03 11:17:31 -08:00
sysctl_net.c