linux/net
Daniel Borkmann 6bb0fef489 netlink, mmap: fix edge-case leakages in nf queue zero-copy
When netlink mmap on receive side is the consumer of nf queue data,
it can happen that in some edge cases, we write skb shared info into
the user space mmap buffer:

Assume a possible rx ring frame size of only 4096, and the network skb,
which is being zero-copied into the netlink skb, contains page frags
with an overall skb->len larger than the linear part of the netlink
skb.

skb_zerocopy(), which is generic and thus not aware of the fact that
shared info cannot be accessed for such skbs then tries to write and
fill frags, thus leaking kernel data/pointers and in some corner cases
possibly writing out of bounds of the mmap area (when filling the
last slot in the ring buffer this way).

I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has
an advertised length larger than 4096, where the linear part is visible
at the slot beginning, and the leaked sizeof(struct skb_shared_info)
has been written to the beginning of the next slot (also corrupting
the struct nl_mmap_hdr slot header incl. status etc), since skb->end
points to skb->data + ring->frame_size - NL_MMAP_HDRLEN.

The fix adds and lets __netlink_alloc_skb() take the actual needed
linear room for the network skb + meta data into account. It's completely
irrelevant for non-mmaped netlink sockets, but in case mmap sockets
are used, it can be decided whether the available skb_tailroom() is
really large enough for the buffer, or whether it needs to internally
fallback to a normal alloc_skb().

>From nf queue side, the information whether the destination port is
an mmap RX ring is not really available without extra port-to-socket
lookup, thus it can only be determined in lower layers i.e. when
__netlink_alloc_skb() is called that checks internally for this. I
chose to add the extra ldiff parameter as mmap will then still work:
We have data_len and hlen in nfqnl_build_packet_message(), data_len
is the full length (capped at queue->copy_range) for skb_zerocopy()
and hlen some possible part of data_len that needs to be copied; the
rem_len variable indicates the needed remaining linear mmap space.

The only other workaround in nf queue internally would be after
allocation time by f.e. cap'ing the data_len to the skb_tailroom()
iff we deal with an mmap skb, but that would 1) expose the fact that
we use a mmap skb to upper layers, and 2) trim the skb where we
otherwise could just have moved the full skb into the normal receive
queue.

After the patch, in my test case the ring slot doesn't fit and therefore
shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and
thus needs to be picked up via recv().

Fixes: 3ab1f683bf ("nfnetlink: add support for memory mapped netlink")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-09 21:43:22 -07:00
..
6lowpan 6lowpan: move module_init into core functionality 2015-08-11 22:05:36 +02:00
9p 9p: ensure err is initialized to 0 in p9_client_read/write 2015-08-22 21:35:02 -04:00
802
8021q net: 8021q: convert to using IFF_NO_QUEUE 2015-08-18 11:55:06 -07:00
appletalk
atm br2684: Remove unnecessary formatting macros b1 and bs 2015-07-31 15:25:52 -07:00
ax25 NET: AX.25: Stop heartbeat timer on disconnect. 2015-07-15 15:59:58 -07:00
batman-adv batman-adv: turn batadv_neigh_node_get() into local function 2015-08-27 20:15:34 +02:00
bluetooth Bluetooth: Fix SCO link type handling on connection complete 2015-08-28 21:03:00 +02:00
bridge net: bridge: remove unnecessary switchdev include 2015-09-08 22:33:14 -07:00
caif net: caif: convert to using IFF_NO_QUEUE 2015-08-18 11:55:07 -07:00
can can: replace timestamp as unique skb attribute 2015-07-12 21:13:22 +02:00
ceph libceph: treat sockaddr_storage with uninitialized family as blank 2015-07-09 20:30:34 +03:00
core net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
dcb
dccp tcp: fix recv with flags MSG_WAITALL | MSG_PEEK 2015-07-27 01:06:53 -07:00
decnet net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
dns_resolver
dsa net: dsa: Allow DSA and CPU ports to have a phy-mode property 2015-08-31 14:48:02 -07:00
ethernet flow_dissector: Add flags argument to skb_flow_dissector functions 2015-09-01 15:06:22 -07:00
hsr net: hsr: convert to using IFF_NO_QUEUE 2015-08-18 11:55:07 -07:00
ieee802154 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2015-08-29 13:15:03 -07:00
ipv4 net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
ipv6 ipv6: fix ifnullfree.cocci warnings 2015-09-09 17:21:01 -07:00
ipx
irda
iucv
key net: Fix RCU splat in af_key 2015-08-24 14:48:10 -07:00
l2tp
lapb
llc tcp: fix recv with flags MSG_WAITALL | MSG_PEEK 2015-07-27 01:06:53 -07:00
mac80211 mac80211: reject software RSSI CQM with beacon filtering 2015-09-04 15:23:22 +02:00
mac802154 ieee802154: add ack request default handling 2015-08-10 20:43:06 +02:00
mpls mpls: fix mpls_net_init memory leak 2015-08-31 12:45:09 -07:00
netfilter netlink, mmap: fix edge-case leakages in nf queue zero-copy 2015-09-09 21:43:22 -07:00
netlabel
netlink netlink, mmap: fix edge-case leakages in nf queue zero-copy 2015-09-09 21:43:22 -07:00
netrom netfilter: Remove spurios included of netfilter.h 2015-06-18 21:14:32 +02:00
nfc nfc: netlink: Add capability to reply to vendor_cmd with data 2015-08-20 22:00:11 +02:00
openvswitch openvswitch: Remove conntrack Kconfig option. 2015-09-06 23:48:33 -07:00
packet packet: add extended BPF fanout mode 2015-08-17 14:22:48 -07:00
phonet
rds RDS: verify the underlying transport exists before creating a connection 2015-09-09 12:38:30 -07:00
rfkill rfkill: Copy "all" global state to other types 2015-09-04 14:26:56 +02:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-24 02:58:51 -07:00
rxrpc
sched flow_dissector: Add flags argument to skb_flow_dissector functions 2015-09-01 15:06:22 -07:00
sctp sctp: add routing output fallback 2015-09-03 15:43:05 -07:00
sunrpc NFS client bugfixes for Linux 4.2 2015-07-28 09:37:44 -07:00
switchdev switchdev: fix return value of switchdev_port_fdb_dump in case of error 2015-09-05 22:02:11 -07:00
tipc net: tipc: fix stall during bclink wakeup procedure 2015-09-08 22:50:26 -07:00
unix net/unix: support SCM_SECURITY for stream sockets 2015-06-10 22:49:20 -07:00
vmw_vsock
wimax net:wimax: Fix doucble word "the the" in networking.xml 2015-08-09 22:43:52 -07:00
wireless cfg80211: regulatory: restore proper user alpha2 2015-09-04 14:29:25 +02:00
x25
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-09-03 08:08:17 -07:00
compat.c
Kconfig lwtunnel: infrastructure for handling light weight tunnels like mpls 2015-07-21 10:39:03 -07:00
Makefile
socket.c
sysctl_net.c