linux/net/core
Guillaume Nault 4057765f2d sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values
SO_SNDBUF and SO_RCVBUF (and their *BUFFORCE version) may overflow or
underflow their input value. This patch aims at providing explicit
handling of these extreme cases, to get a clear behaviour even with
values bigger than INT_MAX / 2 or lower than INT_MIN / 2.

For simplicity, only SO_SNDBUF and SO_SNDBUFFORCE are described here,
but the same explanation and fix apply to SO_RCVBUF and SO_RCVBUFFORCE
(with 'SNDBUF' replaced by 'RCVBUF' and 'wmem_max' by 'rmem_max').

Overflow of positive values

===========================

When handling SO_SNDBUF or SO_SNDBUFFORCE, if 'val' exceeds
INT_MAX / 2, the buffer size is set to its minimum value because
'val * 2' overflows, and max_t() considers that it's smaller than
SOCK_MIN_SNDBUF. For SO_SNDBUF, this can only happen with
net.core.wmem_max > INT_MAX / 2.

SO_SNDBUF and SO_SNDBUFFORCE are actually designed to let users probe
for the maximum buffer size by setting an arbitrary large number that
gets capped to the maximum allowed/possible size. Having the upper
half of the positive integer space to potentially reduce the buffer
size to its minimum value defeats this purpose.

This patch caps the base value to INT_MAX / 2, so that bigger values
don't overflow and keep setting the buffer size to its maximum.

Underflow of negative values
============================

For negative numbers, SO_SNDBUF always considers them bigger than
net.core.wmem_max, which is bounded by [SOCK_MIN_SNDBUF, INT_MAX].
Therefore such values are set to net.core.wmem_max and we're back to
the behaviour of positive integers described above (return maximum
buffer size if wmem_max <= INT_MAX / 2, return SOCK_MIN_SNDBUF
otherwise).

However, SO_SNDBUFFORCE behaves differently. The user value is
directly multiplied by two and compared with SOCK_MIN_SNDBUF. If
'val * 2' doesn't underflow or if it underflows to a value smaller
than SOCK_MIN_SNDBUF then buffer size is set to its minimum value.
Otherwise the buffer size is set to the underflowed value.

This patch treats negative values passed to SO_SNDBUFFORCE as null, to
prevent underflows. Therefore negative values now always set the buffer
size to its minimum value.

Even though SO_SNDBUF behaves inconsistently by setting buffer size to
the maximum value when passed a negative number, no attempt is made to
modify this behaviour. There may exist some programs that rely on using
negative numbers to set the maximum buffer size. Avoiding overflows
because of extreme net.core.wmem_max values is the most we can do here.

Summary of altered behaviours
=============================

val      : user-space value passed to setsockopt()
val_uf   : the underflowed value resulting from doubling val when
           val < INT_MIN / 2
wmem_max : short for net.core.wmem_max
val_cap  : min(val, wmem_max)
min_len  : minimal buffer length (that is, SOCK_MIN_SNDBUF)
max_len  : maximal possible buffer length, regardless of wmem_max (that
           is, INT_MAX - 1)
^^^^     : altered behaviour

SO_SNDBUF:
+-------------------------+-------------+------------+----------------+
|       CONDITION         | OLD RESULT  | NEW RESULT |    COMMENT     |
+-------------------------+-------------+------------+----------------+
| val < 0 &&              |             |            | No overflow,   |
| wmem_max <= INT_MAX/2   | wmem_max*2  | wmem_max*2 | keep original  |
|                         |             |            | behaviour      |
+-------------------------+-------------+------------+----------------+
| val < 0 &&              |             |            | Cap wmem_max   |
| INT_MAX/2 < wmem_max    | min_len     | max_len    | to prevent     |
|                         |             | ^^^^^^^    | overflow       |
+-------------------------+-------------+------------+----------------+
| 0 <= val <= min_len/2   | min_len     | min_len    | Ordinary case  |
+-------------------------+-------------+------------+----------------+
| min_len/2 < val &&      | val_cap*2   | val_cap*2  | Ordinary case  |
| val_cap <= INT_MAX/2    |             |            |                |
+-------------------------+-------------+------------+----------------+
| min_len < val &&        |             |            | Cap val_cap    |
| INT_MAX/2 < val_cap     | min_len     | max_len    | again to       |
| (implies that           |             | ^^^^^^^    | prevent        |
| INT_MAX/2 < wmem_max)   |             |            | overflow       |
+-------------------------+-------------+------------+----------------+

SO_SNDBUFFORCE:
+------------------------------+---------+---------+------------------+
|          CONDITION           | BEFORE  | AFTER   |     COMMENT      |
|                              | PATCH   | PATCH   |                  |
+------------------------------+---------+---------+------------------+
| val < INT_MIN/2 &&           | min_len | min_len | Underflow with   |
| val_uf <= min_len            |         |         | no consequence   |
+------------------------------+---------+---------+------------------+
| val < INT_MIN/2 &&           | val_uf  | min_len | Set val to 0 to  |
| val_uf > min_len             |         | ^^^^^^^ | avoid underflow  |
+------------------------------+---------+---------+------------------+
| INT_MIN/2 <= val < 0         | min_len | min_len | No underflow     |
+------------------------------+---------+---------+------------------+
| 0 <= val <= min_len/2        | min_len | min_len | Ordinary case    |
+------------------------------+---------+---------+------------------+
| min_len/2 < val <= INT_MAX/2 | val*2   | val*2   | Ordinary case    |
+------------------------------+---------+---------+------------------+
| INT_MAX/2 < val              | min_len | max_len | Cap val to       |
|                              |         | ^^^^^^^ | prevent overflow |
+------------------------------+---------+---------+------------------+

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-16 18:09:54 -08:00
..
datagram.c for-4.21/block-20181221 2018-12-28 13:19:59 -08:00
dev_addr_lists.c net: dev: Issue NETDEV_PRE_CHANGEADDR 2018-12-13 18:41:38 -08:00
dev_ioctl.c net: dev: Add extack argument to dev_set_mac_address() 2018-12-13 18:41:38 -08:00
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2019-02-06 16:56:20 -08:00
devlink.c devlink: Fix list access without lock while reading region 2019-02-14 11:45:39 -05:00
drop_monitor.c
dst_cache.c
dst.c net: add a route cache full diagnostic message 2019-01-17 15:37:25 -08:00
ethtool.c ethtool: Remove unnecessary null check in ethtool_rx_flow_rule_create 2019-02-08 23:05:18 -08:00
failover.c net: Introduce generic failover module 2018-05-28 22:59:54 -04:00
fib_notifier.c net: Fix fib notifer to return errno 2018-03-29 14:10:30 -04:00
fib_rules.c net/fib_rules: Update fib_nl_dumprule for strict data checking 2018-10-08 10:39:05 -07:00
filter.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-08 15:00:17 -08:00
flow_dissector.c net/flow_dissector: move bpf case into __skb_flow_bpf_dissect 2019-01-29 01:08:29 +01:00
flow_offload.c flow_offload: add flow action infrastructure 2019-02-06 10:38:25 -08:00
gen_estimator.c net: core: protect rate estimator statistics pointer with lock 2018-08-11 12:37:10 -07:00
gen_stats.c net/core: make function ___gnet_stats_copy_basic() static 2018-09-28 10:25:11 -07:00
gro_cells.c gro_cell: add napi_disable in gro_cells_destroy 2018-12-19 15:50:02 -08:00
hwbm.c
link_watch.c net: linkwatch: add check for netdevice being present to linkwatch_do_dev 2018-09-19 21:06:46 -07:00
lwt_bpf.c bpf: in __bpf_redirect_no_mac pull mac only if present 2019-01-20 01:11:48 +01:00
lwtunnel.c
Makefile flow_offload: add flow_rule and flow_match structures and use them 2019-02-06 10:38:25 -08:00
neighbour.c neighbour: Do not perturb drop profiles when neigh_probe 2019-01-17 22:08:14 -08:00
net_namespace.c net: namespace: perform strict checks also for doit handlers 2019-01-19 10:09:58 -08:00
net-procfs.c proc: introduce proc_create_net{,_data} 2018-05-16 07:24:30 +02:00
net-sysfs.c net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID 2019-02-06 14:17:03 -08:00
net-sysfs.h
net-traces.c net/ipv6: Udate fib6_table_lookup tracepoint 2018-05-24 23:01:15 -04:00
netclassid_cgroup.c cgroup, netclassid: add a preemption point to write_classid 2018-10-23 12:58:17 -07:00
netevent.c
netpoll.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-12-27 13:04:52 -08:00
netprio_cgroup.c
page_pool.c page_pool: use DMA_ATTR_SKIP_CPU_SYNC for DMA mappings 2019-02-13 22:00:16 -08:00
pktgen.c pktgen: Fix fall-through annotation 2018-09-13 15:36:41 -07:00
ptp_classifier.c
request_sock.c
rtnetlink.c net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID 2019-02-06 14:17:03 -08:00
scm.c socket: Add SO_TIMESTAMPING_NEW 2019-02-03 11:17:31 -08:00
secure_seq.c infiniband: i40iw, nes: don't use wall time for TCP sequence numbers 2018-07-11 12:10:19 -06:00
skbuff.c net, skbuff: do not prefer skb allocation fails early 2019-01-04 12:53:16 -08:00
skmsg.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-08 15:00:17 -08:00
sock_diag.c net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() 2018-08-14 10:01:24 -07:00
sock_map.c bpf: skmsg, fix psock create on existing kcm/tls port 2018-10-20 00:40:45 +02:00
sock_reuseport.c sctp: add sock_reuseport for the sock in __sctp_hash_endpoint 2018-11-12 09:09:51 -08:00
sock.c sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values 2019-02-16 18:09:54 -08:00
stream.c tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT 2018-12-04 21:21:18 -08:00
sysctl_net_core.c net: introduce a knob to control whether to inherit devconf config 2019-01-22 11:07:21 -08:00
timestamping.c
tso.c
utils.c net: Remove some unneeded semicolon 2018-08-04 13:05:39 -07:00
xdp.c xdp: remove redundant variable 'headroom' 2018-09-01 01:35:53 +02:00