linux/net
Florian Westphal 86c1a04564 tcp: use zero-window when free_space is low
Currently the kernel tries to announce a zero window when free_space
is below the current receiver mss estimate.

When a sender is transmitting small packets and reader consumes data
slowly (or not at all), receiver might be unable to shrink the receive
win because

a) we cannot withdraw already-commited receive window, and,
b) we have to round the current rwin up to a multiple of the wscale
   factor, else we would shrink the current window.

This causes the receive buffer to fill up until the rmem limit is hit.
When this happens, we start dropping packets.

Moreover, tcp_clamp_window may continue to grow sk_rcvbuf towards rmem[2]
even if socket is not being read from.

As we cannot avoid the "current_win is rounded up to multiple of mss"
issue [we would violate a) above] at least try to prevent the receive buf
growth towards tcp_rmem[2] limit by attempting to move to zero-window
announcement when free_space becomes less than 1/16 of the current
allowed receive buffer maximum.  If tcp_rmem[2] is large, this will
increase our chances to get a zero-window announcement out in time.

Reproducer:
On server:
$ nc -l -p 12345
<suspend it: CTRL-Z>

Client:
#!/usr/bin/env python
import socket
import time

sock = socket.socket()
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(("192.168.4.1", 12345));
while True:
   sock.send('A' * 23)
   time.sleep(0.005)

socket buffer on server-side will grow until tcp_rmem[2] is hit,
at which point the client rexmits data until -EDTIMEOUT:

tcp_data_queue invokes tcp_try_rmem_schedule which will call
tcp_prune_queue which calls tcp_clamp_window().  And that function will
grow sk->sk_rcvbuf up until it eventually hits tcp_rmem[2].

Thanks to Eric Dumazet for running regression tests.

Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19 16:48:06 -05:00
..
9p 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers 2014-02-10 17:48:54 -08:00
802 neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. 2014-01-16 11:31:58 -08:00
8021q net: introduce netdev_alloc_pcpu_stats() for drivers 2014-02-14 15:49:55 -05:00
appletalk appletalk: fix checkpatch error with indent 2014-02-14 16:18:32 -05:00
atm net: Fix some fallout from the etner_addr_copy() changes. 2014-01-21 18:57:26 -08:00
ax25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2014-02-18 16:29:46 -08:00
bridge net: introduce netdev_alloc_pcpu_stats() for drivers 2014-02-14 15:49:55 -05:00
caif net: Include appropriate header file in caif/cfsrvl.c 2014-02-09 17:32:49 -08:00
can linux-can-fixes-for-3.14-20140129 2014-01-30 16:48:17 -08:00
ceph net: remove unnecessary return's 2014-02-13 18:33:38 -05:00
core Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
dcb dcb: use __dev_get_by_name instead of dev_get_by_name to find interface 2014-01-14 18:50:46 -08:00
dccp dccp: re-enable debug macro 2014-02-16 23:45:00 -05:00
decnet net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c 2014-02-09 17:32:49 -08:00
dns_resolver net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
dsa dsa: Use ether_addr_copy 2014-01-21 18:13:05 -08:00
ethernet net: eth_type_trans() should use skb_header_pointer() 2014-01-16 15:30:31 -08:00
hsr hsr: Use ether_addr_copy 2014-02-18 18:14:09 -05:00
ieee802154 ieee802154: fix faulty check in set_phy_params api 2014-02-18 18:11:05 -05:00
ipv4 tcp: use zero-window when free_space is low 2014-02-19 16:48:06 -05:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
ipx ipx: implement shutdown() 2014-02-12 19:26:32 -05:00
irda net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
iucv net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
key xfrm: export verify_userspi_info for pkfey and netlink interface 2013-12-16 12:54:02 +01:00
l2tp net: remove unnecessary return's 2014-02-13 18:33:38 -05:00
lapb net/lapb: re-send packets on timeout 2013-09-23 16:52:45 -04:00
llc llc: remove noisy WARN from llc_mac_hdr_init 2014-01-28 18:01:32 -08:00
mac80211 netdevice: add queue selection fallback handler for ndo_select_queue 2014-02-17 00:36:34 -05:00
mac802154 ieee802154: add netlink APIs for smartMAC configuration 2014-02-17 16:42:39 -05:00
mpls ipip: add GSO/TSO support 2013-10-19 19:36:19 -04:00
netfilter net: Include appropriate header file in netfilter/nft_lookup.c 2014-02-09 17:32:50 -08:00
netlabel netlabel: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
netlink netlink: fix checkpatch errors space and "foo *bar" 2014-02-17 16:57:28 -05:00
netrom net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
openvswitch openvswitch: rename ->sync to ->syncp 2014-02-15 02:06:23 -05:00
packet af_packet: remove a stray tab in packet_set_ring() 2014-02-18 18:02:25 -05:00
phonet net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rds net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
rose net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rxrpc RxRPC fixes 2014-01-28 18:04:18 -08:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
sunrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-11 12:05:55 -08:00
tipc tipc: failed transmissions should return error 2014-02-19 16:40:57 -05:00
unix net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
vmw_vsock net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
wimax wimax: remove dead code 2013-11-21 13:09:42 -05:00
wireless net: remove unnecessary return's 2014-02-13 18:33:38 -05:00
x25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
compat.c x86, x32: Correct invalid use of user timespec in the kernel 2014-01-30 18:44:13 -08:00
Kconfig net: netprio: rename config to be more consistent with cgroup configs 2014-01-03 23:41:42 +01:00
Makefile net: move 6lowpan compression code to separate module 2014-01-15 15:36:38 -08:00
nonet.c
socket.c socket: replace some printk with pr_* 2014-02-13 18:15:10 -05:00
sysctl_net.c net: Update the sysctl permissions handler to test effective uid/gid 2013-10-07 15:57:56 -04:00