linux/net
Yaogong Wang 9f5afeae51 tcp: use an RB tree for ooo receive queue
Over the years, TCP BDP has increased by several orders of magnitude,
and some people are considering to reach the 2 Gbytes limit.

Even with current window scale limit of 14, ~1 Gbytes maps to ~740,000
MSS.

In presence of packet losses (or reorders), TCP stores incoming packets
into an out of order queue, and number of skbs sitting there waiting for
the missing packets to be received can be in the 10^5 range.

Most packets are appended to the tail of this queue, and when
packets can finally be transferred to receive queue, we scan the queue
from its head.

However, in presence of heavy losses, we might have to find an arbitrary
point in this queue, involving a linear scan for every incoming packet,
throwing away cpu caches.

This patch converts it to a RB tree, to get bounded latencies.

Yaogong wrote a preliminary patch about 2 years ago.
Eric did the rebase, added ofo_last_skb cache, polishing and tests.

Tested with network dropping between 1 and 10 % packets, with good
success (about 30 % increase of throughput in stress tests)

Next step would be to also use an RB tree for the write queue at sender
side ;)

Signed-off-by: Yaogong Wang <wygivan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-By: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:25:58 -07:00
..
6lowpan 6lowpan: ndisc: set invalid unicast short addr to unspec 2016-07-08 13:23:12 +02:00
9p 9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc() 2016-08-09 13:42:36 +03:00
802
8021q net: remove type_check from dev_get_nest_level() 2016-08-13 15:15:54 -07:00
appletalk
atm net: atm: remove redundant null pointer check on dev->name 2016-08-18 21:03:48 -07:00
ax25 AX.25: Close socket connection on session completion 2016-06-18 20:55:34 -07:00
batman-adv batman: make netlink attributes const 2016-09-01 14:09:00 -07:00
bluetooth Bluetooth: Fix hci_sock_recvmsg when MSG_TRUNC is not set 2016-08-25 20:58:47 +02:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2016-09-06 12:45:26 -07:00
caif caif: Remove unneeded header file 2016-06-28 05:26:14 -04:00
can can: only call can_stat_update with procfs 2016-06-23 11:23:49 +02:00
ceph libceph: using kfree_rcu() to simplify the code 2016-08-08 21:41:42 +02:00
core tcp: use an RB tree for ooo receive queue 2016-09-08 17:25:58 -07:00
dcb
dccp Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-07-29 17:38:46 -07:00
decnet net: fix decnet rtnexthop parsing 2016-07-05 14:08:47 -07:00
dns_resolver
dsa net: dsa: add MDB support 2016-08-31 14:15:42 -07:00
ethernet
hsr net/hsr: Use setup_timer and mod_timer. 2016-05-16 14:00:43 -04:00
ieee802154 ieee802154: 6lowpan: fix intra pan id check 2016-07-08 13:23:12 +02:00
ipv4 tcp: use an RB tree for ooo receive queue 2016-09-08 17:25:58 -07:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2016-09-06 12:45:26 -07:00
ipx
irda net/irda: remove pointless assignment/check 2016-08-19 18:07:24 -07:00
iucv Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-07-29 17:38:46 -07:00
kcm kcm: Remove TCP specific references from kcm and strparser 2016-08-28 23:32:41 -04:00
key
l2tp l2tp: make nla_policy const 2016-09-01 14:09:01 -07:00
l3mdev net: vrf: Implement get_saddr for IPv6 2016-06-17 21:25:29 -07:00
lapb net/lapb: tuse %*ph to dump buffers 2016-05-29 22:33:25 -07:00
llc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
mac80211 mac80211: call get_expected_throughput only after adding station 2016-08-11 20:00:37 +02:00
mac802154
mpls mpls: get rid of trivial returns 2016-09-01 10:13:15 -07:00
ncsi net/ncsi: avoid maybe-uninitialized warning 2016-07-25 10:32:59 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2016-09-06 12:45:26 -07:00
netlabel netlabel: Implement CALIPSO config functions for SMACK. 2016-06-27 15:06:18 -04:00
netlink netlink: don't forget to release a rhashtable_iter structure 2016-09-07 17:29:38 -07:00
netrom
nfc NFC: digital: Fix RTOX supervisor PDU handling 2016-07-11 02:02:03 +02:00
openvswitch openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes 2016-09-08 17:10:28 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-07-24 00:53:32 -04:00
phonet
qrtr Merge tag 'qcom-soc-for-4.7-2' into net-next 2016-05-17 14:11:19 -04:00
rds RDS: add __printf format attribute to error reporting functions 2016-08-08 16:16:21 -07:00
rfkill
rose rose: limit sk_filter trim to payload 2016-07-13 11:53:40 -07:00
rxrpc rxrpc: Add tracepoint for working out where aborts happen 2016-09-07 16:34:40 +01:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-08-30 00:54:02 -04:00
sctp net: inet: diag: expose the socket mark to privileged processes. 2016-09-08 16:13:09 -07:00
strparser kcm: Remove TCP specific references from kcm and strparser 2016-08-28 23:32:41 -04:00
sunrpc NFS client bugfixes for Linux 4.8 2016-08-12 12:32:24 -07:00
switchdev rtnetlink: fdb dump: optimize by saving last interface markers 2016-09-01 16:56:15 -07:00
tipc tipc: send broadcast nack directly upon sequence gap detection 2016-09-02 17:10:25 -07:00
unix af_unix: charge buffers to kmemcg 2016-07-26 16:19:19 -07:00
vmw_vsock vhost/vsock: drop space available check for TX vq 2016-08-15 05:05:21 +03:00
wimax
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-08-18 01:17:32 -04:00
x25 net: fix a kernel infoleak in x25 module 2016-05-09 22:45:33 -04:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2016-09-08 13:09:41 -07:00
compat.c packet: compat support for sock_fprog 2016-06-09 23:41:03 -07:00
Kconfig strparser: Stream parser for messages 2016-08-17 19:36:23 -04:00
Makefile strparser: Stream parser for messages 2016-08-17 19:36:23 -04:00
socket.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
sysctl_net.c net: make net namespace sysctls belong to container's owner 2016-08-14 21:08:58 -07:00