linux/net/ipv4
Yaogong Wang 9f5afeae51 tcp: use an RB tree for ooo receive queue
Over the years, TCP BDP (bandwidth-delay product) has increased by several
orders of magnitude, and some people are considering reaching the 2 Gbytes limit.

Even with the current window scale limit of 14, a ~1 Gbyte window maps to
~740,000 MSS-sized segments.
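The ~740,000 figure follows from simple arithmetic. This minimal sketch assumes a 1448-byte MSS (a 1500-byte Ethernet MTU minus 20 bytes of IPv4 header, 20 bytes of TCP header and 12 bytes of timestamp option). The function names are hypothetical. With the maximum window scale of 14 (RFC 7323), the 16-bit window field can advertise 65535 << 14 = 1,073,725,440 bytes, or 741,523 full segments.

```c
#include <assert.h>
#include <stdint.h>

/* Largest advertisable receive window: the 16-bit window field shifted
 * left by the window scale factor (at most 14). */
static uint64_t max_window_bytes(unsigned int wscale)
{
    assert(wscale <= 14);
    return (uint64_t)65535 << wscale;
}

/* How many full-sized segments fit in that window (assumed MSS). */
static uint64_t window_in_segments(unsigned int wscale, unsigned int mss)
{
    assert(mss > 0);
    return max_window_bytes(wscale) / mss;
}
```

Any of those segments may have to sit in the out-of-order queue while a hole is repaired.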

In the presence of packet losses (or reordering), TCP stores incoming packets
in an out-of-order queue, and the number of skbs sitting there waiting for
the missing packets to be received can be in the 10^5 range.

Most packets are appended to the tail of this queue, and when
packets can finally be transferred to the receive queue, we scan the queue
from its head.
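The transfer from the head can be pictured as follows. This is a simplified sketch, not the kernel's code: `struct seg`, `seq_before()` and `drain_ofo()` are hypothetical names, the queue is a plain sorted list, and the popped segments are only unlinked rather than moved to a receive queue. Sequence numbers are compared modulo 2^32, in the same way as the kernel's `before()` helper, so the queue keeps working across wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical out-of-order segment covering [seq, end_seq). */
struct seg {
    uint32_t seq, end_seq;
    struct seg *next;               /* list kept sorted by seq */
};

/* Wraparound-safe "a comes before b", same idea as the kernel's before(). */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Pop segments off the head while they start at or before rcv_nxt,
 * i.e. while no hole separates them from the in-order data. Each popped
 * segment would go to the receive queue; here it is only unlinked.
 * Returns the advanced rcv_nxt. */
static uint32_t drain_ofo(struct seg **head, uint32_t rcv_nxt)
{
    assert(head != NULL);
    while (*head && !seq_before(rcv_nxt, (*head)->seq)) {
        struct seg *s = *head;

        if (seq_before(rcv_nxt, s->end_seq))
            rcv_nxt = s->end_seq;   /* segment carries new bytes */
        *head = s->next;            /* fully duplicate segments just drop */
    }
    return rcv_nxt;
}
```

Draining stops at the first segment that still sits beyond a hole, so this step only touches segments that actually leave the queue.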

However, in the presence of heavy losses, we might have to find an arbitrary
point in this queue, which involves a linear scan for every incoming packet
and thrashes CPU caches.
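The cost of finding an arbitrary point in a list can be made concrete with a small sketch. It is not the pre-patch kernel code, which has its own list handling. It only shows that, in a sorted linked list, a segment landing mid-queue costs a number of steps proportional to its distance from the end where the walk starts. `struct seg` and `list_insert()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical out-of-order segment in a seq-sorted singly linked list. */
struct seg {
    uint32_t seq, end_seq;
    struct seg *next;
};

/* Wraparound-safe "a comes before b", same idea as the kernel's before(). */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Insert s in seq order by walking from the head of the list.
 * Returns the number of nodes visited: the per-packet cost is O(n),
 * and each visited node is a likely cache miss. */
static size_t list_insert(struct seg **head, struct seg *s)
{
    struct seg **link = head;
    size_t visited = 0;

    assert(s != NULL);
    while (*link && seq_before((*link)->seq, s->seq)) {
        link = &(*link)->next;
        visited++;
    }
    s->next = *link;
    *link = s;
    return visited;
}
```

With 1,000 queued segments, a packet that fills a hole in the middle visits about 500 nodes. At the 10^5 queue depths mentioned above, that becomes tens of thousands of pointer chases per packet.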

This patch converts it to an RB tree, to get bounded latencies.
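The sketch below shows why an RB tree bounds the latency. It is a standalone, textbook (CLRS-style) red-black insert keyed by sequence number, with hypothetical names throughout. The kernel instead uses its generic rbtree library (`rb_link_node()` and `rb_insert_color()`), and it also deals with overlapping and coalescing segments, which this sketch ignores. The descent walks at most the tree height, and recoloring/rotation keeps that height within 2*log2(n+1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: one out-of-order segment keyed by its start seq. */
struct ofo_node {
    uint32_t seq;
    int red;                               /* 1 = red, 0 = black */
    struct ofo_node *child[2];             /* [0] = left, [1] = right */
    struct ofo_node *parent;
};

/* Wraparound-safe "a comes before b", same idea as the kernel's before(). */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Rotate x down toward side dir (0 = left rotation, 1 = right rotation);
 * x's child on the opposite side takes x's place. */
static void rotate(struct ofo_node **root, struct ofo_node *x, int dir)
{
    struct ofo_node *y = x->child[!dir];

    assert(y != NULL);
    x->child[!dir] = y->child[dir];
    if (y->child[dir])
        y->child[dir]->parent = x;
    y->parent = x->parent;
    if (!x->parent)
        *root = y;
    else
        x->parent->child[x == x->parent->child[1]] = y;
    y->child[dir] = x;
    x->parent = y;
}

/* Red-black insert: O(height) ordered descent, then recolor/rotate. */
static void ofo_insert(struct ofo_node **root, struct ofo_node *z)
{
    struct ofo_node *parent = NULL, **link = root;

    while (*link) {
        parent = *link;
        link = &parent->child[!seq_before(z->seq, parent->seq)];
    }
    z->parent = parent;
    z->child[0] = z->child[1] = NULL;
    z->red = 1;
    *link = z;

    while (z->parent && z->parent->red) {
        struct ofo_node *p = z->parent, *g = p->parent; /* red p is never root */
        int pd = (p == g->child[1]);                    /* side p hangs on */
        struct ofo_node *u = g->child[!pd];             /* uncle */

        if (u && u->red) {               /* red uncle: recolor, move up */
            p->red = u->red = 0;
            g->red = 1;
            z = g;
            continue;
        }
        if (z == p->child[!pd]) {        /* zig-zag: rotate p first */
            rotate(root, p, pd);
            z = p;
            p = z->parent;
        }
        p->red = 0;                      /* zig-zig: rotate g */
        g->red = 1;
        rotate(root, g, !pd);
    }
    (*root)->red = 0;
}

/* Test helper: black height of the subtree, or -1 if two reds are adjacent,
 * black heights differ, or a child is on the wrong side of its parent. */
static int rb_check(const struct ofo_node *n)
{
    if (!n)
        return 1;
    for (int d = 0; d < 2; d++) {
        const struct ofo_node *c = n->child[d];
        if (c && ((n->red && c->red) || seq_before(c->seq, n->seq) != !d))
            return -1;
    }
    int lh = rb_check(n->child[0]), rh = rb_check(n->child[1]);
    if (lh < 0 || rh < 0 || lh != rh)
        return -1;
    return lh + !n->red;
}

static int rb_height(const struct ofo_node *n)
{
    if (!n)
        return 0;
    int l = rb_height(n->child[0]), r = rb_height(n->child[1]);
    return 1 + (l > r ? l : r);
}
```

After 65,536 inserts in increasing (and wrapping) sequence order, the tree is at most 32 levels deep. The same inserts into an unbalanced binary tree would produce a 65,536-level chain.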

Yaogong wrote a preliminary patch about 2 years ago.
Eric did the rebase and added the ofo_last_skb cache, polishing and tests.
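Because most packets are appended at the tail, caching the last (highest-sequence) segment lets the common case skip the tree descent entirely. The following is a minimal sketch of that idea under stated assumptions. `struct ofo_queue`, its `last` field and `ofo_add()` are hypothetical names, not the kernel's. Rebalancing is omitted to keep the example short, so this is a plain binary search tree. In a real RB tree, the new node would be linked as the cached node's right child and then recolored. Overlap handling is also left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment node: [seq, end_seq), plain BST links. */
struct seg_node {
    uint32_t seq, end_seq;
    struct seg_node *left, *right;
};

/* Hypothetical queue with a cached pointer to its rightmost node. */
struct ofo_queue {
    struct seg_node *root;
    struct seg_node *last;          /* highest-seq segment, or NULL */
    unsigned long tree_walks;       /* inserts that needed a descent */
};

static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

static void ofo_add(struct ofo_queue *q, struct seg_node *s)
{
    struct seg_node **link = &q->root;

    assert(seq_before(s->seq, s->end_seq));   /* non-empty segment */
    s->left = s->right = NULL;
    /* Fast path: s starts at or after the end of the highest segment, so
     * it becomes the new rightmost node. The rightmost node never has a
     * right child, so s can be linked there without any descent. */
    if (q->last && !seq_before(s->seq, q->last->end_seq)) {
        q->last->right = s;
        q->last = s;
        return;
    }
    /* Slow path: ordinary ordered descent from the root. */
    q->tree_walks++;
    while (*link)
        link = seq_before(s->seq, (*link)->seq) ? &(*link)->left
                                                : &(*link)->right;
    *link = s;
    if (!q->last || !seq_before(s->seq, q->last->seq))
        q->last = s;                /* s went right of every node */
}
```

Appending contiguous segments takes the O(1) fast path. Only a segment that fills an earlier hole pays for a tree walk.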

Tested with the network dropping between 1% and 10% of packets, with good
results (about a 30% throughput increase in stress tests).

The next step would be to also use an RB tree for the write queue on the
sender side ;)

Signed-off-by: Yaogong Wang <wygivan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-By: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:25:58 -07:00
netfilter netfilter: log: Check param to avoid overflow in nf_log_set 2016-08-30 11:52:32 +02:00
af_inet.c tcp: Set read_sock and peek_len proto_ops 2016-08-28 23:32:41 -04:00
ah4.c
arp.c net: rename NET_{ADD|INC}_STATS_BH() 2016-04-27 22:48:24 -04:00
cipso_ipv4.c Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/selinux into next 2016-07-07 10:15:34 +10:00
datagram.c
devinet.c ipv4: do not abuse GFP_ATOMIC in inet_netconf_notify_devconf() 2016-07-09 18:12:25 -04:00
esp4.c esp: Fix ESN generation under UDP encapsulation 2016-06-23 11:52:00 -04:00
fib_frontend.c net: Remove fib_local variable 2016-08-09 14:57:39 -07:00
fib_lookup.h
fib_rules.c net: Add l3mdev rule 2016-06-08 11:36:02 -07:00
fib_semantics.c net: ipv4: fix sparse error in fib_good_nh() 2016-08-19 17:07:30 -07:00
fib_trie.c fib_trie: Fix the description of pos and bits 2016-08-18 23:51:23 -07:00
fou.c fou: make nla_policy const 2016-09-01 14:09:00 -07:00
gre_demux.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
gre_offload.c gso: Remove arbitrary checks for unsupported GSO 2016-05-20 18:03:15 -04:00
icmp.c net: icmp: rename ICMPMSGIN_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
igmp.c net/multicast: should not send source list records when have filter mode change 2016-08-08 16:04:39 -07:00
inet_connection_sock.c timers, net/ipv4/inet: Initialize connection request timers as pinned 2016-07-07 10:35:06 +02:00
inet_diag.c net: inet: diag: expose the socket mark to privileged processes. 2016-09-08 16:13:09 -07:00
inet_fragment.c net: disable fragment reassembly if high_thresh is zero 2016-06-05 22:56:42 -04:00
inet_hashtables.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-04 00:52:29 -04:00
inet_timewait_sock.c timers, net/ipv4/inet: Initialize connection request timers as pinned 2016-07-07 10:35:06 +02:00
inetpeer.c
ip_forward.c net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags 2016-07-19 16:40:22 -07:00
ip_fragment.c net: rename IP_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
ip_gre.c gre: set inner_protocol on xmit 2016-08-15 13:37:12 -07:00
ip_input.c net: original ingress device index in PKTINFO 2016-05-11 19:31:40 -04:00
ip_options.c
ip_output.c net: lwtunnel: Handle fragmentation 2016-08-30 22:27:18 -07:00
ip_sockglue.c sock: propagate __sock_cmsg_send() error 2016-05-16 13:46:23 -04:00
ip_tunnel_core.c net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset 2016-08-22 17:11:01 -07:00
ip_tunnel.c net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloads 2016-06-15 21:39:59 -07:00
ip_vti.c vti: flush x-netns xfrm cache when vti interface is removed 2016-08-09 12:57:49 -07:00
ipcomp.c
ipconfig.c net: ipconfig: Fix NULL pointer dereference on RARP/BOOTP/DHCP timeout 2016-08-22 21:04:41 -07:00
ipip.c ipip: support MPLS over IPv4 2016-07-09 17:45:56 -04:00
ipmr.c net: ipmr/ip6mr: update lastuse on entry change 2016-07-26 15:18:31 -07:00
Kconfig tcp: add NV congestion control 2016-06-10 23:07:49 -07:00
Makefile tcp: add NV congestion control 2016-06-10 23:07:49 -07:00
netfilter.c
ping.c
proc.c tcp: md5: add LINUX_MIB_TCPMD5FAILURE counter 2016-08-25 16:43:11 -07:00
protocol.c
raw.c
route.c net: lwtunnel: Handle fragmentation 2016-08-30 22:27:18 -07:00
syncookies.c net: rename NET_{ADD|INC}_STATS_BH() 2016-04-27 22:48:24 -04:00
sysctl_net_ipv4.c ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n 2016-05-23 14:32:06 -07:00
tcp_bic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_cdg.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_cong.c
tcp_cubic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_dctcp.c tcp: return sizeof tcp_dctcp_info in dctcp_get_info() 2016-06-14 23:46:30 -07:00
tcp_diag.c net: diag: Fix refcnt leak in error path destroying socket 2016-08-23 23:11:36 -07:00
tcp_fastopen.c tcp: do not assume TCP code is non preemptible 2016-05-02 17:02:25 -04:00
tcp_highspeed.c
tcp_htcp.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_hybla.c
tcp_illinois.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_input.c tcp: use an RB tree for ooo receive queue 2016-09-08 17:25:58 -07:00
tcp_ipv4.c tcp: use an RB tree for ooo receive queue 2016-09-08 17:25:58 -07:00
tcp_lp.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_metrics.c tcp: make nla_policy const 2016-09-01 14:09:01 -07:00
tcp_minisocks.c tcp: use an RB tree for ooo receive queue 2016-09-08 17:25:58 -07:00
tcp_nv.c tcp: add NV congestion control 2016-06-10 23:07:49 -07:00
tcp_offload.c gso: Remove arbitrary checks for unsupported GSO 2016-05-20 18:03:15 -04:00
tcp_output.c tcp: defer sacked assignment 2016-08-18 23:27:27 -07:00
tcp_probe.c
tcp_recovery.c tcp: do not assume TCP code is non preemptible 2016-05-02 17:02:25 -04:00
tcp_scalable.c
tcp_timer.c tcp_timer.c: Add kernel-doc function descriptions 2016-07-15 23:18:14 -07:00
tcp_vegas.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_vegas.h tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_veno.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_westwood.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_yeah.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp.c tcp: use an RB tree for ooo receive queue 2016-09-08 17:25:58 -07:00
tunnel4.c tunnels: correct conditional build of MPLS and IPv6 2016-07-11 13:27:06 -07:00
udp_diag.c net: inet: diag: expose the socket mark to privileged processes. 2016-09-08 16:13:09 -07:00
udp_impl.h
udp_offload.c gso: Remove arbitrary checks for unsupported GSO 2016-05-20 18:03:15 -04:00
udp_tunnel.c net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-08-30 00:54:02 -04:00
udplite.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-08-30 00:54:02 -04:00
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c
xfrm4_policy.c net: xfrm: fix old-style declaration 2016-06-16 22:06:30 -07:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c