linux/net
Lawrence Brakmo 85cce21578 bpf: Add BPF_SOCKET_OPS_BASE_RTT support to tcp_nv
TCP_NV will try to get the base RTT from a socket_ops BPF program if one
is loaded. NV will then use the base RTT to bound its min RTT (its
notion of the base RTT). It uses the base RTT as an upper bound and 80%
of the base RTT as its lower bound.
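The bounding rule above can be sketched as a small clamp. This is a minimal illustration, not the tcp_nv code: nv_bound_min_rtt is a hypothetical helper, RTTs are assumed to be in microseconds, and a base RTT of 0 is assumed to mean that no socket_ops program supplied one. The 80% lower bound is taken as the integer floor of 4/5 of the base RTT.

```c
#include <stdint.h>

/* Hypothetical helper: bound NV's min RTT (its notion of the base RTT)
 * by a base RTT supplied by a socket_ops BPF program.
 * All values are in microseconds; base_rtt_us == 0 means "no value". */
uint32_t nv_bound_min_rtt(uint32_t min_rtt_us, uint32_t base_rtt_us)
{
	/* Lower bound: 80% of the base RTT (integer floor of 4/5). */
	uint32_t lower_us = (uint32_t)((uint64_t)base_rtt_us * 4 / 5);

	if (base_rtt_us == 0)
		return min_rtt_us;   /* no base RTT: leave min RTT unbounded */
	if (min_rtt_us > base_rtt_us)
		return base_rtt_us;  /* upper bound: stop min RTT inflation */
	if (min_rtt_us < lower_us)
		return lower_us;     /* lower bound: 80% of the base RTT */
	return min_rtt_us;           /* already within [80%, 100%] of base */
}
```

With a base RTT of 80us, an inflated min RTT of 300us is capped at 80us, a min RTT of 40us is raised to 64us, and 70us is left unchanged.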

In other words, NV will consider filtered RTTs larger than base RTT as a
sign of congestion. As a result, there is no minRTT inflation when there
is a lot of congestion. For example, in a DC where the RTTs are less
than 40us when there is no congestion, a base RTT value of 80us improves
the performance of NV. The difference between the uncongested RTT and
the base RTT provided represents how much queueing we are willing to
have (in practice it can be higher).
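To put the queueing budget in concrete terms, the gap between the base RTT and the uncongested RTT can be converted into bytes of queue at the bottleneck. This is a back-of-the-envelope sketch, not part of NV: nv_queue_budget_bytes is a hypothetical helper, the 10 Gbps link speed in the example is an assumption (the link speed is not stated here), and all extra delay is assumed to come from a single bottleneck queue.

```c
#include <stdint.h>

/* Hypothetical helper: bytes that can sit in the bottleneck queue before
 * the RTT grows from uncongested_rtt_us to base_rtt_us, assuming one
 * bottleneck link of link_mbps Mbit/s causes all the extra delay.
 * 1 us at 1 Mbit/s is exactly 1 bit, so bits = delta_us * link_mbps. */
uint64_t nv_queue_budget_bytes(uint32_t base_rtt_us,
			       uint32_t uncongested_rtt_us,
			       uint32_t link_mbps)
{
	if (base_rtt_us <= uncongested_rtt_us)
		return 0; /* base RTT leaves no headroom for queueing */
	return (uint64_t)(base_rtt_us - uncongested_rtt_us) * link_mbps / 8;
}
```

For the DC example, 80us - 40us = 40us at an assumed 10 Gbps is 400,000 bits, i.e. 50,000 bytes of queue. Because the uncongested RTT is below 40us, this is a lower estimate of the queueing NV would tolerate.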

NV has been tuned to reduce congestion when there are many flows, at the
cost of a single flow not achieving full bandwidth utilization. When a
reasonable base RTT is provided, one NV flow can now fully utilize the
available bandwidth. In addition, performance also improves when there
are many flows.

In the following examples, the NV results use a kernel with this patch
set (i.e. both NV results use the new nv_loss_dec_factor).

With one host sending to another host and only one flow the
goodputs are:
  Cubic: 9.3 Gbps, NV: 5.5 Gbps, NV (baseRTT=80us): 9.2 Gbps

With 2 hosts sending to one host (1 flow per host), the goodput per flow
is:
  Cubic: 4.6 Gbps, NV: 4.5 Gbps, NV (baseRTT=80us): 4.6 Gbps

But the RTTs seen by a ping process on the sender are:
  Cubic: 3.3ms  NV: 97us,  NV (baseRTT=80us): 146us

With a lot of flows things look even better for NV with baseRTT. Here we
have 3 hosts sending to one host. Each sending host has 6 flows: 1
stream, 4x1MB RPC, 1x10KB RPC. Cubic, NV and NV with baseRTT all fully
utilize the available bandwidth. However, the distribution of
bandwidth among the flows is very different. For the 10KB RPC flow:
  Cubic: 27Mbps, NV: 111Mbps, NV (baseRTT=80us): 222Mbps

The 99% latencies for the 10KB flows are:
  Cubic: 26ms,  NV: 1ms,  NV (baseRTT=80us): 500us

The RTT seen by a ping process at the senders:
  Cubic: 3.2ms  NV: 720us,  NV (baseRTT=80us): 330us

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:12:05 +01:00
6lowpan
9p net/9p: switch p9_fd_read to kernel_write 2017-09-04 19:05:16 -04:00
802
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
appletalk
atm net: atm: Convert timers to use timer_setup() 2017-10-18 12:40:27 +01:00
ax25
batman-adv This cleanup patchset includes the following patches: 2017-10-06 10:12:52 -07:00
bluetooth Bluetooth: Fix compiler warning with selftest duration calculation 2017-10-06 21:49:13 +03:00
bpf bpf: add meta pointer for direct access 2017-09-26 13:36:44 -07:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-09 20:11:09 -07:00
caif
can net: can: Convert timers to use timer_setup() 2017-10-18 12:39:38 +01:00
ceph libceph: don't allow bidirectional swap of pg-upmap-items 2017-09-19 20:34:29 +02:00
core bpf: Adding helper function bpf_getsockops 2017-10-22 03:12:05 +01:00
dcb
dccp inet/connection_sock: Convert timers to use timer_setup() 2017-10-18 12:39:55 +01:00
decnet decnet: af_decnet: mark expected switch fall-throughs 2017-10-18 14:10:29 +01:00
dns_resolver
dsa net: sched: avoid ndo_setup_tc calls for TC_SETUP_CLS* 2017-10-21 03:04:08 +01:00
ethernet
hsr net/hsr: Check skb_put_padto() return value 2017-08-22 13:40:23 -07:00
ieee802154 Merge remote-tracking branch 'net-next/master' 2017-10-18 17:40:18 +02:00
ife
ipv4 bpf: Add BPF_SOCKET_OPS_BASE_RTT support to tcp_nv 2017-10-22 03:12:05 +01:00
ipv6 ipv6: let trace_fib6_table_lookup() dereference the fib table 2017-10-21 02:23:38 +01:00
ipx net: ipx: mark expected switch fall-through 2017-10-18 14:13:08 +01:00
iucv
kcm kcm: Remove redundant unlikely() 2017-09-26 09:54:06 -07:00
key
l2tp net: l2tp: mark expected switch fall-through 2017-10-19 13:33:23 +01:00
l3mdev
lapb net/lapb: Convert timers to use timer_setup() 2017-10-18 12:39:36 +01:00
llc
mac80211 mac80211: don't track HT capability changes 2017-10-13 14:29:02 +02:00
mac802154 mac802154: Fix MAC header and payload encrypted 2017-09-20 13:37:16 +02:00
mpls ip_tunnel: fix building with NET_IP_TUNNEL=m 2017-10-12 12:21:11 -07:00
ncsi net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references 2017-09-05 09:11:45 -07:00
netfilter bpf: Add file mode configuration into bpf maps 2017-10-20 13:32:59 +01:00
netlabel
netlink netlink: use NETLINK_CB(in_skb).sk instead of looking it up 2017-10-18 12:20:13 +01:00
netrom net: netrom: nr_in: mark expected switch fall-through 2017-10-22 02:00:33 +01:00
nfc net: nfc: llcp_core: use setup_timer() helper. 2017-09-25 13:19:20 -07:00
nsh nsh: add GSO support 2017-08-29 15:16:52 -07:00
openvswitch openvswitch: conntrack: mark expected switch fall-through 2017-10-22 02:01:26 +01:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
phonet net: phonet: mark phonet_protocol as const 2017-10-07 23:15:08 +01:00
psample
qrtr net: qrtr: Support decoding incoming v2 packets 2017-10-11 15:28:39 -07:00
rds RDS: IB: Initialize max_items based on underlying device attributes 2017-10-05 21:16:33 -07:00
rfkill
rose net: rose: mark expected switch fall-throughs 2017-10-22 02:02:26 +01:00
rxrpc rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals 2017-10-18 11:43:07 +01:00
sched net: sched: mark expected switch fall-throughs 2017-10-22 02:07:08 +01:00
sctp sctp: make array sctp_sched_ops static 2017-10-11 20:18:25 -07:00
smc net/smc: dev_put for netdev after usage of ib_query_gid() 2017-10-12 12:20:27 -07:00
strparser strparser: initialize all callbacks 2017-08-24 21:57:50 -07:00
sunrpc sunrpc: Convert timers to use timer_setup() 2017-10-18 12:40:27 +01:00
switchdev
tipc tipc: refactor tipc_sk_timeout() function 2017-10-22 02:36:35 +01:00
tls tls: make tls_sw_free_resources static 2017-09-14 09:55:21 -07:00
unix net: af_unix: mark expected switch fall-through 2017-10-22 03:07:50 +01:00
vmw_vsock VSOCK: add sock_diag interface 2017-10-05 18:44:17 -07:00
wimax
wireless Three fixes for the recently added new code: 2017-10-14 18:36:46 -07:00
x25 net: x25: mark expected switch fall-throughs 2017-10-22 03:08:46 +01:00
xfrm xfrm: Convert timers to use timer_setup() 2017-10-18 12:39:37 +01:00
compat.c net: compat: assert the size of cmsg copied in is as expected 2017-09-20 15:36:18 -07:00
Kconfig net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. 2017-09-04 13:25:20 +02:00
Makefile nsh: add GSO support 2017-08-29 15:16:52 -07:00
socket.c
sysctl_net.c