linux/include/net
Eric Dumazet d4589926d7 tcp: refine TSO splits
While investigating performance problems on small RPC workloads,
I noticed linux TCP stack was always splitting the last TSO skb
into two parts (skbs). One being a multiple of MSS, and a small one
with the Push flag. This split is done even if TCP_NODELAY is set,
or if no small packet is in flight.

Example with request/response of 4K/4K

IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>

We can avoid this split by including the Nagle tests at the right place.

Note : If some NIC had trouble sending TSO packets with a partial
last segment, we would have hit the problem in GRO/forwarding workload already.

tcp_minshall_update() is moved to tcp_output.c and is updated as we might
feed a TSO packet with a partial last segment.

This patch tremendously improves performance, as the traffic now looks
like :

IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>

Before :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280774

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     205719.049006 task-clock                #    9.278 CPUs utilized
         8,449,968 context-switches          #    0.041 M/sec
         1,935,997 CPU-migrations            #    0.009 M/sec
           160,541 page-faults               #    0.780 K/sec
   548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
   455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
   272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
   166,091,460,030 instructions              #    0.30  insns per cycle
                                             #    2.74  stalled cycles per insn [83.39%]
    29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
     1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]

      22.173517844 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   16851063           0.0
IpExtOutOctets                  23878580777        0.0

After patch :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280877

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     107496.071918 task-clock                #    4.847 CPUs utilized
         5,635,458 context-switches          #    0.052 M/sec
         1,374,707 CPU-migrations            #    0.013 M/sec
           160,920 page-faults               #    0.001 M/sec
   281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
   228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
   142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
    95,227,712,566 instructions              #    0.34  insns per cycle
                                             #    2.40  stalled cycles per insn [83.43%]
    16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
       874,252,952 branch-misses             #    5.39% of all branches         [83.37%]

      22.175821286 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   11239428           0.0
IpExtOutOctets                  23595191035        0.0

Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet
scheduler.

Many thanks to Neal for review and ideas.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Van Jacobson <vanj@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:15:25 -05:00
..
9p for-linus-3.12-merge minor 9p fixes and tweaks for 3.12 merge window 2013-09-11 12:34:13 -07:00
bluetooth Bluetooth: Use macros for connectionless slave broadcast features 2013-12-09 00:02:41 +04:00
caif caif_hsi.h: Remove extern from function prototypes 2013-09-23 16:29:41 -04:00
irda include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
iucv af_iucv: fix recvmsg by replacing skb_pull() function 2013-04-08 17:16:57 -04:00
netfilter netfilter: push reasm skb through instead of original frag skbs 2013-11-11 00:19:35 -05:00
netns tcp_memcontrol: Remove the per netns control. 2013-10-21 18:43:02 -04:00
nfc include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
phonet net: remove my future former mail address 2012-06-17 16:29:38 -07:00
sctp sctp: Reorder 'struc association' members to reduce its size 2013-12-17 14:32:43 -05:00
tc_act include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
act_api.h net: Remove extern from include/net/ scheduling prototypes 2013-07-31 17:24:22 -07:00
addrconf.h neigh: ipv6: respect default values set before an address is assigned to device 2013-12-09 20:56:12 -05:00
af_ieee802154.h
af_rxrpc.h af_rxrpc.h: Remove extern from function prototypes 2013-07-31 17:50:01 -07:00
af_unix.h af_unix: improve STREAM behavior with fragmented memory 2013-08-10 01:16:44 -07:00
af_vsock.h VSOCK: Move af_vsock.h and vsock_addr.h to include/net 2013-07-27 22:14:06 -07:00
ah.h
arp.h arp/neighbour.h: Remove extern from function prototypes 2013-07-31 17:50:02 -07:00
atmclip.h
ax25.h ax25.h: Remove extern from function prototypes 2013-07-31 17:50:02 -07:00
ax88796.h
busy_poll.h net: add cpu_relax to busy poll loop 2013-08-28 17:45:48 -04:00
cfg80211-wext.h
cfg80211.h cfg80211/mac80211/ath6kl: acquire wdev lock outside ch_switch_notify 2013-12-02 11:51:54 +01:00
checksum.h net: checksum: fix warning in skb_checksum 2013-11-04 15:27:08 -05:00
cipso_ipv4.h cipso: cleanup cipso_v4_translate() when !CONFIG_NETLABEL 2013-12-10 17:56:54 -05:00
cls_cgroup.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-09-05 14:54:29 -07:00
codel.h net: codel: Avoid undefined behavior from signed overflow 2013-11-04 20:01:29 -05:00
compat.h compat.h: Remove extern from function prototypes 2013-09-20 14:49:32 -04:00
datalink.h
dcbevent.h include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
dcbnl.h include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
dn_dev.h dn_dev: add support for IFA_FLAGS nl attribute 2013-12-10 21:50:00 -05:00
dn_fib.h decnet (dn*.h): Remove extern from function prototypes 2013-09-20 14:49:32 -04:00
dn_neigh.h decnet (dn*.h): Remove extern from function prototypes 2013-09-20 14:49:32 -04:00
dn_nsp.h decnet (dn*.h): Remove extern from function prototypes 2013-09-20 14:49:32 -04:00
dn_route.h decnet (dn*.h): Remove extern from function prototypes 2013-09-20 14:49:32 -04:00
dn.h decnet (dn*.h): Remove extern from function prototypes 2013-09-20 14:49:32 -04:00
dsa.h
dsfield.h ipv6: Optimize ipv6_change_dsfield(). 2013-01-09 23:59:53 -08:00
dst_ops.h net: Fix warnings in dst_ops.h 2012-07-19 10:43:03 -07:00
dst.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-10-23 16:49:34 -04:00
esp.h net: move pskb_put() to core code 2013-11-07 19:28:58 -05:00
ethoc.h
fib_rules.h fib_rules.h: Remove extern from function prototypes 2013-09-20 14:49:33 -04:00
firewire.h firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection. 2013-03-26 12:32:13 -04:00
flow_keys.h flow_dissector: factor out the ports extraction in skb_flow_get_ports 2013-10-03 15:36:37 -04:00
flow.h flow.h/flow_keys.h: Remove extern from function prototypes 2013-09-20 14:49:33 -04:00
garp.h garp.h: Remove extern from function prototypes 2013-09-20 14:49:33 -04:00
gen_stats.h gen_stats.h: Remove extern from function prototypes 2013-09-20 14:49:33 -04:00
genetlink.h genetlink: fix genl_set_err() group ID 2013-11-21 13:09:43 -05:00
gre.h ipv4: generalize gre_handle_offloads 2013-10-19 19:36:18 -04:00
gro_cells.h gro: Fix kcalloc argument order 2013-01-27 22:46:33 -05:00
icmp.h icmp.h: Remove extern from function prototypes 2013-09-20 14:49:33 -04:00
ieee80211_radiotap.h mac80211: add radiotap flag and handling for 5/10 MHz 2013-07-16 09:58:05 +03:00
ieee802154_netdev.h ieee802154/nl-mac.c: make some MLME operations optional 2013-04-08 12:00:16 -04:00
ieee802154.h
if_inet6.h ipv6 addrconf: extend ifa_flags to u32 2013-12-06 16:34:43 -05:00
inet6_connection_sock.h inet*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
inet6_hashtables.h ipv6: split inet6_ehashfn to hash functions per compilation unit 2013-10-19 19:45:34 -04:00
inet_common.h inet*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
inet_connection_sock.h inet*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
inet_ecn.h net: Correct comparisons and calculations using skb->tail and skb-transport_header 2013-05-28 23:49:07 -07:00
inet_frag.h inet: remove old fragmentation hash initializing 2013-10-23 17:01:41 -04:00
inet_hashtables.h tcp/dccp: remove twchain 2013-10-08 23:19:24 -04:00
inet_sock.h inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once 2013-10-19 19:45:35 -04:00
inet_timewait_sock.h netdev: inet_timewait_sock.h missing semi-colon when KMEMCHECK is enabled 2013-10-17 15:56:53 -04:00
inetpeer.h inet*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
ip6_checksum.h net: fix build errors if ipv6 is disabled 2013-10-09 13:04:03 -04:00
ip6_fib.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-11-04 13:48:30 -05:00
ip6_route.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-10-23 16:49:34 -04:00
ip6_tunnel.h tunnels: harmonize cleanup done on skb on xmit path 2013-09-04 00:27:25 -04:00
ip_fib.h ip*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
ip_tunnels.h ipv4: generalize gre_handle_offloads 2013-10-19 19:36:18 -04:00
ip_vs.h netfilter: push reasm skb through instead of original frag skbs 2013-11-11 00:19:35 -05:00
ip.h inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions 2013-11-23 14:46:23 -08:00
ipcomp.h
ipconfig.h
ipv6.h ipv6: add ip6_flowlabel helper 2013-12-09 21:03:49 -05:00
ipx.h ipx.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
iw_handler.h iw_handler.h: Remove extern from function prototypes 2013-09-21 14:01:39 -04:00
lapb.h lapb.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
lib80211.h hostap: Don't use create_proc_read_entry() 2013-04-29 15:41:56 -04:00
llc_c_ac.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_c_ev.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_c_st.h
llc_conn.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_if.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_pdu.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_s_ac.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_s_ev.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_s_st.h
llc_sap.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
mac80211.h mac80211: add min required channel definition field 2013-11-25 20:52:05 +01:00
mac802154.h mac802154: correct a typo in ieee802154_alloc_device() prototype 2013-10-21 18:56:23 -04:00
mip6.h include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
mld.h net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation 2013-09-04 14:53:20 -04:00
mrp.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-10-01 17:06:14 -04:00
ndisc.h ndisc.h: Remove extern from function prototypes 2013-09-21 14:01:39 -04:00
neighbour.h ipv6: router reachability probing 2013-12-11 16:02:58 -05:00
net_namespace.h netfilter: nf_tables: complete net namespace support 2013-10-14 18:00:59 +02:00
net_ratelimit.h
netdma.h
netevent.h netevent/netlink.h: Remove extern from function prototypes 2013-09-21 14:01:39 -04:00
netlabel.h include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
netlink.h netevent/netlink.h: Remove extern from function prototypes 2013-09-21 14:01:39 -04:00
netprio_cgroup.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-09-05 14:54:29 -07:00
netrom.h netrom.h: Remove extern from function prototypes 2013-09-21 14:01:39 -04:00
nexthop.h
nl802154.h
p8022.h p8022.h: Remove extern from function prototypes 2013-09-21 14:01:39 -04:00
ping.h inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions 2013-11-23 14:46:23 -08:00
pkt_cls.h net: Remove extern from include/net/ scheduling prototypes 2013-07-31 17:24:22 -07:00
pkt_sched.h pkt_sched: give visibility to mq slave qdiscs 2013-12-09 19:54:47 -05:00
protocol.h protocol.h: Remove extern from function prototypes 2013-09-23 01:51:08 -04:00
psnap.h psnap.h: Remove extern from function prototypes 2013-09-23 01:51:08 -04:00
raw.h raw/rawv6.h: Remove extern from function prototypes 2013-09-23 01:51:08 -04:00
rawv6.h raw/rawv6.h: Remove extern from function prototypes 2013-09-23 01:51:08 -04:00
red.h
regulatory.h cfg80211: use enum nl80211_dfs_regions for dfs_region everywhere 2013-11-25 20:52:12 +01:00
request_sock.h inet: includes a sock_common in request_sock 2013-10-10 00:08:07 -04:00
rose.h rose.h: Remove extern from function prototypes 2013-09-23 01:51:08 -04:00
route.h ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE 2013-11-05 21:52:27 -05:00
rtnetlink.h rtnetlink.h: Remove extern from function prototypes 2013-09-23 01:51:09 -04:00
sch_generic.h net_sched: add u64 rate to psched_ratecfg_precompute() 2013-09-20 14:41:02 -04:00
scm.h scm.h: Remove extern from function prototypes 2013-09-23 01:51:09 -04:00
secure_seq.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-10-01 17:06:14 -04:00
slhc_vj.h
snmp.h net: avoid reloads in SNMP_UPD_PO_STATS 2012-08-06 13:40:47 -07:00
sock.h tcp_memcontrol: Cleanup/fix cg_proto->memory_pressure handling. 2013-12-05 21:01:01 -05:00
stp.h stp.h: Remove extern from function prototypes 2013-09-23 01:51:09 -04:00
tcp_memcontrol.h tcp_memcontrol: Kill struct tcp_memcontrol 2013-10-21 18:43:02 -04:00
tcp_states.h
tcp.h tcp: refine TSO splits 2013-12-17 15:15:25 -05:00
timewait_sock.h [PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary. 2012-06-09 14:56:12 -07:00
transp_v6.h transp_v6.h: style neatening 2013-06-04 16:43:42 -07:00
udp.h udp: Remove unnecessary semicolon from do{}while (0) macro 2013-11-07 02:14:33 -05:00
udplite.h udplite.h: Remove extern from function prototypes 2013-09-23 16:29:40 -04:00
vsock_addr.h VSOCK: Move af_vsock.h and vsock_addr.h to include/net 2013-07-27 22:14:06 -07:00
vxlan.h vxlan: Have the NIC drivers do less work for offloads 2013-10-29 02:39:13 -07:00
wext.h wext.h: Remove extern from function prototypes 2013-09-23 16:29:40 -04:00
wimax.h wimax.h: Remove extern from function prototypes 2013-09-23 16:29:41 -04:00
wpan-phy.h mac802154: monitor device support 2012-05-16 15:17:08 -04:00
x25.h x25.h: Remove extern from function prototypes 2013-09-23 16:29:41 -04:00
x25device.h
xfrm.h ipv6: Add a receive path hook for vti6 in xfrm6_mode_tunnel. 2013-10-09 13:16:36 +02:00