2f53384424
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096) The call to tcp_push() at the end of do_tcp_sendpages() forces an immediate xmit when pipe is not already filled, and tso_fragment() try to split these skb to MSS multiples. 4096 bytes are usually split in a skb with 2 MSS, and a remaining sub-mss skb (assuming MTU=1500) This makes slow start suboptimal because many small frames are sent to qdisc/driver layers instead of big ones (constrained by cwnd and packets in flight of course) In fact, applications using sendmsg() (adding an additional memory copy) instead of vmsplice()/splice()/sendfile() are a bit faster because of this anomaly, especially if serving small files in environments with large initial [c]wnd. Call tcp_push() only if MSG_MORE is not set in the flags parameter. This bit is automatically provided by splice() internals but for the last page, or on all pages if user specified SPLICE_F_MORE splice() flag. In some workloads, this can reduce number of sent logical packets by an order of magnitude, making zero-copy TCP actually faster than one-copy :) Reported-by: Tom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
---|---|---|
.. | ||
9p | ||
802 | ||
8021q | ||
appletalk | ||
atm | ||
ax25 | ||
batman-adv | ||
bluetooth | ||
bridge | ||
caif | ||
can | ||
ceph | ||
core | ||
dcb | ||
dccp | ||
decnet | ||
dns_resolver | ||
dsa | ||
econet | ||
ethernet | ||
ieee802154 | ||
ipv4 | ||
ipv6 | ||
ipx | ||
irda | ||
iucv | ||
key | ||
l2tp | ||
lapb | ||
llc | ||
mac80211 | ||
netfilter | ||
netlabel | ||
netlink | ||
netrom | ||
nfc | ||
openvswitch | ||
packet | ||
phonet | ||
rds | ||
rfkill | ||
rose | ||
rxrpc | ||
sched | ||
sctp | ||
sunrpc | ||
tipc | ||
unix | ||
wanrouter | ||
wimax | ||
wireless | ||
x25 | ||
xfrm | ||
compat.c | ||
Kconfig | ||
Makefile | ||
nonet.c | ||
socket.c | ||
sysctl_net.c |