linux/net/ipv6
Eric Dumazet ca777eff51 tcp: remove dst refcount false sharing for prequeue mode
Alexander Duyck reported high false sharing on dst refcount in tcp stack
when prequeue is used. prequeue is the mechanism used when a thread is
blocked in recvmsg()/read() on a TCP socket, using a blocking model
rather than select()/poll()/epoll() non blocking one.

We already try to use RCU in input path as much as possible, but we were
forced to take a refcount on the dst when skb escaped RCU protected
region. When/if the user thread runs on different cpu, dst_release()
will then touch dst refcount again.

Commit 093162553c (tcp: force a dst refcount when prequeue packet)
was an example of a race fix.

It turns out the only remaining usage of skb->dst for a packet stored
in a TCP socket prequeue is IP early demux.

We can add a logic to detect when IP early demux is probably going
to use skb->dst. Because we do an optimistic check rather than duplicate
existing logic, we need to guard inet_sk_rx_dst_set() and
inet6_sk_rx_dst_set() from using a NULL dst.

Many thanks to Alexander for providing a nice bug report, git bisection,
and reproducer.

Tested using Alexander script on a 40Gb NIC, 8 RX queues.
Hosts have 24 cores, 48 hyper threads.

echo 0 >/proc/sys/net/ipv4/tcp_autocorking

for i in `seq 0 47`
do
  for j in `seq 0 2`
  do
     netperf -H $DEST -t TCP_STREAM -l 1000 \
             -c -C -T $i,$i -P 0 -- \
             -m 64 -s 64K -D &
  done
done

Before patch : ~6Mpps and ~95% cpu usage on receiver
After patch : ~9Mpps and ~35% cpu usage on receiver.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 16:54:41 -07:00
..
netfilter netfilter: fix missing dependencies in NETFILTER_XT_TARGET_LOG 2014-09-02 13:59:54 -07:00
addrconf_core.c net: clean up snmp stats code 2014-05-07 16:06:05 -04:00
addrconf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-07 21:41:53 -07:00
addrlabel.c list: fix order of arguments for hlist_add_after(_rcu) 2014-08-06 18:01:24 -07:00
af_inet6.c net/ipv4: bind ip_nonlocal_bind to current netns 2014-09-09 11:27:09 -07:00
ah6.c ipv6: White-space cleansing : Structure layouts 2014-08-24 22:37:52 -07:00
anycast.c ipv6: restore the behavior of ipv6_sock_ac_drop() 2014-09-07 16:10:07 -07:00
datagram.c sock: deduplicate errqueue dequeue 2014-09-01 21:49:08 -07:00
esp6.c ipv6: White-space cleansing : Structure layouts 2014-08-24 22:37:52 -07:00
exthdrs_core.c ipv6: ipv6_find_hdr restore prev functionality 2014-02-27 18:27:26 -05:00
exthdrs_offload.c ipv6: Fix exthdrs offload registration. 2014-03-06 16:35:55 -05:00
exthdrs.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
fib6_rules.c ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helper 2014-01-15 15:53:18 -08:00
icmp.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
inet6_connection_sock.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
inet6_hashtables.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
ip6_checksum.c udp: Generic functions to set checksum 2014-06-04 22:46:38 -07:00
ip6_fib.c net: ipv6: fib: don't sleep inside atomic lock 2014-08-22 10:54:49 -07:00
ip6_flowlabel.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
ip6_gre.c net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
ip6_icmp.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
ip6_input.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
ip6_offload.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
ip6_offload.h ipv6: Pull IPv6 GSO registration out of the module 2012-11-15 17:39:24 -05:00
ip6_output.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
ip6_tunnel.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
ip6_vti.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2014-07-30 20:05:54 -07:00
ip6mr.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
ipcomp6.c ipv6: White-space cleansing : Structure layouts 2014-08-24 22:37:52 -07:00
ipv6_sockglue.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
Kconfig ip6_vti: Fix build when NET_IP_TUNNEL is not set. 2014-02-20 14:29:49 +01:00
Makefile xfrm6: Add IPsec protocol multiplexer 2014-03-14 07:28:07 +01:00
mcast.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-07 21:41:53 -07:00
mip6.c ipv6: White-space cleansing : Structure layouts 2014-08-24 22:37:52 -07:00
ndisc.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
netfilter.c netfilter: Fix potential use after free in ip6_route_me_harder() 2014-05-09 02:36:39 +02:00
output_core.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
ping.c net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
proc.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
protocol.c net: remove outdated comment for ipv4 and ipv6 protocol handler 2013-11-28 18:47:51 -05:00
raw.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
reassembly.c ipv6: White-space cleansing : Structure layouts 2014-08-24 22:37:52 -07:00
route.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
sit.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
syncookies.c tcp: syncookies: mark cookie_secret read_mostly 2014-08-27 16:30:49 -07:00
sysctl_net_ipv6.c ipv6: add sysctl_mld_qrv to configure query robustness variable 2014-09-04 22:26:14 -07:00
tcp_ipv6.c tcp: remove dst refcount false sharing for prequeue mode 2014-09-09 16:54:41 -07:00
tcpv6_offload.c tcp: Call skb_gro_checksum_validate 2014-08-24 18:09:24 -07:00
tunnel6.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
udp_impl.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
udp_offload.c udp: Add support for doing checksum unnecessary conversion 2014-09-01 21:36:28 -07:00
udp.c udp: Add support for doing checksum unnecessary conversion 2014-09-01 21:36:28 -07:00
udplite.c net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
xfrm6_input.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
xfrm6_mode_beet.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm6_mode_ro.c ipv4/ipv6: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
xfrm6_mode_transport.c
xfrm6_mode_tunnel.c xfrm6: Remove xfrm_tunnel_notifier 2014-03-14 07:28:08 +01:00
xfrm6_output.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00
xfrm6_policy.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
xfrm6_protocol.c xfrm6: Properly handle unsupported protocols 2014-05-06 07:08:38 +02:00
xfrm6_state.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
xfrm6_tunnel.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00