linux/net/ipv4
Eric Dumazet 8b27dae5a2 tcp: add one skb cache for rx
Often times, recvmsg() system calls and BH handling for a particular
TCP socket are done on different cpus.

This means the incoming skb had to be allocated on a cpu,
but freed on another.

This incurs a high spinlock contention in slab layer for small rpc,
but also a high number of cache line ping pongs for larger packets.

A full size GRO packet might use 45 page fragments, meaning
that up to 45 put_page() can be involved.

More over performing the __kfree_skb() in the recvmsg() context
adds a latency for user applications, and increase probability
of trapping them in backlog processing, since the BH handler
might found the socket owned by the user.

This patch, combined with the prior one increases the rpc
performance by about 10 % on servers with large number of cores.

(tcp_rr workload with 10,000 flows and 112 threads reach 9 Mpps
 instead of 8 Mpps)

This also increases single bulk flow performance on 40Gbit+ links,
since in this case there are often two cpus working in tandem :

 - CPU handling the NIC rx interrupts, feeding the receive queue,
  and (after this patch) freeing the skbs that were consumed.

 - CPU in recvmsg() system call, essentially 100 % busy copying out
  data to user space.

Having at most one skb in a per-socket cache has very little risk
of memory exhaustion, and since it is protected by socket lock,
its management is essentially free.

Note that if rps/rfs is used, we do not enable this feature, because
there is high chance that the same cpu is handling both the recvmsg()
system call and the TCP rx path, but that another cpu did the skb
allocations in the device driver right before the RPS/RFS logic.

To properly handle this case, it seems we would need to record
on which cpu skb was allocated, and use a different channel
to give skbs back to this cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23 21:57:38 -04:00
..
bpfilter net: bpfilter: disallow to remove bpfilter module while being used 2019-01-11 18:05:41 -08:00
netfilter netfilter: nf_tables: merge ipv4 and ipv6 nat chain types 2019-03-01 14:36:59 +01:00
af_inet.c tcp: add one skb cache for rx 2019-03-23 21:57:38 -04:00
ah4.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
arp.c net: Evict neighbor entries on carrier down 2018-10-12 09:47:39 -07:00
cipso_ipv4.c netlabel: fix out-of-bounds memory accesses 2019-02-27 21:45:24 -08:00
datagram.c ipv4: Allow sending multicast packets on specific i/f using VRF socket 2018-10-02 22:28:17 -07:00
devinet.c net: ignore sysctl_devconf_inherit_init_net without SYSCTL 2019-03-04 13:14:34 -08:00
esp4_offload.c net: use skb_sec_path helper in more places 2018-12-19 11:21:37 -08:00
esp4.c esp: Skip TX bytes accounting when sending from a request socket 2019-01-28 11:20:58 +01:00
fib_frontend.c ipv4: Return error for RTA_VIA attribute 2019-02-26 13:23:17 -08:00
fib_lookup.h
fib_notifier.c
fib_rules.c ipv4: fib_rules: Fix possible infinite loop in fib_empty_table 2018-12-30 12:57:04 -08:00
fib_semantics.c ipv4: fib: use struct_size() in kzalloc() 2019-02-01 15:12:29 -08:00
fib_trie.c ipv4: Allow amount of dirty memory from fib resizing to be controllable 2019-03-21 13:29:53 -07:00
fou.c genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
gre_demux.c net: ip_gre: use erspan key field for tunnel lookup 2019-01-22 11:52:17 -08:00
gre_offload.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
icmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-03-02 12:54:35 -08:00
igmp.c net: remove unneeded switch fall-through 2019-02-21 13:48:00 -08:00
inet_connection_sock.c inet: minor optimization for backlog setting in listen(2) 2018-11-07 22:31:07 -08:00
inet_diag.c inet_diag: fix reporting cgroup classid and fallback to priority 2019-02-12 13:35:57 -05:00
inet_fragment.c net: remove unused struct inet_frag_queue.fragments field 2019-02-26 08:27:05 -08:00
inet_hashtables.c net: dccp: fix kernel crash on module load 2018-12-24 15:27:56 -08:00
inet_timewait_sock.c soreuseport: initialise timewait reuseport field 2018-04-07 22:32:32 -04:00
inetpeer.c net: ipv4: use a dedicated counter for icmp_v4 redirect packets 2019-02-08 21:50:15 -08:00
ip_forward.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-20 11:53:36 -08:00
ip_fragment.c net: remove unused struct inet_frag_queue.fragments field 2019-02-26 08:27:05 -08:00
ip_gre.c route: Add multipath_hash in flowi_common to make user-define hash 2019-02-27 12:50:17 -08:00
ip_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-03-02 12:54:35 -08:00
ip_options.c net: avoid use IPCB in cipso_v4_error 2019-02-25 14:32:35 -08:00
ip_output.c sk_buff: add skb extension infrastructure 2018-12-19 11:21:37 -08:00
ip_sockglue.c ip: on queued skb use skb_header_pointer instead of pskb_may_pull 2019-01-10 09:27:20 -05:00
ip_tunnel_core.c ip_tunnel: Add dst_cache support in lwtunnel_state of ip tunnel 2019-02-24 22:13:49 -08:00
ip_tunnel.c iptunnel: NULL pointer deref for ip_md_tunnel_xmit 2019-03-06 10:43:06 -08:00
ip_vti.c vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel 2019-01-09 14:00:37 +01:00
ipcomp.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
ipconfig.c ipconfig: add carrier_timeout kernel parameter 2019-02-01 15:24:13 -08:00
ipip.c ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit 2019-01-26 09:43:03 -08:00
ipmr_base.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-19 11:03:06 -07:00
ipmr.c ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs 2019-02-21 13:05:05 -08:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile bpf, sockmap: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
metrics.c net: Add extack argument to ip_fib_metrics_init 2018-11-06 15:00:45 -08:00
netfilter.c netfilter: ipv4: remove useless export_symbol 2019-01-28 11:32:58 +01:00
netlink.c ipv4: Add ICMPv6 support when parse route ipproto 2019-03-01 16:41:27 -08:00
ping.c ipv4: Allow sending multicast packets on specific i/f using VRF socket 2018-10-02 22:28:17 -07:00
proc.c tcp: implement coalescing on backlog queue 2018-11-30 13:26:54 -08:00
protocol.c fou, fou6: ICMP error handlers for FoU and GUE 2018-11-08 17:13:08 -08:00
raw_diag.c
raw.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-20 11:53:36 -08:00
route.c net: dst: remove gc leftovers 2019-03-21 13:39:25 -07:00
syncookies.c tcp: free request sock directly upon TFO or syncookies error 2019-03-19 14:13:01 -07:00
sysctl_net_ipv4.c ipv4: Allow amount of dirty memory from fib resizing to be controllable 2019-03-21 13:29:53 -07:00
tcp_bbr.c tcp_bbr: adapt cwnd based on ack aggregation estimation 2019-01-24 22:27:27 -08:00
tcp_bic.c
tcp_bpf.c bpf: sk_msg, sock{map|hash} redirect through ULP 2018-12-20 23:47:09 +01:00
tcp_cdg.c tcp: cdg: use tcp high resolution clock cache 2018-10-15 22:56:42 -07:00
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_dctcp.h tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_diag.c
tcp_fastopen.c
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: free request sock directly upon TFO or syncookies error 2019-03-19 14:13:01 -07:00
tcp_ipv4.c tcp: add one skb cache for rx 2019-03-23 21:57:38 -04:00
tcp_lp.c
tcp_metrics.c genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
tcp_minisocks.c tcp: use tcp_md5_needed for timewait sockets 2019-02-26 13:16:03 -08:00
tcp_nv.c
tcp_offload.c net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
tcp_output.c tcp: remove conditional branches from tcp_mstamp_refresh() 2019-03-23 21:43:21 -04:00
tcp_rate.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_recovery.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_scalable.c
tcp_timer.c tcp: Refactor pingpong code 2019-01-27 13:29:43 -08:00
tcp_ulp.c tcp, ulp: remove socket lock assertion on ULP cleanup 2018-10-16 12:38:41 -07:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c tcp: add one skb cache for rx 2019-03-23 21:57:38 -04:00
tunnel4.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udp_diag.c net: diag: document swapped src/dst in udp_dump_one. 2018-10-28 19:27:21 -07:00
udp_impl.h udp: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
udp_offload.c udp: use indirect call wrappers for GRO socket lookup 2018-12-15 13:23:02 -08:00
udp_tunnel.c net/ipv4/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE 2019-01-17 14:55:52 -08:00
udp.c udp: fix possible user after free in error handler 2019-02-22 16:05:11 -08:00
udplite.c udp: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
xfrm4_input.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm4_output.c
xfrm4_policy.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm4_protocol.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
xfrm4_state.c
xfrm4_tunnel.c