linux/net/ipv4
Eric Dumazet d826eb14ec ipv4: PKTINFO doesnt need dst reference
Le lundi 07 novembre 2011 à 15:33 +0100, Eric Dumazet a écrit :

> At least, in recent kernels we dont change dst->refcnt in forwarding
> patch (usinf NOREF skb->dst)
>
> One particular point is the atomic_inc(dst->refcnt) we have to perform
> when queuing an UDP packet if socket asked PKTINFO stuff (for example a
> typical DNS server has to setup this option)
>
> I have one patch somewhere that stores the information in skb->cb[] and
> avoid the atomic_{inc|dec}(dst->refcnt).
>

OK I found it, I did some extra tests and believe its ready.

[PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference

When a socket uses IP_PKTINFO notifications, we currently force a dst
reference for each received skb. Reader has to access dst to get needed
information (rt_iif & rt_spec_dst) and must release dst reference.

We also forced a dst reference if skb was put in socket backlog, even
without IP_PKTINFO handling. This happens under stress/load.

We can instead store the needed information in skb->cb[], so that only
softirq handler really access dst, improving cache hit ratios.

This removes two atomic operations per packet, and false sharing as
well.

On a benchmark using a mono threaded receiver (doing only recvmsg()
calls), I can reach 720.000 pps instead of 570.000 pps.

IP_PKTINFO is typically used by DNS servers, and any multihomed aware
UDP application.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-09 16:36:27 -05:00
..
netfilter Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
af_inet.c ipv4: reduce percpu needs for icmpmsg mibs 2011-11-09 16:04:20 -05:00
ah4.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
arp.c neigh: Pass neighbour entry to output ops. 2011-07-17 23:11:17 -07:00
cipso_ipv4.c cipso: remove an unneeded NULL check in cipso_v4_doi_add() 2011-10-11 18:43:53 -04:00
datagram.c ipv4: Lock socket and use cork flow in ip4_datagram_connect(). 2011-05-08 13:48:57 -07:00
devinet.c rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER 2011-08-02 04:29:23 -07:00
esp4.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
fib_frontend.c rtnetlink: Compute and store minimum ifinfo dump size 2011-06-09 20:38:07 -07:00
fib_lookup.h ipv4: Fix nexthop caching wrt. scoping. 2011-03-24 18:06:47 -07:00
fib_rules.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
fib_semantics.c ipv4: Fix fib_info->fib_metrics leak 2011-09-16 17:42:26 -04:00
fib_trie.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
gre.c rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER 2011-08-02 04:29:23 -07:00
icmp.c net: more accurate skb truesize 2011-10-13 16:05:07 -04:00
igmp.c Merge branch 'master' of github.com:davem330/net 2011-09-22 03:23:13 -04:00
inet_connection_sock.c net: rename sk_clone to sk_clone_lock 2011-11-08 17:07:07 -05:00
inet_diag.c net-netlink: Add a new attribute to expose TOS values via netlink 2011-10-12 19:09:18 -04:00
inet_fragment.c net/ipv4: EXPORT_SYMBOL cleanups 2010-07-12 12:57:54 -07:00
inet_hashtables.c net: Compute protocol sequence numbers and fragment IDs using MD5. 2011-08-06 18:33:19 -07:00
inet_lro.c net: add skb frag size accessors 2011-10-19 03:10:46 -04:00
inet_timewait_sock.c net: Fix files explicitly needing to include module.h 2011-10-31 19:30:28 -04:00
inetpeer.c net: Compute protocol sequence numbers and fragment IDs using MD5. 2011-08-06 18:33:19 -07:00
ip_forward.c ipv4: Fix 'iph' use before set. 2011-05-12 23:03:46 -04:00
ip_fragment.c net: add skb frag size accessors 2011-10-19 03:10:46 -04:00
ip_gre.c net: better pcpu data alignment 2011-11-08 15:10:59 -05:00
ip_input.c ip: introduce ip_is_fragment helper inline function 2011-06-21 20:33:34 -07:00
ip_options.c ip_options_compile: properly handle unaligned pointer 2011-05-31 15:11:02 -07:00
ip_output.c ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT 2011-10-24 03:06:21 -04:00
ip_sockglue.c ipv4: PKTINFO doesnt need dst reference 2011-11-09 16:36:27 -05:00
ipcomp.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
ipconfig.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
ipip.c net: better pcpu data alignment 2011-11-08 15:10:59 -05:00
ipmr.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
Kconfig ipv4: Remove fib_hash. 2011-02-01 15:35:25 -08:00
Makefile net: ipv4: add IPPROTO_ICMP socket kind 2011-05-13 16:08:13 -04:00
netfilter.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
ping.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
proc.c ipv4: reduce percpu needs for icmpmsg mibs 2011-11-09 16:04:20 -05:00
protocol.c net: add __rcu annotations to protocol 2010-10-27 11:37:31 -07:00
raw.c ipv4: PKTINFO doesnt need dst reference 2011-11-09 16:36:27 -05:00
route.c ipv4: Fix inetpeer expire time information 2011-11-08 14:40:40 -05:00
syncookies.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
sysctl_net_ipv4.c inetpeer: remove unused list 2011-06-08 17:05:30 -07:00
tcp_bic.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_cong.c net/ipv4: Eliminate kstrdup memory leak 2010-08-27 19:31:56 -07:00
tcp_cubic.c tcp_cubic: limit delayed_ack ratio to prevent divide error 2011-05-08 15:51:57 -07:00
tcp_diag.c tcp: diag: Dont report negative values for rx queue 2009-12-03 16:06:13 -08:00
tcp_highspeed.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_htcp.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_hybla.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_illinois.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_input.c tcp: add const qualifiers where possible 2011-10-21 05:22:42 -04:00
tcp_ipv4.c net: add missing bh_unlock_sock() calls 2011-11-03 18:06:18 -04:00
tcp_lp.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp_minisocks.c net: rename sk_clone to sk_clone_lock 2011-11-08 17:07:07 -05:00
tcp_output.c tcp: Fix comments for Nagle algorithm 2011-11-08 14:02:47 -05:00
tcp_probe.c net: ipv4: tcp_probe: cleanup snprintf() use 2010-11-17 12:27:46 -08:00
tcp_scalable.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_timer.c TCP: remove TCP_DEBUG 2011-10-24 17:36:08 -04:00
tcp_vegas.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_vegas.h
tcp_veno.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_westwood.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_yeah.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp.c TCP: remove TCP_DEBUG 2011-10-24 17:36:08 -04:00
tunnel4.c tunnels: add __rcu annotations 2010-10-27 11:37:32 -07:00
udp_impl.h
udp.c ipv4: PKTINFO doesnt need dst reference 2011-11-09 16:36:27 -05:00
udplite.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
xfrm4_input.c net/ipv4: EXPORT_SYMBOL cleanups 2010-07-12 12:57:54 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipv4: Don't pre-seed hoplimit metric. 2010-12-12 22:08:17 -08:00
xfrm4_output.c xfrm4: Don't call icmp_send on local error 2011-07-01 17:33:19 -07:00
xfrm4_policy.c ipv4: fix ipsec forward performance regression 2011-10-24 03:01:22 -04:00
xfrm4_state.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
xfrm4_tunnel.c net: struct xfrm_tunnel in read_mostly section 2010-08-30 13:50:45 -07:00