linux/net/ipv4
Evgeniy Polyakov a9d8f9110d inet: Allowing more than 64k connections and heavily optimize bind(0) time.
With simple extension to the binding mechanism, which allows to bind more
than 64k sockets (or smaller amount, depending on sysctl parameters),
we have to traverse the whole bind hash table to find out empty bucket.
And while it is not a problem for example for 32k connections, bind()
completion time grows exponentially (since after each successful binding
we have to traverse one bucket more to find empty one) even if we start
each time from random offset inside the hash table.

So, when hash table is full, and we want to add another socket, we have
to traverse the whole table no matter what, so effectivelly this will be
the worst case performance and it will be constant.

Attached picture shows bind() time depending on number of already bound
sockets.

Green area corresponds to the usual binding to zero port process, which
turns on kernel port selection as described above. Red area is the bind
process, when number of reuse-bound sockets is not limited by 64k (or
sysctl parameters). The same exponential growth (hidden by the green
area) before number of ports reaches sysctl limit.

At this time bind hash table has exactly one reuse-enbaled socket in a
bucket, but it is possible that they have different addresses. Actually
kernel selects the first port to try randomly, so at the beginning bind
will take roughly constant time, but with time number of port to check
after random start will increase. And that will have exponential growth,
but because of above random selection, not every next port selection
will necessary take longer time than previous. So we have to consider
the area below in the graph (if you could zoom it, you could find, that
there are many different times placed there), so area can hide another.

Blue area corresponds to the port selection optimization.

This is rather simple design approach: hashtable now maintains (unprecise
and racely updated) number of currently bound sockets, and when number
of such sockets becomes greater than predefined value (I use maximum
port range defined by sysctls), we stop traversing the whole bind hash
table and just stop at first matching bucket after random start. Above
limit roughly corresponds to the case, when bind hash table is full and
we turned on mechanism of allowing to bind more reuse-enabled sockets,
so it does not change behaviour of other sockets.

Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:34:31 -08:00
..
netfilter netfilter 06/09: nf_conntrack: fix ICMP/ICMPv6 timeout sysctls on big-endian 2009-01-12 21:18:35 -08:00
af_inet.c netns: igmp: allow IPPROTO_IGMP sockets in netns 2008-12-25 16:42:23 -08:00
ah4.c netns xfrm: AH/ESP in netns! 2008-11-25 17:59:27 -08:00
arp.c ipv4: Fix ARP behavior with many mac-vlans 2008-11-16 19:19:38 -08:00
cipso_ipv4.c netlabel: Update kernel configuration API 2008-12-31 12:54:11 -05:00
datagram.c mib: add net to IP_INC_STATS_BH 2008-07-16 20:20:11 -07:00
devinet.c net: clean up net/ipv4/devinet.c 2008-11-03 02:48:48 -08:00
esp4.c netns xfrm: AH/ESP in netns! 2008-11-25 17:59:27 -08:00
fib_frontend.c net: clean up net/ipv4/fib_frontend.c fib_hash.c ip_gre.c 2008-11-03 00:25:16 -08:00
fib_hash.c net: clean up net/ipv4/fib_frontend.c fib_hash.c ip_gre.c 2008-11-03 00:25:16 -08:00
fib_lookup.h [IPV4] FIB_HASH: Reduce memory needs and speedup lookups 2008-01-28 15:02:46 -08:00
fib_rules.c net: add fib_rules_ops to flush_cache method 2008-07-05 19:01:28 -07:00
fib_semantics.c net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c 2008-11-03 00:23:42 -08:00
fib_trie.c net: replace NIPQUAD() in net/ipv4/ net/ipv6/ 2008-10-31 00:53:57 -07:00
icmp.c netns xfrm: lookup in netns 2008-11-25 17:35:18 -08:00
igmp.c netns: igmp: make /proc/net/{igmp,mcfilter} per netns 2008-12-25 16:42:51 -08:00
inet_connection_sock.c inet: Allowing more than 64k connections and heavily optimize bind(0) time. 2009-01-21 14:34:31 -08:00
inet_diag.c net: Convert TCP/DCCP listening hash tables to use RCU 2008-11-23 17:22:55 -08:00
inet_fragment.c net: convert BUG_TRAP to generic WARN_ON 2008-07-25 21:43:18 -07:00
inet_hashtables.c inet: Allowing more than 64k connections and heavily optimize bind(0) time. 2009-01-21 14:34:31 -08:00
inet_lro.c include/net net/ - csum_partial - remove unnecessary casts 2008-11-19 15:44:53 -08:00
inet_timewait_sock.c net: convert TCP/DCCP ehash rwlocks to spinlocks 2008-11-20 20:39:09 -08:00
inetpeer.c net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c 2008-11-03 00:23:42 -08:00
ip_forward.c net: reduce structures when XFRM=n 2008-10-28 13:24:06 -07:00
ip_fragment.c net: '&' redux 2008-11-03 18:21:05 -08:00
ip_gre.c net: fix tunnels in netns after ndo_ changes 2008-11-23 17:26:26 -08:00
ip_input.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2008-11-18 23:38:23 -08:00
ip_options.c cipso: Add support for native local labeling and fixup mapping names 2008-10-10 10:16:34 -04:00
ip_output.c net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames() 2008-11-24 16:07:50 -08:00
ip_sockglue.c net: ip_sockglue.c add static, annotate ports' endianness 2008-11-20 01:54:27 -08:00
ipcomp.c netns xfrm: state lookup in netns 2008-11-25 17:30:50 -08:00
ipconfig.c net: replace NIPQUAD() in net/ipv4/ net/ipv6/ 2008-10-31 00:53:57 -07:00
ipip.c net: fix tunnels in netns after ndo_ changes 2008-11-23 17:26:26 -08:00
ipmr.c ipmr: merge common code 2008-12-16 01:15:11 -08:00
Kconfig IPVS: Move IPVS to net/netfilter/ipvs 2008-10-07 08:38:24 +11:00
Makefile IPVS: Move IPVS to net/netfilter/ipvs 2008-10-07 08:38:24 +11:00
netfilter.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2008-11-28 02:19:15 -08:00
proc.c net: Fix percpu counters deadlock 2008-12-29 23:04:08 -08:00
protocol.c net: remove CVS keywords 2008-06-11 21:00:38 -07:00
raw.c net: avoid a pair of dst_hold()/dst_release() in ip_append_data() 2008-11-24 15:52:46 -08:00
route.c cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits: net 2008-12-29 22:44:47 -08:00
syncookies.c tcp: Port redirection support for TCP 2008-10-01 07:46:49 -07:00
sysctl_net_ipv4.c net: '&' redux 2008-11-03 18:21:05 -08:00
tcp_bic.c [TCP]: BIC web page link is corrected. 2008-02-28 22:14:32 -08:00
tcp_cong.c net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) 2008-10-16 15:24:51 -07:00
tcp_cubic.c [TCP] CUBIC v2.3 2008-11-02 00:28:10 -07:00
tcp_diag.c net: inet_diag_handler structs can be const 2008-11-19 15:43:27 -08:00
tcp_highspeed.c [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid 2008-01-28 14:55:41 -08:00
tcp_htcp.c tcp_htcp: last_cong bug fix 2008-11-12 01:41:09 -08:00
tcp_hybla.c tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd. 2008-10-07 15:58:17 -07:00
tcp_illinois.c [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid 2008-01-28 14:55:41 -08:00
tcp_input.c net_dma: convert to dma_find_channel 2009-01-06 11:38:15 -07:00
tcp_ipv4.c net_dma: convert to dma_find_channel 2009-01-06 11:38:15 -07:00
tcp_lp.c [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid 2008-01-28 14:55:41 -08:00
tcp_minisocks.c net: clean up net/ipv4/ipip.c raw.c tcp.c tcp_minisocks.c tcp_yeah.c xfrm4_policy.c 2008-11-03 00:24:34 -08:00
tcp_output.c tcp: Always set urgent pointer if it's beyond snd_nxt 2008-12-25 17:12:58 -08:00
tcp_probe.c net: replace NIPQUAD() in net/ipv4/ net/ipv6/ 2008-10-31 00:53:57 -07:00
tcp_scalable.c [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid 2008-01-28 14:55:41 -08:00
tcp_timer.c tcp: Stop scaring users with "treason uncloaked!" 2008-12-18 22:27:42 -08:00
tcp_vegas.c tcp: tcp_vegas cong avoid fix 2008-12-09 00:13:04 -08:00
tcp_vegas.h [TCP]: congestion control API pass RTT in microseconds 2007-07-31 02:27:57 -07:00
tcp_veno.c net: fix returning void-valued expression warnings 2008-05-01 02:47:38 -07:00
tcp_westwood.c [TCP]: congestion control API pass RTT in microseconds 2007-07-31 02:27:57 -07:00
tcp_yeah.c net: clean up net/ipv4/ipip.c raw.c tcp.c tcp_minisocks.c tcp_yeah.c xfrm4_policy.c 2008-11-03 00:24:34 -08:00
tcp.c gso: Ensure that the packet is long enough 2009-01-14 20:41:12 -08:00
tunnel4.c [IPV4] TUNNEL4: Fix incoming packet length check for inter-protocol tunnel. 2008-06-05 04:02:33 +09:00
udp_impl.h udp: introduce struct udp_table and multiple spinlocks 2008-10-29 01:41:45 -07:00
udp.c net: udp_unhash() can test if sk is hashed 2008-11-25 13:55:15 -08:00
udplite.c udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
xfrm4_input.c ipsec: Remove useless ret variable 2008-12-26 01:31:18 -08:00
xfrm4_mode_beet.c ipsec: Interfamily IPSec BEET 2008-08-06 02:39:30 -07:00
xfrm4_mode_transport.c [IPSEC]: Use IPv6 calling convention as the convention for x->mode->output 2007-10-10 16:55:54 -07:00
xfrm4_mode_tunnel.c xfrm: fix fragmentation for ipv4 xfrm tunnel 2008-06-17 16:38:23 -07:00
xfrm4_output.c [IPSEC]: Fix inter address family IPsec tunnel handling. 2008-03-24 14:51:51 -07:00
xfrm4_policy.c netns xfrm: ->get_saddr in netns 2008-11-25 17:56:49 -08:00
xfrm4_state.c xfrm: remove useless forward declarations 2008-11-25 01:05:54 -08:00
xfrm4_tunnel.c [IPCOMP]: Fix reception of incompressible packets 2008-01-31 19:27:24 -08:00