linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-16 00:52:01 +00:00

Author	SHA1	Message	Date
Denis V. Lunev	3d58b5fa8e	[INET]: Rename inet_csk_ctl_sock_create to inet_ctl_sock_create. This call has nothing in common with the INET connection sockets code. It simply creates unhashed kernel sockets for protocol messages. Move the new call into af_inet.c after the rename. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:22:32 -07:00
Denis V. Lunev	14c0c8e8e0	[TCP]: Replace socket with sock for reset sending. Replace tcp_socket with tcp_sock. This is more efficient (fewer dereferences on fast paths). Additionally, the approach is unified with the one used in ICMP. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:19:38 -07:00
Herbert Xu	af2681828a	[ICMP]: Ensure that ICMP relookup maintains status quo The ICMP relookup path is only meant to modify behaviour when appropriate IPsec policies are in place and marked as requiring relookups. It is certainly not meant to modify behaviour when IPsec policies don't exist at all. However, due to an oversight on the error paths, existing behaviour may in fact change should one of the relookup steps fail. This patch corrects this by redirecting all errors on relookup failures to the previous code path. That is, if the initial xfrm_lookup let the packet pass, we will stand by that decision should the relookup fail due to an error. This should be safe from a security point of view because compliant systems must install a default deny policy, so the packet wouldn't have passed in that case. Many thanks to Julian Anastasov for pointing out this error. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 12:52:19 -07:00
David S. Miller	e1ec1b8ccd	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/s2io.c	2008-04-02 22:35:23 -07:00
Pavel Emelyanov	fd4e7b5045	[IPV4][NETNS]: Display per-net info in sockstat file. Besides, per-net fragment statistics can now be shown in the same file, since these stats are already per-net. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:43:18 -07:00
Pavel Emelyanov	d0538ca355	[SOCK][NETNS]: Register sockstat(6) files in each net. Currently they live in init_net only, but now almost all the info they can provide is available per-net. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:42:37 -07:00
Pavel Emelyanov	c29a0bc4df	[SOCK][NETNS]: Add a struct net argument to sock_prot_inuse_add and _get. This counter is about to become per-proto-and-per-net, so we'll need two arguments to determine which cell in this "table" to work with. All callers except proc already pass the proper net to it; proc will be tuned a bit later. Some indentation with spaces in proc files is done to keep the file coding style consistent. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:41:46 -07:00
YOSHIFUJI Hideaki	b50660f1fe	[IP] UDP: Use SEQ_START_TOKEN. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:38:15 -07:00
Denis V. Lunev	4ad96d39a2	[UDP]: Remove owner from udp_seq_afinfo. Move it to udp_seq_afinfo->seq_fops, where it belongs. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:25:53 -07:00
Denis V. Lunev	3ba9441bdf	[UDP]: Place file operations directly into udp_seq_afinfo. There is no need for a separate, never-used variable. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:25:32 -07:00
Denis V. Lunev	a2be75c182	[UDP]: Cleanup /proc/udp[6] creation/removal. Replace seq_open with seq_open_net and remove udp_seq_release completely. seq_release_net will do this job just fine. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:25:06 -07:00
Denis V. Lunev	dda61925f8	[UDP]: Move seq_ops from udp_iter_state to udp_seq_afinfo. No need to create seq_operations for each instance of 'netstat'. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:24:26 -07:00
Denis V. Lunev	997feb5e7a	[UDP]: No need to check afinfo != NULL in udp_proc_(un)register. udp_proc_register/udp_proc_unregister are called with a static pointer only. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:24:01 -07:00
Denis V. Lunev	6f191efe48	[UDP]: Replace struct net on udp_iter_state with seq_net_private. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:23:33 -07:00
David S. Miller	e8e16b706e	[INET]: inet_frag_evictor() must run with BH disabled Based upon a lockdep trace from Dave Jones. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 17:30:18 -07:00
Pavel Emelyanov	bdcde3d71a	[SOCK]: Drop inuse pcounter from struct proto (v2). An uppercut - do not use the pcounter on struct proto. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:39:33 -07:00
Joe Perches	bc578a54f0	[NET]: Rename inet_frag.h identifiers COMPLETE, FIRST_IN, LAST_IN to INET_FRAG_* On Fri, 2008-03-28 at 03:24 -0700, Andrew Morton wrote: > they should all be renamed. Done for include/net and net Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:35:27 -07:00
Rusty Russell	32aced7509	[NET]: Don't send ICMP_FRAG_NEEDED for GSO packets Commit `9af3912ec9` ("[NET] Move DF check to ip_forward") added a new check to send ICMP fragmentation needed for large packets. Unlike the check in ip_finish_output(), it doesn't check for GSO. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:23:19 -07:00
David S. Miller	8e8e43843b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/usb/rndis_host.c drivers/net/wireless/b43/dma.c net/ipv6/ndisc.c	2008-03-27 18:48:56 -07:00
Denis V. Lunev	8eeee8b152	[NETFILTER]: Replace direct proc_fops assignment with proc_create call. This eliminates the infamous race during module loading, when one could look up a proc entry before its proc_fops was assigned. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 16:55:53 -07:00
Thomas Graf	920fc941a9	[ESP]: Ensure IV is in linear part of the skb to avoid BUG() due to OOB access ESP does not account for the IV size when calling pskb_may_pull() to ensure everything it accesses directly is within the linear part of a potential fragment. This results in a BUG() being triggered when either the IPv4 or the IPv6 ESP stack is fed with an skb where the first fragment ends between the end of the ESP header and the end of the IV. This bug was found by Dirk Nehring <dnehring@gmx.net>. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 16:08:03 -07:00
Herbert Xu	732c8bd590	[IPSEC]: Fix BEET output The IPv6 BEET output function is incorrectly including the inner header in the payload to be protected. This causes a crash as the packet doesn't actually have that many bytes for a second header. The IPv4 BEET output on the other hand is broken when it comes to handling an inner IPv6 header since it always assumes an inner IPv4 header. This patch fixes both by making sure that neither BEET output function touches the inner header at all. All access is now done through the protocol-independent cb structure. Two new attributes are added to make this work, the IP header length and the IPv4 option length. They're filled in by the inner mode's output function. Thanks to Joakim Koskela for finding this problem. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 16:51:09 -07:00
Pavel Emelyanov	7c0ecc4c4f	[ICMP]: Dst entry leak in icmp_send host re-lookup code (v2). Commit `8b7817f3a9` ([IPSEC]: Add ICMP host relookup support) introduced some dst leaks on error paths: the reference held through the rt pointer may never be put. Fix it by jumping to a proper label. Found after net namespace's lo refused to unregister :) Many thanks to Den for valuable help during debugging. Herbert pointed out that xfrm_lookup() will put the rtable itself in case of error, so the first goto fix is redundant. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 02:27:09 -07:00
Pavel Emelyanov	789e41e6f4	[NETNS][ICMP]: Build fix for NET_NS=n case (dev->nd_net is omitted). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 02:19:25 -07:00
Pavel Emelyanov	b34a95ee6e	[NETNS][ICMP]: Use per-net sysctls in ipv4/icmp.c. This mostly reuses the net already used in the ICMP netns patches from Denis. After this, ICMP sysctls are completely virtualized. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 02:00:21 -07:00
Pavel Emelyanov	68528f0998	[NETNS][ICMP]: Make ctl tables for ICMP sysctls per-net. Add some flesh to ipv4_sysctl_init_net and ipv4_sysctl_exit_net, i.e. copy the table, alter .data pointers and register it per-net. Other ipv4_table's sysctls are now global, but this is going to change once sysctl permissions patches migrate from -mm tree to mainline in 2.6.26 merge window :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 01:56:24 -07:00
Pavel Emelyanov	a24022e188	[NETNS][ICMP]: Move ICMP sysctls on struct net. Initialization is moved to icmp_sk_init; all the places that refer to them use init_net for now. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 01:55:37 -07:00
Pavel Emelyanov	1577519d6b	[NETNS][ICMP]: Register pernet subsys to make ICMP sysctls per-net. This includes adding pernet_operations, empty init and exit hooks and a few changes in sysctl_ipv4_init, so that this part does not have to appear in the next patches. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 01:54:18 -07:00
Patrick McHardy	f49e1aa133	[NETFILTER]: nf_conntrack_sip: update copyright Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:27:05 -07:00
Patrick McHardy	c7f485abd6	[NETFILTER]: nf_conntrack_sip: RTP routing optimization Optimize call routing between NATed endpoints: when an external registrar sends a media description that contains an existing RTP expectation from a different SNATed connection, the gatekeeper is trying to route the call directly between the two endpoints. We assume both endpoints can reach each other directly and "un-NAT" the addresses, which makes the media stream go between the two endpoints directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:26:43 -07:00
Patrick McHardy	4ab9e64e5e	[NETFILTER]: nf_nat_sip: split up SDP mangling The SDP connection addresses may be contained in the payload multiple times (in the session description and/or once per media description), currently only the session description is properly updated. Split up SDP mangling so the function setting up expectations only updates the media port, update connection addresses from media descriptions while parsing them and at the end update the session description when the final addresses are known. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:26:08 -07:00
Patrick McHardy	a9c1d35917	[NETFILTER]: nf_conntrack_sip: create RTCP expectations Create expectations for the RTCP connections in addition to RTP connections. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:25:49 -07:00
Patrick McHardy	0f32a40fc9	[NETFILTER]: nf_conntrack_sip: create signalling expectations Create expectations for incoming signalling connections when seeing a REGISTER request. This is needed when the registrar uses a different source port number for signalling messages and for receiving incoming calls from other endpoints than the registrar. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:25:13 -07:00
Patrick McHardy	c978cd3a93	[NETFILTER]: nf_nat_sip: translate all Contact headers The SIP message may contain multiple Contact: addresses referring to the NATed endpoint, translate all of them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:24:57 -07:00
Patrick McHardy	720ac7085c	[NETFILTER]: nf_nat_sip: translate all Via headers Update maddr=, received= and rport= Via-header parameters referring to the signalling connection. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:24:41 -07:00
Patrick McHardy	33cb1e9a93	[NETFILTER]: nf_conntrack_sip: perform NAT after parsing Perform NAT last after parsing the packet. This makes no difference currently, but is needed when dealing with registrations to make sure we see the unNATed addresses. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:22:37 -07:00
Patrick McHardy	624f8b7bba	[NETFILTER]: nf_nat_sip: get rid of text based header translation Use the URI parsing helper to get the numerical addresses and get rid of the text based header translation. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:19:30 -07:00
Patrick McHardy	ea45f12a27	[NETFILTER]: nf_conntrack_sip: parse SIP headers properly Introduce new function for SIP header parsing that properly deals with continuation lines and whitespace in headers and use it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:18:57 -07:00
Patrick McHardy	ac3677406d	[NETFILTER]: nf_conntrack_sip: kill request URI "header" definitions The request URI is not a header and needs to be treated differently than real SIP headers. Add a separate function for parsing it and get rid of the POS_REQ_URI/POS_REG_REQ_URI definitions. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:18:40 -07:00
Patrick McHardy	3e9b4600b4	[NETFILTER]: nf_conntrack_sip: add separate SDP header parsing function SDP and SIP headers are quite different: SIP can have continuation lines, leading and trailing whitespace after the colon and is mostly case-insensitive, while SDP headers always begin on a new line and are followed by an equal sign and the value, without any whitespace. Introduce a new SDP header parsing function and convert all users that used the SIP header parsing function. This will allow properly dealing with the special SIP cases in the SIP header parsing function later. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:17:55 -07:00
Patrick McHardy	779382eb32	[NETFILTER]: nf_conntrack_sip: use strlen/strcmp Replace sizeof/memcmp by strlen/strcmp. Use case-insensitive comparison for SIP methods and the SIP/2.0 string, as specified in RFC 3261. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:17:36 -07:00
Patrick McHardy	212440a7d0	[NETFILTER]: nf_conntrack_sip: remove redundant function arguments The conntrack reference and ctinfo can be derived from the packet. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:17:13 -07:00
Patrick McHardy	2a6cfb22ae	[NETFILTER]: nf_conntrack_sip: adjust dptr and datalen after packet mangling After mangling the packet, the pointer to the data and the length of the data portion may change and need to be adjusted. Use double data pointers and a pointer to the length everywhere and add a helper function to the NAT helper for performing the adjustments. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:16:54 -07:00
Patrick McHardy	3d244121d8	[NETFILTER]: nf_nat_sip: fix NAT setup order We need to set up the destination NAT mapping before the source NAT mapping, so the NAT core gets to see the final tuple and can decide whether the source port needs to be remapped. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:09:51 -07:00
Patrick McHardy	6002f266b3	[NETFILTER]: nf_conntrack: introduce expectation classes and policies Introduce expectation classes and policies. An expectation class is used to distinguish different types of expectations by the same helper (for example audio/video/t.120). The expectation policy is used to hold the maximum number of expectations and the initial timeout for each class. The individual classes are isolated from each other, which means that for example an audio expectation will only evict other audio expectations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:09:15 -07:00
Patrick McHardy	30c69fed7d	[NETFILTER]: ipt_CLUSTERIP: fix non-existent macro name With nf_conntrack, DUMP_TUPLE got renamed to NF_CT_DUMP_TUPLE; fix CLUSTERIP to use the proper macro name. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-25 20:06:59 -07:00
YOSHIFUJI Hideaki	878628fbf2	[NET] NETNS: Omit namespace comparison without CONFIG_NET_NS. Introduce an inline net_eq() to compare two namespaces. Without CONFIG_NET_NS, since no namespace other than &init_net exists, it is always 1. We do not need to convert 1) inline vs inline and 2) inline vs &init_net comparisons. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:40:00 +09:00
YOSHIFUJI Hideaki	1218854afa	[NET] NETNS: Omit seq_net_private->net without CONFIG_NET_NS. Without CONFIG_NET_NS, no namespace other than &init_net exists, no need to store net in seq_net_private. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:56 +09:00
YOSHIFUJI Hideaki	3b1e0a655f	[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:55 +09:00
YOSHIFUJI Hideaki	c346dca108	[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:53 +09:00
YOSHIFUJI Hideaki	c8cdaf998d	[IPV4,IPV6]: Share cork.rt between IPv4 and IPv6. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-25 10:23:59 +09:00
Denis V. Lunev	92f1fecb45	[NETNS]: Enable TCP/UDP/ICMP inside namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:34:06 -07:00
Denis V. Lunev	2342fd7e14	[NETNS]: Allow creating sockets in a non-initial namespace. Allow creating sockets in the namespace if the protocol is OK with this. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:33:42 -07:00
Denis V. Lunev	f145049a06	[NETNS]: Drop packets in the non-initial namespace on a per-protocol basis. The IP layer can now handle multiple namespaces normally. So, process such packets normally and drop them only if the transport layer is not aware of namespaces. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:33:00 -07:00
Denis V. Lunev	05cf89d40c	[NETNS]: Process INET socket layer in the correct namespace. Replace all the rest of the init_net with a proper net on the socket layer. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:31:35 -07:00
Denis V. Lunev	cb84663e4d	[NETNS]: Process IP layer in the context of the correct namespace. Replace all the rest of the init_net with a proper net on the IP layer. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:31:00 -07:00
Denis V. Lunev	7a6adb92fe	[NETNS]: Add namespace parameter to ip_cmsg_send. Pass the init_net there for now. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:30:27 -07:00
Denis V. Lunev	f2c4802b3f	[NETNS]: Add namespace parameter to ip_options_get(...). Pass the init_net there for now. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:29:55 -07:00
Denis V. Lunev	0e6bd4a1c6	[NETNS]: Add namespace parameter to ip_options_compile. ip_options_compile uses inet_addr_type which requires a namespace. The packet argument is optional, so parameter is the only way to obtain it. Pass the init_net there for now. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:29:23 -07:00
Denis V. Lunev	ffc31d3d77	[NETNS]: /proc/net/arp namespacing. The seqfile operations showing /proc/net/arp are already namespace aware. All we need is to register this file for each namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:28:43 -07:00
Denis V. Lunev	49e8a279a1	[NETNS]: Process ARP in the context of the correct namespace. Get namespace from a device and pass it to the routing engine. Enable ARP packet processing and device notifiers after that. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 15:28:12 -07:00
Pavel Emelyanov	84c375af0f	[NETNS][UDP-Lite]: Register /proc/net/udplite(6) in a namespace. UDP-Lite sockets are displayed in separate files from the UDP ones, so make them present in namespaces as well. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 14:56:57 -07:00
Pavel Emelyanov	ff2bac6a63	[UDP-Lite]: Clean up proc creation a bit. Just introduce a helper to remove ifdefs from inside the udplite4_register function. This will help to make the next patch nicer. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 14:56:34 -07:00
Pavel Emelyanov	757764f61d	[NETNS][TCP]: Register /proc/net/tcp in a namespace. After the commit `f40c8174d3` ([NETNS][IPV4] tcp - make proc handle the network namespaces) it is now possible to make this file present in newly created namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 14:56:02 -07:00
Pavel Emelyanov	15439febb0	[NETNS][UDP]: Register /proc/net/udp in a namespace. After the commit `a91275eff4` ([NETNS][IPV6] udp - make proc handle the network namespace) it is now possible to make this file present in newly created namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 14:53:49 -07:00
Kazunori MIYAZAWA	df9dcb4588	[IPSEC]: Fix inter address family IPsec tunnel handling. Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 14:51:51 -07:00
David S. Miller	06802a819a	Merge branch 'master' of ../net-2.6/ Conflicts: net/ipv6/ndisc.c	2008-03-23 22:54:03 -07:00
Stephen Hemminger	3d3b2d25a4	fib_trie: print information on all routing tables Make /proc/net/fib_trie and /proc/net/fib_triestat display all routing tables, not just local and main. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-23 22:43:56 -07:00
Florian Westphal	2051f11fb8	[TCP]: Shrink syncookie_secret by 8 bytes. The first u32 copied from syncookie_secret is overwritten by the minute-counter four lines below. After adjusting the destination address, the size of syncookie_secret can be reduced accordingly. AFAICS, the only other user of syncookie_secret[] is the ipv6 syncookie support. Because ipv6 syncookies only grab 44 bytes from syncookie_secret[], this shouldn't affect them in any way. With fixes from Glenn Griffin. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Glenn Griffin <ggriffin.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-23 22:21:28 -07:00
Stephen Hemminger	6440cc9e0f	[IPV4] fib_trie: fix warning from rcu_assign_pointer This gets rid of a warning caused by the test in rcu_assign_pointer. I tried to fix rcu_assign_pointer, but that devolved into a long set of discussions about doing it right that came to no real solution. Since the test in rcu_assign_pointer for constant NULL would never succeed in fib_trie, just open code instead. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 17:59:58 -07:00
Stephen Hemminger	817bc4db77	[IPV4] route: use read_mostly The route table parameters are set based on system memory and sysctl values that almost never change. Also the genid only changes every 10 minutes. RTprint is defined but never used. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 17:43:59 -07:00
Denis V. Lunev	ce25999078	[IPV4]: sk parameter is unused in ipv4_dst_blackhole. Just remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 17:42:37 -07:00
Pavel Emelyanov	fc8717baa8	[RAW]: Add raw_hashinfo member on struct proto. Sorry for the patch sequence confusion :| but I found out too late that a similar thing can easily be done for raw sockets. Expand the proto.h union with the raw_hashinfo member and use it in raw_prot and rawv6_prot. This allows dropping the protocol-specific versions of the hash and unhash callbacks. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:56:51 -07:00
Pavel Emelyanov	6ba5a3c52d	[UDP]: Make full use of proto.h.udp_hash innovation. After this we have only udp_lib_get_port to get the port and two stubs for ipv4 and ipv6. There is no difference between udp and udplite except for the initialized h.udp_hash member. I tried to find a graceful way to drop the only difference between udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison routine), but adding one more callback on the struct proto didn't seem graceful enough :( Maybe later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:51:21 -07:00
Pavel Emelyanov	39d8cda76c	[SOCK]: Add udp_hash member to struct proto. Inspired by the commit `ab1e0a13` ([SOCK] proto: Add hashinfo member to struct proto) from Arnaldo, I made similar thing for UDP/-Lite IPv4 and -v6 protocols. The result is not that exciting, but it removes some levels of indirection in udpxxx_get_port and saves some space in code and text. The first step is to union existing hashinfo and new udp_hash on the struct proto and give a name to this union, since future initialization of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc warning about inability to initialize anonymous member this way. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:50:58 -07:00
Denis V. Lunev	22aba383ce	[IPV4]: Always pass ip_options pointer into ip_options_compile. This makes code a bit more uniform and straightforward. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:36:20 -07:00
Denis V. Lunev	ef722495c8	[IPV4]: Remove unused ip_options->is_data. ip_options->is_data is assigned only and never checked. The structure is not a part of kernel interface to the userspace. So, it is safe to remove this field. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:35:29 -07:00
Denis V. Lunev	10fe7d85e2	[IPV4]: Remove unnecessary check for opt->is_data in ip_options_compile. There is only one way to reach ip_options_compile with opt != NULL: ip_options_get_finish does opt->is_data = 1; ip_options_compile(opt, NULL). So, checking for is_data inside the opt != NULL branch is not needed. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:35:00 -07:00
Herbert Xu	69d1506731	[TCP]: Let skbs grow over a page on fast peers While testing the virtio-net driver on KVM with TSO I noticed that TSO performance with a 1500 MTU is significantly worse compared to the performance of non-TSO with a 16436 MTU. The packet dump shows that most of the packets sent are smaller than a page. Looking at the code, this is actually quite obvious, as it always stops extending the packet if it's the first packet yet to be sent and if it's larger than the MSS. Since each extension is bound by the page size, this means that (given a 1500 MTU) we're very unlikely to construct packets greater than a page, provided that the receiver and the path are fast enough so that packets can always be sent immediately. The fix is also quite obvious. The push calls inside the loop are just an optimisation so that we don't end up doing all the sending at the end of the loop. Therefore there is no specific reason why it has to do so at MSS boundaries. For TSO, the most natural extension of this optimisation is to do the pushing once the skb exceeds the TSO size goal. This is what the patch does, and testing with KVM shows that the TSO performance with a 1500 MTU easily surpasses that of a 16436 MTU and indeed the packet sizes sent are generally larger than 16436. I don't see any obvious downsides for slower peers or connections, but it would be prudent to test this extensively to ensure that those cases don't regress. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 15:47:05 -07:00
Patrick McManus	ec3c0982a2	[TCP]: TCP_DEFER_ACCEPT updates - process as established Change TCP_DEFER_ACCEPT implementation so that it transitions a connection to ESTABLISHED after handshake is complete instead of leaving it in SYN-RECV until some data arrives. Place connection in accept queue when first data packet arrives from slow path. Benefits: - established connection is now reset if it never makes it to the accept queue - diagnostic state of established matches with the packet traces showing completed handshake - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be enforced with reasonable accuracy instead of rounding up to next exponential back-off of syn-ack retry. Signed-off-by: Patrick McManus <mcmanus@ducksong.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 16:33:01 -07:00
Patrick McManus	e4c7884028	[TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack A socket in LISTEN that had completed its 3-way handshake, but had not notified userspace because of SO_DEFER_ACCEPT, would retransmit the already acked syn-ack during the time it was waiting for the first data byte from the peer. Signed-off-by: Patrick McManus <mcmanus@ducksong.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 16:29:22 -07:00
Patrick McManus	539fae89be	[TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_thresh The timeout associated with SO_DEFER_ACCEPT wasn't being honored if it was less than the timeout allowed by the maximum syn-recv queue size algorithm. Fix by using the SO_DEFER_ACCEPT value if the ack has arrived. Signed-off-by: Patrick McManus <mcmanus@ducksong.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 16:27:38 -07:00
Pavel Emelyanov	28518fc170	[NET]: NULL pointer dereference and other nasty things in /proc/net/(tcp|udp)[6] Commits f40c81 ([NETNS][IPV4] tcp - make proc handle the network namespaces) and a91275 ([NETNS][IPV6] udp - make proc handle the network namespace) both introduced bad checks that sockets and tw buckets belong to the proper net namespace. I.e., when checking that a socket belongs to a given net and family, the do { sk = sk_next(sk); } while (sk && sk->sk_net != net && sk->sk_family != family); constructions were used. This is wrong, since as soon as sk->sk_net matches the net, the socket is immediately returned, even if it belongs to another family. As a result, four /proc/net/(udp|tcp)[6] entries show wrong info. The udp6 entry even oopses when dereferencing the inet6_sk(sk) pointer: static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket) { ... struct ipv6_pinfo *np = inet6_sk(sp); ... dest = &np->daddr; /* will be NULL for AF_INET sockets */ ... seq_printf(... dest->s6_addr32[0], dest->s6_addr32[1], dest->s6_addr32[2], dest->s6_addr32[3], ... Fix it by converting && to ||. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 15:52:00 -07:00
Phil Oester	12b101555f	[IPV4]: Fix null dereference in ip_defrag Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag. Offending line in ip_defrag is here: net = skb->dev->nd_net where dev is NULL. Bisected the problem down to commit `ac18e7509e` ([NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces). Below patch (idea from Patrick McHardy) fixes the problem for me. Signed-off-by: Phil Oester <kernel@linuxace.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 15:01:50 -07:00
Daniel Lezcano	6f8b13bcb3	[NETNS][IPV6] tcp6 - make proc per namespace Make the proc for tcp6 to be per namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:14:45 -07:00
Daniel Lezcano	0c96d8c50b	[NETNS][IPV6] udp6 - make proc per namespace The proc init/exit functions take a new network namespace parameter in order to register/unregister /proc/net/udp6 for a namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:14:17 -07:00
Daniel Lezcano	f40c8174d3	[NETNS][IPV4] tcp - make proc handle the network namespaces This patch, like the udp proc one, makes the proc functions take care of which namespace the socket belongs to. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:13:54 -07:00
Daniel Lezcano	8d9f1744ca	[NETNS][IPV6] tcp - assign the netns for timewait sockets Copy the network namespace from the socket to the timewait socket. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:12:54 -07:00
Daniel Lezcano	a91275eff4	[NETNS][IPV6] udp - make proc handle the network namespace This patch makes the common udp proc functions take into account the namespace a socket belongs to when deciding which sockets to show. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:11:58 -07:00
Peter P Waskiewicz Jr	82cc1a7a56	[NET]: Add per-connection option to set max TSO frame size Update: My mailer ate one of Jarek's feedback mails... Fixed the parameter in netif_set_gso_max_size() to be u32, not u16. Fixed the whitespace issue due to a patch import botch. Changed the types from u32 to unsigned int to be more consistent with other variables in the area. Also brought the patch up to the latest net-2.6.26 tree. Update: Made gso_max_size container 32 bits, not 16. Moved the location of gso_max_size within netdev to be less hotpath. Made more consistent names between the sock and netdev layers, and added a define for the max GSO size. Update: Respun for net-2.6.26 tree. Update: changed max_gso_frame_size and sk_gso_max_size from signed to unsigned - thanks Stephen! This patch adds the ability for device drivers to control the size of the TSO frames being sent to them, per TCP connection. By setting the netdevice's gso_max_size value, the socket layer will set the GSO frame size based on that value. This will propagate into the TCP layer, and send TSOs of that size to the hardware. This can be desirable to help tune the bursty nature of TSO on a per-adapter basis, where one may have 1 GbE and 10 GbE devices coexisting in a system, one running multiqueue and the other not, etc. This can also be desirable for devices that cannot support full 64 KB TSOs, but still want to benefit from some level of segmentation offloading. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 03:43:19 -07:00
David S. Miller	a25606c845	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2008-03-21 03:42:24 -07:00
Patrick McHardy	607bfbf2d5	[TCP]: Fix shrinking windows with window scaling When selecting a new window, tcp_select_window() tries not to shrink the offered window by using the maximum of the remaining offered window size and the newly calculated window size. The newly calculated window size is always a multiple of the window scaling factor; the remaining window size, however, might not be, since it depends on rcv_wup/rcv_nxt. This means we're effectively shrinking the window when scaling it down. The dump below shows the problem (scaling factor 2^7): - Window size of 557 (71296) is advertised, up to 3111907257: IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...> - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes below the last end: IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...> The number 40 results from downscaling the remaining window: 3111907257 - 3111841425 = 65832; 65832 / 2^7 = 514; 65832 % 2^7 = 40. If the sender uses up the entire window before it is shrunk, this can have chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq() will notice that the window has been shrunk since tcp_wnd_end() is before tp->snd_nxt, which makes it choose tcp_wnd_end() as the sequence number. This will, however, fail the receiver's checks in tcp_sequence(), since it is before its tp->rcv_wup, making it respond with a dupack. If both sides are in this condition, this leads to a constant flood of ACKs until the connection times out. Make sure the window is never shrunk by aligning the remaining window to the window scaling factor. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-20 16:11:27 -07:00
Daniel Hokka Zakrisson	d0ebf13359	[NETFILTER]: ipt_recent: sanity check hit count If a rule using ipt_recent is created with a hit count greater than ip_pkt_list_tot, the rule will never match, as it cannot keep track of enough timestamps. This patch makes ipt_recent refuse to create such rules. With ip_pkt_list_tot's default value of 20, the following can be used to reproduce the problem: nc -u -l 0.0.0.0 1234 & for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done This limits it to 20 packets: iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test --rsource; iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds 60 --hitcount 20 --name test --rsource -j DROP While this is unlimited: iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test --rsource; iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds 60 --hitcount 21 --name test --rsource -j DROP With the patch, the second rule set will throw an EINVAL. Reported-by: Sean Kennedy <skennedy@vcn.com> Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-20 15:07:10 -07:00
Robert P. J. Day	938b93adb2	[NET]: Add debugging names to __RW_LOCK_UNLOCKED macros. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-18 00:59:23 -07:00
David S. Miller	577f99c1d0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/rt2x00/rt2x00dev.c net/8021q/vlan_dev.c	2008-03-18 00:37:55 -07:00
Al Viro	5e226e4d90	[IPV4]: esp_output() misannotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-17 22:50:23 -07:00
Al Viro	e6f1cebf71	[NET] endianness noise: INADDR_ANY Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-17 22:44:53 -07:00
Ilpo Järvinen	5ea3a74806	[TCP]: Prevent sending past receiver window with TSO (at last skb) With TSO it was possible to send past the receiver window when the skb to be sent was the last in the write queue while the receiver window was the limiting factor. There was a loophole in tcp_mss_split_point: it lacked a receiver window check for tcp_write_queue_tail() if cwnd was also smaller than the full skb. Noticed by Thomas Gleixner <tglx@linutronix.de> in the form of "Treason uncloaked! Peer ... shrinks window .... Repaired." messages (the peer didn't actually shrink its window as the message suggests; we had just sent something past it without permission to do so). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-11 17:55:27 -07:00
David S. Miller	db8dac20d5	[UDP]: Revert udplite and code split. This reverts commit `db1ed684f6` ("[IPV6] UDP: Rename IPv6 UDP files."), commit `8be8af8fa4` ("[IPV4] UDP: Move IPv4-specific bits to other file.") and commit `e898d4db27` ("[UDP]: Allow users to configure UDP-Lite."). First, udplite is of such small cost, and it is a core protocol just like TCP and normal UDP are. We spent enormous amounts of effort to make udplite share as much code with core UDP as possible. All of that work is less valuable if we're just going to slap a config option on udplite support. It is also causing build failures, as reported on linux-next, showing that the changeset was not tested very well. In fact, this is the second build failure resulting from the udplite change. Finally, the config option provided was a bool instead of a modular option, meaning the udplite code does not even get build-tested by allmodconfig builds, and furthermore the user is not presented with a reasonable modular build option, which is particularly needed by distribution vendors. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-06 16:22:02 -08:00
Harvey Harrison	0dc47877a3	net: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-05 20:47:47 -08:00
Eric Dumazet	ee6b967301	[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts (Anonymous) unions can help us to avoid ugly casts. A common cast is the (struct rtable *)skb->dst one. Defining a union like: union { struct dst_entry *dst; struct rtable *rtable; }; permits using skb->rtable in place. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-05 18:30:47 -08:00
David S. Miller	255333c1db	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/rc80211_pid_algo.c	2008-03-05 12:26:41 -08:00
Stephen Hemminger	dea75bdfa5	[IPCONFIG]: The kernel gets no IP from some DHCP servers From: Stephen Hemminger <shemminger@linux-foundation.org> Based upon a patch by Marcel Wappler: This patch fixes a DHCP issue in the kernel: some DHCP servers (e.g. the one in the Linksys WRT54Gv5) are very strict about the contents of the DHCPDISCOVER packet they receive from clients. Table 5 in RFC2131 (page 36) requires that the fields 'ciaddr' and 'siaddr' MUST be set to '0'. These DHCP servers ignore the Linux kernel's DHCP discovery packets with these two fields set to '255.255.255.255' (in contrast to popular DHCP clients, such as 'dhclient' or 'udhcpc'). This leads to a system that does not boot. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-04 17:03:49 -08:00
Herbert Xu	ed58dd41f3	[ESP]: Add select on AUTHENC Now that ESP uses the AEAD interface even for algorithms which are not combined mode, we need to select CONFIG_CRYPTO_AUTHENC, as otherwise only combined mode algorithms will work. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-04 14:29:21 -08:00
Sangtae Ha	6b3d626321	[TCP]: TCP cubic v2.2 We have updated CUBIC to fix some issues with slow increase in large BDP networks. We also improved its convergence speed. The fix is in fact very simple -- the window increase limit of smax during the window probing phase (i.e., convex growth phase) is removed. We found that this does not affect TCP friendliness, but only improves its scalability. We have run some tests in our lab and also over the Internet path from NCSU to Japan. The results can be seen on the following pages: http://netsrv.csc.ncsu.edu/wiki/index.php/Intra_protocol_fairness_testing_with_linux-2.6.23.9 http://netsrv.csc.ncsu.edu/wiki/index.php/RTT_fairness_testing_with_linux-2.6.23.9 http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_friendliness_testing_with_linux-2.6.23.9 Signed-off-by: Sangtae Ha <sha2@ncsu.edu> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-04 14:17:41 -08:00
YOSHIFUJI Hideaki	8be8af8fa4	[IPV4] UDP: Move IPv4-specific bits to other file. Move IPv4-specific UDP bits from net/ipv4/udp.c into (new) net/ipv4/udp_ipv4.c. Rename net/ipv4/udplite.c to net/ipv4/udplite_ipv4.c. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-04 15:18:22 +09:00
YOSHIFUJI Hideaki	e898d4db27	[UDP]: Allow users to configure UDP-Lite. Let's give users an option for disabling UDP-Lite (~4K). old: | text data bss dec hex filename | 286498 12432 6072 305002 4a76a net/ipv4/built-in.o | 193830 8192 3204 205226 321aa net/ipv6/ipv6.o new (without UDP-Lite): | text data bss dec hex filename | 284086 12136 5432 301654 49a56 net/ipv4/built-in.o | 191835 7832 3076 202743 317f7 net/ipv6/ipv6.o Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-04 15:18:22 +09:00
Glenn Griffin	c6aefafb7e	[TCP]: Add IPv6 support to TCP SYN cookies Updated to incorporate Eric's suggestion of using a per cpu buffer rather than allocating on the stack. Just a two line change, but will resend in its entirety. Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-04 15:18:21 +09:00
Eric Dumazet	11baab7ac3	[TCP]: lower stack usage in cookie_hash() function 400 bytes allocated on stack might be a little bit too much. Using a per_cpu var is more friendly. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-04 15:18:21 +09:00
Pavel Emelyanov	988b705077	[ARP]: Introduce the arp_hdr_len helper. There are some places that calculate the ARP header length. These calculations are correct, but a) some operate with "magic" constants, b) enlarge the code length (sometimes at the cost of coding style), c) are not informative at first glance. The proposal is to introduce a helper that includes all the good sides of these calculations. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-03 12:20:57 -08:00
Ilpo Järvinen	d152a7d88a	[TCP]: Must count fack_count also when skipping It makes fackets_out grow too slowly compared with the real write queue. This shouldn't cause those BUG_TRAP(packets <= tp->packets_out) to trigger, but who knows how such an inconsistent fackets_out affects things here and there around TCP, when everything nowadays assumes an accurate fackets_out. So let's see if this silences them all. Reported by Guillaume Chazarain <guichaz@gmail.com>. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-03 12:10:16 -08:00
Denis V. Lunev	7cd04fa7e3	[TCP]: Merge exit paths in tcp_v4_conn_request. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-03 11:59:32 -08:00
Denis V. Lunev	da7ef338a2	[IPV4]: skb->dst can't be NULL in ip_options_echo. ip_options_echo is called on the packet input path after the initial routing. The dst entry on the packet is cleared only in a few very specific places and is immediately assigned back (possibly a new one). Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-03 11:50:10 -08:00
Denis V. Lunev	1d1c8d13c4	[ICMP]: Section conflict between icmp_sk_init/icmp_sk_exit. Functions in the __exit section should not be called from functions in the __init section. Fix this conflict. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 14:15:19 -08:00
Denis V. Lunev	fd80eb942a	[INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack. It looks like the dst parameter is in this API for historical reasons. In fact, it is only really used in the direct call to tcp_v4_send_synack. So, create a wrapper for tcp_v4_send_synack and remove dst from rtx_syn_ack. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:43:03 -08:00
Pavel Emelyanov	665bba1087	[NETFILTER/RXRPC]: Don't use seq_release_private where inappropriate. Some netfilter and rxrpc code uses seq_open() to open a proc file, but seq_release_private to release it. This is harmless, but ambiguous. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:39:17 -08:00
Denis V. Lunev	4a6ad7a141	[NETNS]: Make icmp_sk per namespace. All preparations are done. Now just add a hook to perform an initialization on namespace startup and replace the icmp_sk macro with a proper inline call. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:19:58 -08:00
Denis V. Lunev	5c8cafd65e	[NETNS]: icmp(v6)_sk should not pin a namespace. So, change icmp(v6)_sk creation/disposal to the scheme used in netlink for rtnl, i.e. create the socket in the context of init_net and assign the namespace later without taking a reference. Also use sk_release_kernel instead of sock_release to properly destroy such sockets. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:19:22 -08:00
Denis V. Lunev	79c9115953	[ICMP]: Allocate data for __icmp(v6)_sk dynamically. Each namespace should have its own __icmp(v6)_sk, so it should be allocated dynamically. However, alloc_percpu does not fit this case, as it implies an additional dereference for no benefit. Allocate data for the pointers just like __percpu_alloc_mask does and place pointers to struct sock into this array. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:17:11 -08:00
Denis V. Lunev	405666db84	[ICMP]: Pass proper ICMP socket into icmp(v6)_xmit_(un)lock. We have to take the socket lock inside icmp(v6)_xmit_lock/unlock. The socket is currently taken from a global variable. Once this code becomes namespace-aware, one would have to pass a namespace and get the socket from it. However, that is unnecessary: the socket is already available in the caller, so just pass it in. This saves a bit of code now and saves more later. add/remove: 0/0 grow/shrink: 1/3 up/down: 1/-169 (-168); function (old -> new, delta): icmp_rcv 718 -> 719 (+1), icmpv6_rcv 2343 -> 2303 (-40), icmp_send 1566 -> 1518 (-48), icmp_reply 549 -> 468 (-81). Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:16:46 -08:00
Denis V. Lunev	b7e729c4b4	[ICMP]: Store sock rather than socket for ICMP flow control. Basically, it makes no difference whether we store a socket or a sock. However, sock looks better, as there will be one less dereference on the fast path. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:16:08 -08:00
Denis V. Lunev	1e3cf6834e	[ICMP]: Optimize icmp_socket usage. Use this macro only once in a function to save a bit of space. add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-98 (-98); function (old -> new, delta): icmp_reply 562 -> 561 (-1), icmp_push_reply 305 -> 258 (-47), icmp_init 273 -> 223 (-50). Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:15:42 -08:00
Denis V. Lunev	a5710d6582	[ICMP]: Add return code to icmp_init. icmp_init could fail, and this is normal for namespaces other than the initial one. So the panic should be triggered only on the init_net initialization path. Additionally, create a rollback path for icmp_init as a separate function. It will also be used later during namespace destruction. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:14:50 -08:00
Denis V. Lunev	9b0f976f27	[INET]: Remove struct net_proto_family* from _init calls. struct net_proto_family* is not used in icmp[v6]_init, ndisc_init, igmp_init and tcp_v4_init. Remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:13:15 -08:00
Sangtae Ha	0bc8c7bf9e	[TCP]: BIC web page link is corrected. Signed-off-by: Sangtae Ha <sha2@ncsu.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 22:14:32 -08:00
Denis V. Lunev	c4544c7243	[NETNS]: Process inet_select_addr inside a namespace. The context is available from a network device passed in. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:52:54 -08:00
Denis V. Lunev	3776c8891a	[NETNS]: Enable IPv4 address manipulations inside namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:52:25 -08:00
Denis V. Lunev	1937504dd1	[NETNS]: Enable all routing manipulation via netlink inside namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:52:04 -08:00
Denis V. Lunev	e5b13cb10d	[NETNS]: Process devinet ioctl in the correct namespace. Add namespace parameter to devinet_ioctl and locate device inside it for state changes. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:51:43 -08:00
Denis V. Lunev	73b3871165	[NETNS]: Register /proc/net/rt_cache for each namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:51:18 -08:00
Denis V. Lunev	a75e936f2f	[NETNS]: Process /proc/net/rt_cache inside a namespace. Show routing cache for a particular namespace only. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:50:55 -08:00
Denis V. Lunev	642d631811	[IPV4]: rt_cache_get_next should take rt_genid into account. Otherwise, /proc/net/rt_cache will look inconsistent with respect to genid. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:50:33 -08:00
Denis V. Lunev	317805b8f8	[NETNS]: Process ip_rt_redirect in the correct namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:50:06 -08:00
Denis V. Lunev	be162d6288	[NETNS]: Enable inetdev_event notifier. After all these preparations it is time to enable the main IPv4 device initialization routine inside a namespace. It is safe to do this now. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:49:13 -08:00
Denis V. Lunev	2430aa85de	[NETNS]: Disable multicaststing configuration inside non-initial namespace. Do not call hooks from device notifiers, and disallow configuration from the ioctl/netlink layer. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:48:49 -08:00
Denis V. Lunev	6fc68624e5	[NETFILTER]: Consolidate masq_inet_event and masq_device_event. They do exactly the same job. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:45:41 -08:00
Wang Chen	770207208e	[IPV4]: Use proc_create() to setup ->proc_fops first Use proc_create() to make sure that ->proc_fops is set up before gluing the PDE to the main tree. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 14:14:25 -08:00
Herbert Xu	21e43188f2	[IPCOMP]: Disable BH on output when using shared tfm Because we use shared tfm objects in order to conserve memory (each tfm requires 128K of vmalloc memory), BH needs to be turned off on output, as that can occur in process context. Previously this was done implicitly by the xfrm output code. That was lost when it became lockless. So we need to add the BH disabling to IPComp directly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 11:23:17 -08:00
Pavel Emelyanov	b37d428b24	[INET]: Don't create tunnels with '%' in name. Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a pre-defined name for a device from userspace. Since these drivers call register_netdevice() (rtnl_lock is held), which does _not_ generate the device's name, this name may contain a '%' character. I'm not sure how bad it is to have a device with a '%' in its name, but all the other places either use register_netdev(), which calls dev_alloc_name(), or explicitly call dev_alloc_name() before registering, i.e. they do not allow such names. This should have gone in before commit 34cc7b, but I forgot to number the patches and this one got lost, sorry. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-26 23:51:04 -08:00
Bjorn Mork	148f97292e	[IPV4]: Reset scope when changing address This bug did bite at least one user, who had to resort to rebooting the system after an "ifconfig eth0 127.0.0.1" typo. Deleting the address and adding a new one is a less intrusive workaround. But I still believe this is a bug that should be fixed, one way or another. Another possibility would be to remove the scope mangling based on address. This will always be incomplete (is 127/8 the only address space with host scope requirements?). We set the scope to RT_SCOPE_HOST if an IPv4 interface is configured with a loopback address (127/8). The scope is never reset, and will remain set to RT_SCOPE_HOST after changing the address. This patch resets the scope if the address is changed again, to restore normal functionality. Signed-off-by: Bjorn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-26 18:42:41 -08:00
Pavel Emelyanov	34cc7ba639	[IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly. Use the added dev_alloc_name() call to create the tunnel device name, rather than iterating in a hand-made loop with an artificial limit. Thanks Patrick for noticing this. [ The way this works is, when the device is actually registered, the generic code notices the '%' in the name and invokes dev_alloc_name() to fully resolve the name. -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-23 20:19:20 -08:00
Joonwoo Park	eb1197bc0e	[NETFILTER]: Fix incorrect use of skb_make_writable http://bugzilla.kernel.org/show_bug.cgi?id=9920 The function skb_make_writable returns true or false. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-19 17:18:47 -08:00
Patrick McHardy	e2b58a67b9	[NETFILTER]: {ip,ip6,nfnetlink}_queue: fix SKB_LINEAR_ASSERT when mangling packet data As reported by Tomas Simonaitis <tomas.simonaitis@gmail.com>, inserting new data in skbs queued over {ip,ip6,nfnetlink}_queue triggers a SKB_LINEAR_ASSERT in skb_put(). Going back through the git history, it seems this bug has been present since at least 2.6.12-rc2, probably even since the removal of skb_linearize() for netfilter. Linearize non-linear skbs through skb_copy_expand() when enlarging them. Tested by Thomas, fixes bugzilla #9933. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-19 17:17:52 -08:00
Adrian Bunk	94cb1503c7	ipv4/fib_hash.c: fix NULL dereference Unless I'm missing a guaranteed relation between "f" and "new_fa->fa_info", this patch is required for fixing a NULL dereference introduced by commit `a6501e080c` ("[IPV4] FIB_HASH: Reduce memory needs and speedup lookups") and spotted by the Coverity checker. Eric Dumazet says: Hum, you are right, kmem_cache_free() doesn't allow a NULL object, like kfree() does. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-19 16:28:54 -08:00
Kris Katterjohn	9bf1d83e7e	[TCP]: Fix tcp_v4_send_synack() comment Signed-off-by: Kris Katterjohn <katterjohn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-17 22:29:19 -08:00
Uwe Kleine-Koenig	9c00409a2a	[IPV4]: fix alignment of IP-Config output Make the indented lines aligned in the output (not in the code). Signed-off-by: Uwe Kleine-Koenig <Uwe.Kleine-Koenig@digi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-17 22:28:32 -08:00
David S. Miller	9ff5660746	Revert "[NDISC]: Fix race in generic address resolution" This reverts commit `69cc64d8d9`. It causes recursive locking in IPV6 because, unlike other neighbour layer clients, it even needs neighbour cache entries to send neighbour solicitation messages :-( We'll have to find another way to fix this race. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-17 18:39:54 -08:00
Adrian Bunk	324b57619b	[INET]: Unexport inet_listen_wlock This patch removes the no longer used EXPORT_SYMBOL(inet_listen_wlock). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-13 17:40:25 -08:00
Adrian Bunk	74da4d34e4	[INET]: Unexport __inet_hash_connect This patch removes the unused EXPORT_SYMBOL_GPL(__inet_hash_connect). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-13 17:39:34 -08:00
Herbert Xu	b318e0e4ef	[IPSEC]: Fix bogus usage of u64 on input sequence number Al Viro spotted a bogus use of u64 on the input sequence number which is big-endian. This patch fixes it by giving the input sequence number its own member in the xfrm_skb_cb structure. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-12 22:50:35 -08:00


2462 Commits
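Commit ec3c0982a2 notes that TCP_DEFER_ACCEPT timeouts are expressed in seconds. From userspace the option is set on a TCP socket with setsockopt(). The Linux-only sketch below uses hypothetical helper names (enable_defer_accept, get_defer_accept); the value read back with getsockopt() may be rounded up by the kernel rather than echoed exactly, so it is treated only as a lower bound.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_DEFER_ACCEPT (Linux-specific) */
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket and ask the kernel to defer
 * waking accept() for up to `secs` seconds while no data has arrived.
 * Returns the descriptor, or -1 on failure. */
static int enable_defer_accept(int secs)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT, &secs, sizeof(secs)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the option back; the kernel may report a rounded-up value. */
static int get_defer_accept(int fd)
{
    int secs = 0;
    socklen_t len = sizeof(secs);
    if (getsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT, &secs, &len) < 0)
        return -1;
    return secs;
}
```

A server would then bind() and listen() on the returned descriptor as usual; accept() is deferred until the first data arrives, subject to the timeout.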
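The predicate bug fixed by 28518fc170 can be reproduced outside the kernel. In this sketch the structures are simplified, hypothetical stand-ins for struct sock and struct net, not the kernel definitions. The loop must keep skipping while the socket is in the wrong namespace or has the wrong family; with && it stops as soon as either field matches.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct net { int id; };
struct sock {
    struct net  *sk_net;
    int          sk_family;
    struct sock *next;
};

/* Buggy form: stops as soon as EITHER field matches. */
static struct sock *next_matching_buggy(struct sock *sk, struct net *net, int family)
{
    do {
        sk = sk->next;
    } while (sk && sk->sk_net != net && sk->sk_family != family);
    return sk;
}

/* Fixed form: keeps skipping while EITHER field mismatches. */
static struct sock *next_matching_fixed(struct sock *sk, struct net *net, int family)
{
    do {
        sk = sk->next;
    } while (sk && (sk->sk_net != net || sk->sk_family != family));
    return sk;
}
```

With the && form, an AF_INET socket in the right namespace is handed to code that expects an IPv6 socket, which is the path that oopsed in udp6_sock_seq_show().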
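The window arithmetic in 607bfbf2d5 can be checked with a short sketch, using the numbers from that commit message. Rounding the remaining window up to a multiple of 2^wscale before scaling it down keeps the advertised right edge from moving left. The helper names are hypothetical, and the rounding only mirrors the kernel's ALIGN() idiom; this is not the tcp_select_window() source.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of `a`, which must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Window field value with plain truncation: loses up to 2^wscale - 1 bytes. */
static uint32_t scaled_window_truncated(uint32_t remaining, unsigned wscale)
{
    return remaining >> wscale;
}

/* Align the remaining window first, so scaling never shrinks it. */
static uint32_t scaled_window_aligned(uint32_t remaining, unsigned wscale)
{
    return align_up(remaining, 1u << wscale) >> wscale;
}
```

For the dump above, truncation advertises 514 (65792 bytes, 40 short of the promised edge), while the aligned version advertises 515 (65920 bytes).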
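The sanity check in d0ebf13359 amounts to refusing a --hitcount larger than the number of timestamps kept per entry. A hedged sketch with a hypothetical function name and parameters, not the actual ipt_recent checkentry hook:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical validation: a rule that needs `hit_count` timestamps can
 * never match if each entry stores at most `pkt_list_tot` of them. */
static int recent_check_hitcount(unsigned int hit_count, unsigned int pkt_list_tot)
{
    if (hit_count > pkt_list_tot)
        return -EINVAL;
    return 0;
}
```

With the default ip_pkt_list_tot of 20, --hitcount 20 is accepted and --hitcount 21 is rejected with EINVAL, matching the two rule sets in the commit message.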
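Commit 0dc47877a3 replaces __FUNCTION__ with __func__, the predefined identifier standardized in C99, which behaves as if each function body began with static const char __func__[] = "function-name";. A small illustration; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* __func__ names a static array, so returning a pointer to it is safe. */
static const char *probe_name(void)
{
    return __func__;
}

/* Typical debug-message use, in place of the GCC-only __FUNCTION__. */
static void log_where(char *buf, size_t len)
{
    snprintf(buf, len, "%s: called", __func__);
}
```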
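The aliasing trick in ee6b967301 can be shown with C11 anonymous unions. The structs here are trimmed, hypothetical stand-ins; the sketch assumes, as the cast it replaces does, that a struct rtable begins with its struct dst_entry, so both pointer views refer to the same object.

```c
#include <assert.h>

/* Trimmed, hypothetical stand-ins for the kernel structures. */
struct dst_entry { int refcnt; };

struct rtable {
    struct dst_entry dst;    /* must be the first member */
    unsigned int     rt_flags;
};

struct sk_buff {
    union {                  /* anonymous union: two names, one pointer */
        struct dst_entry *dst;
        struct rtable    *rtable;
    };
};

/* Previously written as ((struct rtable *)skb->dst)->rt_flags. */
static unsigned int rt_flags_of(const struct sk_buff *skb)
{
    return skb->rtable->rt_flags;
}
```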
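The requirement behind dea75bdfa5 can be sketched as a DHCPDISCOVER builder that leaves ciaddr and siaddr at 0.0.0.0. The struct is a trimmed, hypothetical subset of the RFC 2131 header (real messages also carry htype, hlen, chaddr, options and more) and is not the kernel's ipconfig.c structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Trimmed, hypothetical subset of the RFC 2131 message header. */
struct dhcp_hdr {
    uint8_t  op;       /* 1 = BOOTREQUEST */
    uint32_t xid;      /* transaction ID */
    uint32_t ciaddr;   /* client IP address: 0 in DHCPDISCOVER */
    uint32_t yiaddr;
    uint32_t siaddr;   /* next server IP address: 0 in DHCPDISCOVER */
    uint32_t giaddr;
};

static void build_discover(struct dhcp_hdr *h, uint32_t xid)
{
    memset(h, 0, sizeof(*h));   /* every address field starts as 0.0.0.0 */
    h->op  = 1;
    h->xid = xid;
    /* ciaddr/siaddr are deliberately left at 0: strict servers ignore
     * discovers that carry 255.255.255.255 in these fields. */
}
```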
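A user-space sketch of the kind of helper 988b705077 introduces: the fixed ARP header plus two hardware addresses and two IPv4 addresses. Here the hardware address length is passed directly, which is an assumption made for self-containment; the kernel helper works from the network device instead.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed part of an ARP header (RFC 826 layout): 8 bytes. */
struct arphdr {
    uint16_t ar_hrd;   /* hardware type */
    uint16_t ar_pro;   /* protocol type */
    uint8_t  ar_hln;   /* hardware address length */
    uint8_t  ar_pln;   /* protocol address length */
    uint16_t ar_op;    /* opcode */
};

/* Header, plus sender and target hardware addresses,
 * plus sender and target IPv4 addresses. */
static unsigned int arp_hdr_len(unsigned int hw_addr_len)
{
    return sizeof(struct arphdr) + (hw_addr_len + sizeof(uint32_t)) * 2;
}
```

For Ethernet (6-byte addresses) this gives the familiar 28-byte ARP payload, without a magic constant at the call site.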
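The behaviour restored by 148f97292e can be sketched as recomputing the scope from the new address every time it changes, rather than only ever raising it to host scope. The types and function names are hypothetical; RT_SCOPE_UNIVERSE (0) and RT_SCOPE_HOST (254) match the values in <linux/rtnetlink.h> but are redefined here, and addresses are kept in host byte order for simplicity.

```c
#include <assert.h>
#include <stdint.h>

enum { RT_SCOPE_UNIVERSE = 0, RT_SCOPE_HOST = 254 };

/* 127.0.0.0/8, host byte order. */
static int is_loopback(uint32_t addr)
{
    return (addr & 0xff000000u) == 0x7f000000u;
}

/* Hypothetical, trimmed interface-address record. */
struct ifaddr_sketch {
    uint32_t      local;
    unsigned char scope;
};

/* Set a new address and derive the scope from scratch. */
static void set_address(struct ifaddr_sketch *ifa, uint32_t addr)
{
    ifa->local = addr;
    ifa->scope = RT_SCOPE_UNIVERSE;   /* reset first: the missing step */
    if (is_loopback(addr))
        ifa->scope = RT_SCOPE_HOST;
}
```

Setting 127.0.0.1 and then 192.168.0.1 leaves the scope at RT_SCOPE_UNIVERSE, instead of stuck at RT_SCOPE_HOST as described in the commit.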
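Commits 34cc7ba639 and b37d428b24 rely on names containing '%' being templates such as "tunl%d" that get expanded to the lowest unused index. The sketch below is a simplified, hypothetical stand-in for dev_alloc_name(), not its implementation: it accepts only a single "%d", and it needs no artificial limit because n names in use can block at most n of the n + 1 candidates it tries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16   /* plays the role of IFNAMSIZ */

static bool name_in_use(const char *name, const char *const used[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, used[i]) == 0)
            return true;
    return false;
}

/* Expand a "prefix%d" template to the lowest free name in `out`.
 * Returns the chosen index, or -1 for a bad template or overlong name. */
static int alloc_name(const char *tmpl, const char *const used[], size_t n,
                      char out[NAME_LEN])
{
    const char *p = strchr(tmpl, '%');
    if (!p || p[1] != 'd' || strchr(p + 2, '%'))
        return -1;                       /* only a single "%d" is allowed */

    for (size_t i = 0; i <= n; i++) {    /* n + 1 candidates always suffice */
        int len = snprintf(out, NAME_LEN, tmpl, (int)i);
        if (len < 0 || len >= NAME_LEN)
            return -1;                   /* name would not fit */
        if (!name_in_use(out, used, n))
            return (int)i;
    }
    return -1;                           /* unreachable by pigeonhole */
}
```

Validating the template before using it as a format string is also why an arbitrary '%' from userspace should not be passed through unchecked.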
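Commit b318e0e4ef gives the big-endian input sequence number its own member instead of reusing the u64 output counter. A sketch of that separation, assuming a hypothetical stand-in for xfrm_skb_cb and using uint32_t with ntohl()/htonl() in place of the kernel's __be32 annotations:

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-skb IPsec control block. */
struct xfrm_cb_sketch {
    union {
        uint64_t output;   /* host-order counter generated locally */
        uint32_t input;    /* 32-bit big-endian value copied from the packet */
    } seq;
};

/* Store the on-the-wire sequence number unchanged (network byte order). */
static void save_input_seq(struct xfrm_cb_sketch *cb, uint32_t wire_seq_be)
{
    cb->seq.input = wire_seq_be;
}

/* Convert only at the point of use, e.g. for a replay-window check. */
static uint32_t input_seq_host(const struct xfrm_cb_sketch *cb)
{
    return ntohl(cb->seq.input);
}
```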