linux

Author	SHA1	Message	Date
Alexey Kuznetsov	db44575f6f	[NET]: fix oops after tunnel module unload Tunnel modules used to obtain a module refcount each time a tunnel was created, which meant that the tunnel module could be unloaded only after all the tunnels were deleted. Since the old MOD_*_USE_COUNT macros were killed, this protection has been gone. It is possible to bring it back with module_get/put, but it looks more natural and practically useful to force destruction of all the child tunnels on module unload. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-30 17:46:44 -07:00
Harald Welte	1f494c0e04	[NETFILTER] Inherit masq_index to slave connections masq_index is used for cleanup in case the interface address changes (such as a dialup ppp link with dynamic addresses). Without this patch, slave connections are not evicted in such a case, since they don't inherit masq_index. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-30 17:44:07 -07:00
Baruch Even	d1b04c081e	[NET]: Spelling mistakes threshoulds -> thresholds Just simple spelling mistake fixes. Signed-Off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-30 17:41:59 -07:00
Matt Mackall	5e43db7730	[NET]: Move in_aton from net/ipv4/utils.c to net/core/utils.c Move in_aton to allow netpoll and pktgen to work without the rest of the IPv4 stack. Fix whitespace and add comment for the odd placement. Delete now-empty net/ipv4/utils.c Re-enable netpoll/netconsole without CONFIG_INET Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-27 15:24:42 -07:00
Nick Sillik	7cee432a22	[NETFILTER]: Fix -Wunder error in ip_conntrack_core.c Signed-off-by: Nick Sillik <n.sillik@temple.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-27 14:46:03 -07:00
Hans-Juergen Tappe (SYSGO AG)	eaa1c5d059	[IPV4]: Fix Kconfig syntax error From: "Hans-Juergen Tappe (SYSGO AG)" <hjt@sysgo.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-27 13:00:04 -07:00
Patrick McHardy	74bb421da7	[NETFILTER]: Use correct byteorder in ICMP NAT Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-22 12:51:38 -07:00
Patrick McHardy	21f930e4ab	[NETFILTER]: Wait until all references to ip_conntrack_untracked are dropped on unload Fixes a crash when unloading ip_conntrack. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-22 12:51:03 -07:00
Patrick McHardy	d04b4f8c1c	[NETFILTER]: Fix potential memory corruption in NAT code (aka memory NAT) The portptr pointing to the port in the conntrack tuple is declared static, which could result in memory corruption when two packets of the same protocol are NATed at the same time and one conntrack goes away. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-22 12:50:29 -07:00
Rusty Russell	4acdbdbe50	[NETFILTER]: ip_conntrack_expect_related must not free expectation If a connection tracking helper tells us to expect a connection, and we're already expecting that connection, we simply free the one they gave us and return success. The problem is that NAT helpers (eg. FTP) have to allocate the expectation first (to see what port is available) then rewrite the packet. If that rewrite fails, they try to remove the expectation, but it was freed in ip_conntrack_expect_related. This is one example of a larger problem: having registered the expectation, the pointer is no longer ours to use. Reference counting is needed for ctnetlink anyway, so introduce it now. To have a single "put" path, we need to grab the reference to the connection on creation, rather than open-coding it in the caller. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-21 13:14:46 -07:00
Patrick McHardy	0303770deb	[NET]: Make ipip/ip6_tunnel independent of XFRM Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 14:03:34 -07:00
Stephen Hemminger	c877efb207	[IPV4]: Fix up lots of little whitespace indentation stuff in fib_trie. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 14:01:51 -07:00
Patrick McHardy	abaacad9bc	[IPV4]: Don't select XFRM for ip_gre Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 13:59:17 -07:00
Adrian Bunk	6876f95f20	[IPV4]: fix IP_FIB_HASH kconfig warning This patch fixes the following kconfig warning: net/ipv4/Kconfig:92:warning: defaults for choice values not supported Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 13:55:19 -07:00
Phil Oester	84531c24f2	[NETFILTER]: Revert nf_reset change Revert the nf_reset change that caused so much trouble, drop conntrack references manually before packets are queued to packet sockets. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-12 11:57:52 -07:00
Sam Ravnborg	6a2e9b738c	[NET]: move config options out to individual protocols Move the protocol specific config options out to the specific protocols. With this change net/Kconfig now starts to become readable and serve as a good basis for further re-structuring. The menu structure is left almost intact, except that indentation is fixed in most cases. Most visible are the INET changes where several "depends on INET" are replaced with a single ifdef INET / endif pair. Several new files were created to accomplish this change - they are small but serve the purpose that config options are now distributed out to where they belong. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 21:13:56 -07:00
Olaf Kirch	0b7f22aab4	[IPV4]: Prevent oops when printing martian source In some cases, we may be generating packets with a source address that qualifies as martian. This can happen when we're in the middle of setting up the network, and netfilter decides to reject a packet with an RST. The IPv4 routing code would try to print a warning and oops, because locally generated packets do not have a valid skb->mac.raw pointer at this point. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 21:01:42 -07:00
Julian Anastasov	af9debd461	[IPVS]: Add and reorder bh locks after moving to keventd. An addition to the last ipvs changes that move update_defense_level/si_meminfo to keventd: - ip_vs_random_dropentry now runs in process context and should use _bh locks to protect from softirqs - update_defense_level still needs _bh locks after si_meminfo is called, for the same purpose Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 20:59:57 -07:00
David L Stevens	84b42baef7	[IPV4]: fix IPv4 leave-group group matching This patch fixes the multicast group matching for IP_DROP_MEMBERSHIP, similar to the IP_ADD_MEMBERSHIP fix in a prior patch. Groups are identified by <group address,interface>, and including the interface address in the match will fail if a leave-group is done by address when the join was done by index, or if different addresses on the same interface are used in the join and leave. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:48:38 -07:00
David L Stevens	9951f036fe	[IPV4]: (INCLUDE,empty)/leave-group equivalence for full-state MSF APIs & errno fix 1) Adds (INCLUDE, empty)/leave-group equivalence to the full-state multicast source filter APIs (IPv4 and IPv6) 2) Fixes an incorrect errno in the IPv6 leave-group (ENOENT should be EADDRNOTAVAIL) Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:47:28 -07:00
David L Stevens	917f2f105e	[IPV4]: multicast API "join" issues 1) In the full-state API when imsf_numsrc == 0 errno should be "0", but returns EADDRNOTAVAIL 2) An illegal filter mode change errno should be EINVAL, but returns EADDRNOTAVAIL 3) Trying to do an any-source option without IP_ADD_MEMBERSHIP errno should be EINVAL, but returns EADDRNOTAVAIL 4) Adds comments for the less obvious error return values Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:45:16 -07:00
David L Stevens	8cdaaa15da	[IPV4]: multicast API "join" issues 1) Changes IP_ADD_SOURCE_MEMBERSHIP and MCAST_JOIN_SOURCE_GROUP to ignore EADDRINUSE errors on a "courtesy join" -- prior membership or not is ok for these. 2) Adds "leave group" equivalence of (INCLUDE, empty) filters in the delta-based API. Without this, mixing delta-based API calls that end in an (INCLUDE, empty) filter would not allow a subsequent regular IP_ADD_MEMBERSHIP. It also frees socket buffer memory that isn't needed for both the multicast group record and source filter. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:39:23 -07:00
David L Stevens	ca9b907d14	[IPV4]: multicast API "join" issues This patch corrects a few problems with the IP_ADD_MEMBERSHIP socket option: 1) The existing code makes an attempt at reference counting joins when using the ip_mreqn/imr_ifindex interface. Joining the same group on the same socket is an error, whatever the API. This leads to unexpected results when mixing ip_mreqn by index with ip_mreqn by address, ip_mreq, or other API's. For example, ip_mreq followed by ip_mreqn of the same group will "work" while the same two reversed will not. Fixed to always return EADDRINUSE on a duplicate join and removed the (now unused) reference count in ip_mc_socklist. 2) The group-search list in ip_mc_join_group() is comparing a full ip_mreqn structure and all of it must match for it to find the group. This doesn't correctly match a group that was joined with ip_mreq or ip_mreqn with an address (with or without an index). It also doesn't match groups that are joined by different addresses on the same interface. All of these are the same multicast group, which is identified by group address and interface index. Fixed the check to correctly match groups so we don't get duplicate group entries on the ip_mc_socklist. 3) The old code allocates a multicast address before searching for duplicates requiring it to free in various error cases. This patch moves the allocate until after the search and igmp_max_memberships check, so never a need to allocate, then free an entry. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:38:07 -07:00
Alexey Kuznetsov	4c866aa798	[IPV4]: Apply sysctl_icmp_echo_ignore_broadcasts to ICMP_TIMESTAMP as well. This was the full intention of the original code. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:34:46 -07:00
Victor Fusco	86a76caf87	[NET]: Fix sparse warnings From: Victor Fusco <victor@cetuc.puc-rio.br> Fix the sparse warning "implicit cast to nocast type" Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 14:57:47 -07:00
David S. Miller	b03efcfb21	[NET]: Transform skb_queue_len() binary tests into skb_queue_empty() This is part of the grand scheme to eliminate the qlen member of skb_queue_head, and subsequently remove the 'list' member of sk_buff. Most users of skb_queue_len() want to know if the queue is empty or not, and that's trivially done with skb_queue_empty() which doesn't use the skb_queue_head->qlen member and instead uses the queue list emptiness as the test. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 14:57:23 -07:00
David S. Miller	908a75c17a	[TCP]: Never TSO defer under periods of congestion. Congestion window recovery after loss depends upon the fact that if we have a full MSS sized frame at the head of the send queue, we will send it. TSO deferral can defeat the ACK clocking necessary to exit cleanly from recovery. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:43:58 -07:00
David S. Miller	c1b4a7e695	[TCP]: Move to new TSO segmenting scheme. Make TSO segment transmit size decisions at send time not earlier. The basic scheme is that we try to build as large a TSO frame as possible when pulling in the user data, but the size of the TSO frame output to the card is determined at transmit time. This is guided by tp->xmit_size_goal. It is always set to a multiple of MSS and tells sendmsg/sendpage how large an SKB to try and build. Later, tcp_write_xmit() and tcp_push_one() chop up the packet if necessary and conditions warrant. These routines can also decide to "defer" in order to wait for more ACKs to arrive and thus allow larger TSO frames to be emitted. A general observation is that TSO elongates the pipe, thus requiring a larger congestion window and larger buffering especially at the sender side. Therefore, it is important that applications 1) get a large enough socket send buffer (this is accomplished by our dynamic send buffer expansion code) 2) do large enough writes. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:24:38 -07:00
David S. Miller	0d9901df62	[TCP]: Break out send buffer expansion test. This makes it easier to understand, and allows easier tweaking of the heuristic later on. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:21:10 -07:00
David S. Miller	cb83199a29	[TCP]: Do not call tcp_tso_acked() if no work to do. In tcp_clean_rtx_queue(), if the TSO packet is not even partially acked, do not waste time calling tcp_tso_acked(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:20:55 -07:00
David S. Miller	a56476962e	[TCP]: Kill bogus comment above tcp_tso_acked(). Everything stated there is out of date. tcp_trim_skb() does adjust the available socket send buffer space and skb->truesize now. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:20:41 -07:00
David S. Miller	b4e26f5ea0	[TCP]: Fix send-side cpu utilization regression. Only put user data purely to pages when doing TSO. The extra page allocations cause two problems: 1) Add the overhead of the page allocations themselves. 2) Make us do small user copies when we get to the end of the TCP socket cache page. It is still beneficial to purely use pages for TSO, so we will do it for that case. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:20:27 -07:00
David S. Miller	aa93466bdf	[TCP]: Eliminate redundant computations in tcp_write_xmit(). tcp_snd_test() is run for every packet output by a single call to tcp_write_xmit(), but this is not necessary. For one, the congestion window space needs to only be calculated one time, then used throughout the duration of the loop. This cleanup also makes experimenting with different TSO packetization schemes much easier. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:20:09 -07:00
David S. Miller	7f4dd0a943	[TCP]: Break out tcp_snd_test() into its constituent parts. tcp_snd_test() does several different things, use inline functions to express this more clearly. 1) It initializes the TSO count of SKB, if necessary. 2) It performs the Nagle test. 3) It makes sure the congestion window is adhered to. 4) It makes sure SKB fits into the send window. This cleanup also sets things up so that things like the available packets in the congestion window do not need to be calculated multiple times by packet sending loops such as tcp_write_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:54 -07:00
David S. Miller	55c97f3e99	[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling. 'nonagle' should be passed to the tcp_snd_test() function as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the tail of the write_queue. This is because Nagle does not apply to such frames since we cannot possibly tack more data onto them. However, while doing this __tcp_push_pending_frames() makes all of the packets in the write_queue use this modified 'nonagle' value. Fix the bug and simplify this function by just calling tcp_write_xmit() directly if sk_send_head is non-NULL. As a result, we can now make tcp_data_snd_check() just call tcp_push_pending_frames() instead of the specialized __tcp_data_snd_check(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:38 -07:00
David S. Miller	a2e2a59c93	[TCP]: Fix redundant calculations of tcp_current_mss() tcp_write_xmit() uses tcp_current_mss(), but some of its callers, namely __tcp_push_pending_frames(), already have this value available. While we're here, fix the "cur_mss" argument to be "unsigned int" instead of plain "unsigned". Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:23 -07:00
David S. Miller	92df7b518d	[TCP]: tcp_write_xmit() tabbing cleanup Put the main basic block of work at the top-level of tabbing, and mark the TCP_CLOSE test with unlikely(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:06 -07:00
David S. Miller	a762a98007	[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames(). The tcp_cwnd_validate() function should only be invoked if we actually send some frames, yet __tcp_push_pending_frames() will always invoke it. tcp_write_xmit() does the call for us, so the call here can simply be removed. Also, tcp_write_xmit() can be marked static. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:51 -07:00
David S. Miller	f44b527177	[TCP]: Add missing skb_header_release() call to tcp_fragment(). When we add any new packet to the TCP socket write queue, we must call skb_header_release() on it in order for the TSO sharing checks in the drivers to work. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:34 -07:00
David S. Miller	84d3e7b957	[TCP]: Move __tcp_data_snd_check into tcp_output.c It reimplements portions of tcp_snd_check(), so if we move it to tcp_output.c we can consolidate its logic much more easily in a later change. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:18 -07:00
David S. Miller	f6302d1d78	[TCP]: Move send test logic out of net/tcp.h This just moves the code into tcp_output.c, no code logic changes are made by this patch. Using this as a baseline, we can begin to untangle the mess of comparisons for the Nagle test et al. We will also be able to reduce all of the redundant computation that occurs when outputting data packets. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:03 -07:00
David S. Miller	fc6415bcb0	[TCP]: Fix quick-ack decrementing with TSO. On each packet output, we call tcp_dec_quickack_mode() if the ACK flag is set. It drops tp->ack.quick until it hits zero, at which time we deflate the ATO value. When doing TSO, we are emitting multiple packets with ACK set, so we should decrement tp->ack.quick that many segments. Note that, unlike this case, tcp_enter_cwr() should not take the tcp_skb_pcount(skb) into consideration. That function, one time, readjusts tp->snd_cwnd and moves into TCP_CA_CWR state. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:17:45 -07:00
David S. Miller	c65f7f00c5	[TCP]: Simplify SKB data portion allocation with NETIF_F_SG. The ideal and most optimal layout for an SKB when doing scatter-gather is to put all the headers at skb->data, and all the user data in the page array. This makes SKB splitting and combining extremely simple, especially before a packet goes onto the wire the first time. So, when sk_stream_alloc_pskb() is given a zero size, make sure there is no skb_tailroom(). This is achieved by applying SKB_DATA_ALIGN() to the header length used here. Next, make select_size() in TCP output segmentation use a length of zero when NETIF_F_SG is true on the outgoing interface. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:17:25 -07:00
Robert Olsson	2f36895aa7	[IPV4]: More broken memory allocation fixes for fib_trie Below is a patch to preallocate memory when resizing the trie (inflate, halve). If preallocation fails, it just skips the resize of this tnode for this time. The oops we got when killing bgpd (with full routing) is now gone. Patrick's memory patch is also used. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:02:40 -07:00
Eric Dumazet	bb1d23b026	[IPV4]: Bug fix in rt_check_expire() - rt_check_expire() fixes (an overflow occurred if the size of the hash was >= 65536) Reminder of the bugfix: rt_check_expire() has a serious problem on machines with large route caches and a standard HZ value of 1000. With default values, i.e. ip_rt_gc_interval = 60*HZ = 60000, the loop count: for (t = ip_rt_gc_interval << rt_hash_log; t >= 0; overflows (t is a 31 bit value) as soon as rt_hash_log is >= 16 (65536 slots in route cache hash table). In this case, rt_check_expire() does nothing at all. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:00:32 -07:00
Eric Dumazet	424c4b70cc	[IPV4]: Use the fancy alloc_large_system_hash() function for route hash table - rt hash table allocated using alloc_large_system_hash() function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:58:19 -07:00
Eric Dumazet	22c047ccbc	[NET]: Hashed spinlocks in net/ipv4/route.c - Locking abstraction - Spinlocks moved out of rt hash table : Less memory (50%) used by rt hash table. it's a win even on UP. - Sizing of spinlocks table depends on NR_CPUS Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:55:24 -07:00
Patrick McHardy	f0e36f8cee	[IPV4]: Handle large allocations in fib_trie Inflating a node a couple of times makes it exceed the 128k kmalloc limit. Use __get_free_pages for allocations > PAGE_SIZE, as in fib_hash. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:44:55 -07:00
Herbert Xu	30e224d76f	[IPV4]: Fix crash in ip_rcv while booting related to netconsole Makes IPv4 ip_rcv registration happen last in af_inet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:40:10 -07:00
Thomas Graf	e176fe8954	[NET]: Remove unused security member in sk_buff Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:12:44 -07:00
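
The overflow described in bb1d23b026 can be checked with plain arithmetic. The following is a minimal sketch, not the kernel code: the helper names `gc_budget_wide` and `gc_budget_as_int32` are hypothetical, and the sketch assumes a 32-bit two's-complement `int` for the loop counter. It computes `ip_rt_gc_interval << rt_hash_log` in 64 bits, then shows the value a 32-bit signed counter would end up holding.

```c
#include <stdint.h>

/* Hypothetical helper: interval * 2^hash_log computed without overflow.
 * Assumes hash_log < 32 so the 64-bit shift cannot lose bits. */
static uint64_t gc_budget_wide(uint32_t interval, unsigned int hash_log)
{
    return (uint64_t)interval << hash_log;
}

/* Hypothetical helper: the value a 32-bit two's-complement signed
 * counter would hold if the wide result were truncated to 32 bits. */
static int64_t gc_budget_as_int32(uint32_t interval, unsigned int hash_log)
{
    uint64_t low32 = gc_budget_wide(interval, hash_log) & 0xFFFFFFFFu;

    /* Bit 31 set means the signed interpretation is negative. */
    if (low32 >= 0x80000000u)
        return (int64_t)low32 - 4294967296LL;
    return (int64_t)low32;
}
```

With the values from the commit message (interval 60000), a hash log of 15 still fits (1966080000), while a hash log of 16 gives 3932160000, which a 32-bit signed counter reads as a negative number. A loop guarded by `t >= 0` then never executes, which matches the "does nothing at all" symptom.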


138 Commits
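
Commits ca9b907d14 and 84b42baef7 state that a multicast group is identified by group address and interface index, and that a duplicate join returns EADDRINUSE. The sketch below illustrates that matching rule in user-space C. It is an assumption-laden illustration, not the kernel's `ip_mc_join_group()`: `struct membership`, `same_group`, and `join_group` are hypothetical names, the interface index is assumed to be already resolved, and `ENOBUFS` for a full list is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one joined group on a socket. */
struct membership {
    uint32_t group;      /* multicast group address */
    uint32_t local_addr; /* interface address used in the request */
    int      ifindex;    /* resolved interface index */
};

/* Two requests name the same group when group address and interface
 * index agree; the local address is deliberately not compared. */
static int same_group(const struct membership *a, const struct membership *b)
{
    return a->group == b->group && a->ifindex == b->ifindex;
}

/* Search for a duplicate and check the limit first, store the entry last,
 * so nothing has to be undone on the error paths. */
static int join_group(struct membership *list, size_t *count, size_t cap,
                      const struct membership *req)
{
    for (size_t i = 0; i < *count; i++)
        if (same_group(&list[i], req))
            return -EADDRINUSE;
    if (*count >= cap)
        return -ENOBUFS; /* assumed error code for a full list */
    list[(*count)++] = *req;
    return 0;
}
```

Joining the same group twice through different addresses on one interface is rejected as a duplicate, while the same group on a different interface index is accepted as a separate membership.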