linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-16 09:02:00 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	19c7578fb2	udp: add struct net argument to udp_hashfn Every caller already has this one. The new argument is currently unused, but this will be fixed shortly. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:12:29 -07:00
Pavel Emelyanov	e31634931d	udp: provide a struct net pointer for __udp[46]_lib_mcast_deliver They both calculate the hash chain, but currently do not have a struct net pointer, so pass one there via additional argument, all the more so their callers already have such. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:12:11 -07:00
Pavel Emelyanov	d6266281f8	udp: introduce a udp_hashfn function Currently the chain to store a UDP socket is calculated with simple (x & (UDP_HTABLE_SIZE - 1)). But taking net into account would make this calculation a bit more complex, so moving it into a function would help. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:11:50 -07:00
Brian Haley	7d06b2e053	net: change proto destroy method to return void Change struct proto destroy function pointer to return void. Noticed by Al Viro. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-14 17:04:49 -07:00
David S. Miller	4ae127d1b6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smc911x.c	2008-06-13 20:52:39 -07:00
Linus Torvalds	51558576ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: tcp: Revert 'process defer accept as established' changes. ipv6: Fix duplicate initialization of rawv6_prot.destroy bnx2x: Updating the Maintainer net: Eliminate flush_scheduled_work() calls while RTNL is held. drivers/net/r6040.c: correct bad use of round_jiffies() fec_mpc52xx: MPC52xx_MESSAGES_DEFAULT: 2nd NETIF_MSG_IFDOWN => IFUP ipg: fix receivemode IPG_RM_RECEIVEMULTICAST{,HASH} in ipg_nic_set_multicast_list() netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info() netfilter: Make nflog quiet when no one listen in userspace. ipv6: Fail with appropriate error code when setting not-applicable sockopt. ipv6: Check IPV6_MULTICAST_LOOP option value. ipv6: Check the hop limit setting in ancillary data. ipv6 route: Fix route lifetime in netlink message. ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER). dccp: Bug in initial acknowledgment number assignment dccp ccid-3: X truncated due to type conversion dccp ccid-3: TFRC reverse-lookup Bug-Fix dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets dccp: Fix sparse warnings dccp ccid-3: Bug-Fix - Zero RTT is possible	2008-06-13 07:34:47 -07:00
David S. Miller	ec0a196626	tcp: Revert 'process defer accept as established' changes. This reverts two changesets, `ec3c0982a2` ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and the follow-on bug fix `9ae27e0adb` ("tcp: Fix slab corruption with ipv6 and tcp6fuzz"). This change causes several problems, first reported by Ingo Molnar as a distcc-over-loopback regression where connections were getting stuck. Ilpo Järvinen first spotted the locking problems. The new function added by this code, tcp_defer_accept_check(), only has the child socket locked, yet it is modifying state of the parent listening socket. Fixing that is non-trivial at best, because we can't simply just grab the parent listening socket lock at this point, because it would create an ABBA deadlock. The normal ordering is parent listening socket --> child socket, but this code path would require the reverse lock ordering. Next is a problem noticed by Vitaliy Gusev, he noted: ---------------------------------------- >--- a/net/ipv4/tcp_timer.c >+++ b/net/ipv4/tcp_timer.c >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data) > goto death; > } > >+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { >+ tcp_send_active_reset(sk, GFP_ATOMIC); >+ goto death; Here socket sk is not attached to listening socket's request queue. tcp_done() will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should release this sk) as socket is not DEAD. Therefore socket sk will be lost for freeing. ---------------------------------------- Finally, Alexey Kuznetsov argues that there might not even be any real value or advantage to these new semantics even if we fix all of the bugs: ---------------------------------------- Hiding from accept() sockets with only out-of-order data only is the only thing which is impossible with old approach. Is this really so valuable? My opinion: no, this is nothing but a new loophole to consume memory without control. ---------------------------------------- So revert this thing for now. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-12 16:34:35 -07:00
David S. Miller	e6e30add6b	Merge branch 'net-next-2.6-misc-20080612a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-next	2008-06-11 22:33:59 -07:00
Adrian Bunk	0b04082995	net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-11 21:00:38 -07:00
YOSHIFUJI Hideaki	9501f97229	tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash(). As we do for other socket/timewait-socket specific parameters, let the callers pass appropriate arguments to tcp_v{4,6}_do_calc_md5_hash(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-12 03:46:30 +09:00
YOSHIFUJI Hideaki	8d26d76dd4	tcp md5sig: Share most of hash calcucaltion bits between IPv4 and IPv6. We can share most part of the hash calculation code because the only difference between IPv4 and IPv6 is their pseudo headers. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-12 02:38:20 +09:00
YOSHIFUJI Hideaki	076fb72233	tcp md5sig: Remove redundant protocol argument. Protocol is always TCP, so remove useless protocol argument. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-12 02:38:19 +09:00
YOSHIFUJI Hideaki	7d5d5525bd	tcp md5sig: Share MD5 Signature option parser between IPv4 and IPv6. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-12 02:38:18 +09:00
Linus Torvalds	f7f866eed0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits) net: Fix routing tables with id > 255 for legacy software sky2: Hold RTNL while calling dev_close() s2io iomem annotations atl1: fix suspend regression qeth: start dev queue after tx drop error qeth: Prepare-function to call s390dbf was wrong qeth: reduce number of kernel messages qeth: Use ccw_device_get_id(). qeth: layer 3 Oops in ip event handler virtio: use callback on empty in virtio_net virtio: virtio_net free transmit skbs in a timer virtio: Fix typo in virtio_net_hdr comments virtio_net: Fix skb->csum_start computation ehea: set mac address fix sfc: Recover from RX queue flush failure add missing lance_* exports ixgbe: fix typo forcedeth: msi interrupts ipsec: pfkey should ignore events when no listeners pppoe: Unshare skb before anything else ...	2008-06-11 08:39:51 -07:00
Krzysztof Piotr Oledzki	709772e6e0	net: Fix routing tables with id > 255 for legacy software Most legacy software do not like tables > 255 as rtm_table is u8 so tb_id is sent &0xff and it is possible to mismatch for example table 510 with table 254 (main). This patch introduces RT_TABLE_COMPAT=252 so the code uses it if tb_id > 255. It makes such old applications happy, new ones are still able to use RTA_TABLE to get a proper table id. Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-10 15:44:49 -07:00
Thomas Graf	573bf470e6	ipv4 addr: Send netlink notification for address label changes Makes people happy who try to keep a list of addresses up to date by listening to notifications. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-10 15:40:04 -07:00
Arnaldo Carvalho de Melo	ce4a7d0d48	inet{6}_request_sock: Init ->opt and ->pktopts in the constructor Wei Yongjun noticed that we may call reqsk_free on request sock objects where the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc where we initialize ->opt to NULL and set ->pktopts to NULL in inet6_reqsk_alloc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-10 12:39:35 -07:00
David S. Miller	65b53e4cc9	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tg3.c drivers/net/wireless/rt2x00/rt2x00dev.c net/mac80211/ieee80211_i.h	2008-06-10 02:22:26 -07:00
Rami Rosen	e64bda89b8	netfilter: {ip,ip6,nfnetlink}_queue: misc cleanups - No need to perform data_len = 0 in the switch command, since data_len is initialized to 0 in the beginning of the ipq_build_packet_message() method. - {ip,ip6}_queue: We can reach nlmsg_failure only from one place; skb is sure to be NULL when getting there; since skb is NULL, there is no need to check this fact and call kfree_skb(). Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-09 16:00:45 -07:00
Fabian Hugelshofer	718d4ad98e	netfilter: nf_conntrack: properly account terminating packets Currently the last packet of a connection isn't accounted when its causing abnormal termination. Introduces nf_ct_kill_acct() which increments the accounting counters on conntrack kill. The new function was necessary, because there are calls to nf_ct_kill() which don't need accounting: nf_conntrack_proto_tcp.c line ~847: Kills ct and returns NF_REPEAT. We don't want to count twice. nf_conntrack_proto_tcp.c line ~880: Kills ct and returns NF_DROP. I think we don't want to count dropped packets. nf_conntrack_netlink.c line ~824: As far as I can see ctnetlink_del_conntrack() is used to destroy a conntrack on behalf of the user. There is an sk_buff, but I don't think this is an actual packet. Incrementing counters here is therefore not desired. Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-09 15:59:40 -07:00
Patrick McHardy	51091764f2	netfilter: nf_conntrack: add nf_ct_kill() Encapsulate the common if (del_timer(&ct->timeout)) ct->timeout.function((unsigned long)ct) sequence in a new function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-09 15:59:06 -07:00
James Morris	560ee653b6	netfilter: ip_tables: add iptables security table for mandatory access control rules The following patch implements a new "security" table for iptables, so that MAC (SELinux etc.) networking rules can be managed separately to standard DAC rules. This is to help with distro integration of the new secmark-based network controls, per various previous discussions. The need for a separate table arises from the fact that existing tools and usage of iptables will likely clash with centralized MAC policy management. The SECMARK and CONNSECMARK targets will still be valid in the mangle table to prevent breakage of existing users. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-09 15:57:24 -07:00
Chris Wright	ddb2c43594	asn1: additional sanity checking during BER decoding - Don't trust a length which is greater than the working buffer. An invalid length could cause overflow when calculating buffer size for decoding oid. - An oid length of zero is invalid and allows for an off-by-one error when decoding oid because the first subid actually encodes first 2 subids. - A primitive encoding may not have an indefinite length. Thanks to Wei Wang from McAfee for report. Cc: Steven French <sfrench@us.ibm.com> Cc: stable@kernel.org Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-05 14:24:54 -07:00
Octavian Purdila	293ad60401	tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits. skb_splice_bits temporary drops the socket lock while iterating over the socket queue in order to break a reverse locking condition which happens with sendfile. This, however, opens a window of opportunity for tcp_collapse() to aggregate skbs and thus potentially free the current skb used in skb_splice_bits and tcp_read_sock. This patch fixes the problem by (re-)getting the same "logical skb" after the lock has been temporary dropped. Based on idea and initial patch from Evgeniy Polyakov. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 15:45:58 -07:00
Sridhar Samudrala	26af65cbeb	tcp: Increment OUTRSTS in tcp_send_active_reset() TCP "resets sent" counter is not incremented when a TCP Reset is sent via tcp_send_active_reset(). Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 15:19:35 -07:00
Denis V. Lunev	22dd485022	raw: Raw socket leak. The program below just leaks the raw kernel socket int main() { int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP); struct sockaddr_in addr; memset(&addr, 0, sizeof(addr)); inet_aton("127.0.0.1", &addr.sin_addr); addr.sin_family = AF_INET; addr.sin_port = htons(2048); sendto(fd, "a", 1, MSG_MORE, &addr, sizeof(addr)); return 0; } Corked packet is allocated via sock_wmalloc which holds the owner socket, so one should uncork it and flush all pending data on close. Do this in the same way as in UDP. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 15:16:12 -07:00
David S. Miller	aed5a833fb	Merge branch 'net-2.6-misc-20080605a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-fix	2008-06-04 12:10:21 -07:00
Ilpo Järvinen	a6604471db	tcp: fix skb vs fack_count out-of-sync condition This bug is able to corrupt fackets_out in very rare cases. In order for this to cause corruption: 1) DSACK in the middle of previous SACK block must be generated. 2) In order to take that particular branch, part or all of the DSACKed segment must already be SACKed so that we have that in cache in the first place. 3) The new info must be top enough so that fackets_out will be updated on this iteration. ...then fack_count is updated while skb wasn't, then we walk again that particular segment thus updating fack_count twice for a single skb and finally that value is assigned to fackets_out by tcp_sacktag_one. It is safe to call tcp_sacktag_one just once for a segment (at DSACK), no need to call again for plain SACK. Potential problem of the miscount are limited to premature entry to recovery and to inflated reordering metric (which could even cancel each other out in the most the luckiest scenarios :-)). Both are quite insignificant in worst case too and there exists also code to reset them (fackets_out once sacked_out becomes zero and reordering metric on RTO). This has been reported by a number of people, because it occurred quite rarely, it has been very evasive. Andy Furniss was able to get it to occur couple of times so that a bit more info was collected about the problem using a debug patch, though it still required lot of checking around. Thanks also to others who have tried to help here. This is listed as Bugzilla #10346. The bug was introduced by me in commit `68f8353b48` ([TCP]: Rewrite SACK block processing & sack_recv_cache use), I probably thought back then that there's need to scan that entry twice or didn't dare to make it go through it just once there. Going through twice would have required restoring fack_count after the walk but as noted above, I chose to drop the additional walk step altogether here. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 12:07:44 -07:00
Denis V. Lunev	36d926b94a	[IPV6]: inet_sk(sk)->cork.opt leak IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data actually. In this case ip_flush_pending_frames should be called instead of ip6_flush_pending_frames. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-05 04:02:38 +09:00
YOSHIFUJI Hideaki	baa2bfb8ae	[IPV4] TUNNEL4: Fix incoming packet length check for inter-protocol tunnel. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-05 04:02:33 +09:00
Ilpo Järvinen	8aca6cb117	tcp: Fix inconsistency source (CA_Open only when !tcp_left_out(tp)) It is possible that this skip path causes TCP to end up into an invalid state where ca_state was left to CA_Open while some segments already came into sacked_out. If next valid ACK doesn't contain new SACK information TCP fails to enter into tcp_fastretrans_alert(). Thus at least high_seq is set incorrectly to a too high seqno because some new data segments could be sent in between (and also, limited transmit is not being correctly invoked there). Reordering in both directions can easily cause this situation to occur. I guess we would want to use tcp_moderate_cwnd(tp) there as well as it may be possible to use this to trigger oversized burst to network by sending an old ACK with huge amount of SACK info, but I'm a bit unsure about its effects (mainly to FlightSize), so to be on the safe side I just currently fixed it minimally to keep TCP's state consistent (obviously, such nasty ACKs have been possible this far). Though it seems that FlightSize is already underestimated by some amount, so probably on the long term we might want to trigger recovery there too, if appropriate, to make FlightSize calculation to resemble reality at the time when the losses where discovered (but such change scares me too much now and requires some more thinking anyway how to do that as it likely involves some code shuffling). This bug was found by Brian Vowell while running my TCP debug patch to find cause of another TCP issue (fackets_out miscount). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 11:34:22 -07:00
Thomas Graf	ab32cd793d	route: Remove unused ifa_anycast field The field was supposed to allow the creation of an anycast route by assigning an anycast address to an address prefix. It was never implemented so this field is unused and serves no purpose. Remove it. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-03 16:37:33 -07:00
Thomas Graf	1f9d11c7c9	route: Mark unused routing attributes as such Also removes an unused policy entry for an attribute which is only used in kernel->user direction. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-03 16:36:27 -07:00
Thomas Graf	51b77cae0d	route: Mark unused route cache flags as such. Also removes an obsolete check for the unused flag RTCF_MASQ. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-03 16:36:01 -07:00
David S. Miller	43154d08d6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/cpmac.c net/mac80211/mlme.c	2008-05-25 23:26:10 -07:00
Rami Rosen	071f92d059	net: The world is not perfect patch. Unless there will be any objection here, I suggest consider the following patch which simply removes the code for the -DI_WISH_WORLD_WERE_PERFECT in the three methods which use it. The compilation errors we get when using -DI_WISH_WORLD_WERE_PERFECT show that this code was not built and not used for really a long time. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 17:47:54 -07:00
Denis Cheng	51f82a2b12	net/ipv4/arp.c: Use common hex_asc helpers Here the local hexbuf is a duplicate of global const char hex_asc from lib/hexdump.c, except the hex letters' cases: const char hexbuf[] = "0123456789ABCDEF"; const char hex_asc[] = "0123456789abcdef"; and here to print HW addresses, the hex cases are not significant. Thanks to Harvey Harrison to introduce the hex_asc_hi/hex_asc_lo helpers. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 17:34:32 -07:00
Sridhar Samudrala	7d227cd235	tcp: TCP connection times out if ICMP frag needed is delayed We are seeing an issue with TCP in handling an ICMP frag needed message that is received after net.ipv4.tcp_retries1 retransmits. The default value of retries1 is 3. So if the path mtu changes and ICMP frag needed is lost for the first 3 retransmits or if it gets delayed until 3 retransmits are done, TCP doesn't update MSS correctly and continues to retransmit the orginal message until it timesout after tcp_retries2 retransmits. I am seeing this issue even with the latest 2.6.25.4 kernel. In tcp_retransmit_timer(), when retransmits counter exceeds tcp_retries1 value, the dst cache entry of the socket is reset. At this time, if we receive an ICMP frag needed message, the dst entry gets updated with the new MTU, but the TCP sockets dst_cache entry remains NULL. So the next time when we try to retransmit after the ICMP frag needed is received, tcp_retransmit_skb() gets called. Here the cur_mss value is calculated at the start of the routine with a NULL sk_dst_cache. Instead we should call tcp_current_mss after the rebuild_header that caches the dst entry with the updated mtu. Also the rebuild_header should be called before tcp_fragment so that skb is fragmented if the mss goes down. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 16:42:20 -07:00
Pavel Emelyanov	cf3677ae19	ipmr: Use on-device stats instead of private ones. These devices use the private area of appropriate size for statistics. Turning them to use on-device ones make them "privless" and thus - really small wrt kmalloc cache, they are allocated from. Besides, code looks nicer, because of absence of multi-braced type casts and dereferences. [ Fix build failures -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 14:43:45 -07:00
Pavel Emelyanov	2f4c02d40c	ipmr: Ipip tunnel uses on-device stats. The ipmr uses ipip tunnels for its purposes and updates the tunnels' stats, but the ipip driver is already switched to use on-device ones. Actually, this is a part of the patch #4 from this set. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 14:16:14 -07:00
Pavel Emelyanov	50f59cea07	ipip: Use on-device stats instead of private ones. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 14:15:16 -07:00
Pavel Emelyanov	addd68eb6f	ipgre: Use on-device stats instead of private ones. Just switch from tunnel->stat to tunnel->dev->stats. The ip_tunnel->stat member itself will be removed after I fix its other users (very soon). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 14:14:22 -07:00
Herbert Xu	1ac06e0306	ipsec: Use the correct ip_local_out function Because the IPsec output function xfrm_output_resume does its own dst_output call it should always call __ip_local_output instead of ip_local_output as the latter may invoke dst_output directly. Otherwise the return values from nf_hook and dst_output may clash as they both use the value 1 but for different purposes. When that clash occurs this can cause a packet to be used after it has been freed which usually leads to a crash. Because the offending value is only returned from dst_output with qdiscs such as HTB, this bug is normally not visible. Thanks to Marco Berizzi for his perseverance in tracking this down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-20 14:32:14 -07:00
Pavel Emelyanov	7d291ebb83	inet: Register fragmentation some ctls at read-only root. Parts of fragments-related sysctls are read-only, but this is done by cloning all the tables and dropping write-bits from mode. Do the same but with read-only root. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-19 13:53:02 -07:00
Pavel Emelyanov	0a64b4b811	inet: Rename fragmentation sysctl-related functions/variables. The fragments sysctls also contains some, that are to be visible, but read-only in net namespaces. The naming in net/core/sysctl_net_core.c is - tables, that are to be registered in namespaces have a "ns" word in their names. So rename ones in ipv4/ip_fragment.c and ipv6/reassembly.c to fit this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-19 13:51:29 -07:00
Linus Torvalds	6aa5fc4349	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (73 commits) net: Fix typo in net/core/sock.c. ppp: Do not free not yet unregistered net device. netfilter: xt_iprange: module aliases for xt_iprange netfilter: ctnetlink: dump conntrack ID in event messages irda: Fix a misalign access issue. (v2) sctp: Fix use of uninitialized pointer cipso: Relax too much careful cipso hash function. tcp FRTO: work-around inorder receivers tcp FRTO: Fix fallback to conventional recovery New maintainer for Intel ethernet adapters DM9000: Use delayed work to update MII PHY state DM9000: Update and fix driver debugging messages DM9000: Add __devinit and __devexit attributes to probe and remove sky2: fix simple define thinko [netdrvr] sfc: sfc: Add self-test support [netdrvr] sfc: Increment rx_reset when reported as driver event [netdrvr] sfc: Remove unused macro EFX_XAUI_RETRAIN_MAX [netdrvr] sfc: Fix code formatting [netdrvr] sfc: Remove kernel-doc comments for removed members of struct efx_nic [netdrvr] sfc: Remove garbage from comment ...	2008-05-14 10:08:24 -07:00
Pavel Emelyanov	5e0f8923f3	cipso: Relax too much careful cipso hash function. The cipso_v4_cache is allocated to contain CIPSO_V4_CACHE_BUCKETS buckets. The CIPSO_V4_CACHE_BUCKETS = 1 << CIPSO_V4_CACHE_BUCKETBITS, where CIPSO_V4_CACHE_BUCKETBITS = 7. The bucket-selection function for this hash is calculated like this: bkt = hash & (CIPSO_V4_CACHE_BUCKETBITS - 1); ^^^ i.e. picking only 4 buckets of possible 128 :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-13 23:23:55 -07:00
Ilpo Järvinen	79d44516b4	tcp FRTO: work-around inorder receivers If receiver consumes segments successfully only in-order, FRTO fallback to conventional recovery produces RTO loop because FRTO's forward transmissions will always get dropped and need to be resent, yet by default they're not marked as lost (which are the only segments we will retransmit in CA_Loss). Price to pay about this is occassionally unnecessarily retransmitting the forward transmission(s). SACK blocks help a bit to avoid this, so it's mainly a concern for NewReno case though SACK is not fully immune either. This change has a side-effect of fixing SACKFRTO problem where it didn't have snd_nxt of the RTO time available anymore when fallback become necessary (this problem would have only occured when RTO would occur for two or more segments and ECE arrives in step 3; no need to figure out how to fix that unless the TODO item of selective behavior is considered in future). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Damon L. Chesser <damon@damtek.com> Tested-by: Damon L. Chesser <damon@damtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-13 02:54:19 -07:00
Ilpo Järvinen	a1c1f281b8	tcp FRTO: Fix fallback to conventional recovery It seems that commit `009a2e3e4e` ("[TCP] FRTO: Improve interoperability with other undo_marker users") run into another land-mine which caused fallback to conventional recovery to break: 1. Cumulative ACK arrives after FRTO retransmission 2. tcp_try_to_open sees zero retrans_out, clears retrans_stamp which should be kept like in CA_Loss state it would be 3. undo_marker change allowed tcp_packet_delayed to return true because of the cleared retrans_stamp once FRTO is terminated causing LossUndo to occur, which means all loss markings FRTO made are reverted. This means that the conventional recovery basically recovered one loss per RTT, which is not that efficient. It was quite unobvious that the undo_marker change broken something like this, I had a quite long session to track it down because of the non-intuitiviness of the bug (luckily I had a trivial reproducer at hand and I was also able to learn to use kprobes in the process as well :-)). This together with the NewReno+FRTO fix and FRTO in-order workaround this fixes Damon's problems, this and the first mentioned are enough to fix Bugzilla #10063. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Damon L. Chesser <damon@damtek.com> Tested-by: Damon L. Chesser <damon@damtek.com> Tested-by: Sebastian Hyrwall <zibbe@cisko.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-13 02:53:26 -07:00
Johannes Berg	f5184d267c	net: Allow netdevices to specify needed head/tailroom This patch adds needed_headroom/needed_tailroom members to struct net_device and updates many places that allocate sbks to use them. Not all of them can be converted though, and I'm sure I missed some (I mostly grepped for LL_RESERVED_SPACE) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-12 20:48:31 -07:00

1 2 3 4 5 ...

2544 Commits