Commit Graph

1926 Commits

Author SHA1 Message Date
Cosmin Ratiu
a8fdf2b331 ipv6: Fix tcp_v6_send_response(): it didn't set skb transport header
Here is a patch which fixes an issue observed when using TCP over IPv6
and AH from IPsec.

When a connection gets closed the 4-way method and the last ACK from
the server gets dropped, the subsequent FINs from the client do not
get ACKed because tcp_v6_send_response does not set the transport
header pointer. This causes ah6_output to try to allocate a lot of
memory, which typically fails, so the ACKs never make it out of the
stack.

I have reproduced the problem on kernel 2.6.7, but after looking at
the latest kernel it seems the problem is still there.

Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03 20:44:38 -07:00
Wu Fengguang
aa1330766c tcp: replace hard coded GFP_KERNEL with sk_allocation
This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
			tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.

But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.

CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 23:45:45 -07:00
Eric Dumazet
6ce9e7b5fe ip: Report qdisc packet drops
Christoph Lameter pointed out that packet drops at qdisc level where not
accounted in SNMP counters. Only if application sets IP_RECVERR, drops
are reported to user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but these are not needed to update system wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after an UDP tx flood
# netstat -s 
...
IP:
    1487048 outgoing packets dropped
...
Udp:
...
    SndbufErrors: 1487048


send() syscalls, do however still return an OK status, to not
break applications.

Note : send() manual page explicitly says for -ENOBUFS error :

 "The output queue for a network interface was full.
  This generally indicates that the interface has stopped sending,
  but may be caused by transient congestion.
  (Normally, this does not occur in Linux. Packets are just silently
  dropped when a device queue overflows.) "

This is not true for IP_RECVERR enabled sockets : a send() syscall
that hit a qdisc drop returns an ENOBUFS error.

Many thanks to Christoph, David, and last but not least, Alexey !

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 18:05:33 -07:00
Stephen Hemminger
5ca1b998d3 net: file_operations should be const
All instances of file_operations should be const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:53 -07:00
Stephen Hemminger
3b401a81c0 inet: inet_connection_sock_af_ops const
The function block inet_connect_sock_af_ops contains no data
make it constant.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:49 -07:00
Stephen Hemminger
b2e4b3debc tcp: MD5 operations should be const
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:43 -07:00
Stephen Hemminger
98147d527a net: seq_operations should be const
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:39 -07:00
David S. Miller
6cdee2f96a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/yellowfin.c
2009-09-02 00:32:56 -07:00
Eric Dumazet
0625491493 ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDS
qdisc drops should be notified to IP_RECVERR enabled sockets, as done in IPV4.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 18:37:16 -07:00
Stephen Hemminger
89d69d2b75 net: make neigh_ops constant
These tables are never modified at runtime. Move to read-only
section.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:57 -07:00
Alexey Dobriyan
86393e52c3 netns: embed ip6_dst_ops directly
struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated,
but there is no fundamental reason for it. Embed it directly into
struct netns_ipv6.

For that:
* move struct dst_ops into separate header to fix circular dependencies
	I honestly tried not to, it's pretty impossible to do other way
* drop dynamical allocation, allocate together with netns

For a change, remove struct dst_ops::dst_net, it's deducible
by using container_of() given dst_ops pointer.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:31 -07:00
Stephen Hemminger
6fef4c0c8e netdev: convert pseudo-devices to netdev_tx_t
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:07 -07:00
David Ward
31ce8c71a3 ipv6: Update Neighbor Cache when IPv6 RA is received on a router
When processing a received IPv6 Router Advertisement, the kernel
creates or updates an IPv6 Neighbor Cache entry for the sender --
but presently this does not occur if IPv6 forwarding is enabled
(net.ipv6.conf.*.forwarding = 1), or if IPv6 Router Advertisements
are not accepted (net.ipv6.conf.*.accept_ra = 0), because in these
cases processing of the Router Advertisement has already halted.

This patch allows the Neighbor Cache to be updated in these cases,
while still avoiding any modification to routes or link parameters.

This continues to satisfy RFC 4861, since any entry created in the
Neighbor Cache as the result of a received Router Advertisement is
still placed in the STALE state.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-29 00:04:09 -07:00
Sascha Hlusiak
8945a808f7 sit: allow ip fragmentation when using nopmtudisc to fix package loss
if tunnel parameters have frag_off set to IP_DF, pmtudisc on the ipv4 link
will be performed by deriving the mtu from the ipv4 link and setting the
DF-Flag of the encapsulating IPv4 Header. If fragmentation is needed on the
way, the IPv4 pmtu gets adjusted, the ipv6 package will be resent eventually,
using the new and lower mtu and everyone is happy.

If the frag_off parameter is unset, the mtu for the tunnel will be derived
from the tunnel device or the ipv6 pmtu, which might be higher than the ipv4
pmtu. In that case we must allow the fragmentation of the IPv4 packet because
the IPv6 mtu wouldn't 'learn' from the adjusted IPv4 pmtu, resulting in
frequent icmp_frag_needed and package loss on the IPv6 layer.

This patch allows fragmentation when tunnel was created with parameter
nopmtudisc, like in ipip/gre tunnels.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:53:53 -07:00
Bruno Prémont
ca6982b858 ipv6: Fix commit 63d9950b08 (ipv6: Make v4-mapped bindings consistent with IPv4)
Commit 63d9950b08
  (ipv6: Make v4-mapped bindings consistent with IPv4)
changes behavior of inet6_bind() for v4-mapped addresses so it should
behave the same way as inet_bind().

During this change setting of err to -EADDRNOTAVAIL got lost:

af_inet.c:469 inet_bind()
	err = -EADDRNOTAVAIL;
	if (!sysctl_ip_nonlocal_bind &&
	    !(inet->freebind || inet->transparent) &&
	    addr->sin_addr.s_addr != htonl(INADDR_ANY) &&
	    chk_addr_ret != RTN_LOCAL &&
	    chk_addr_ret != RTN_MULTICAST &&
	    chk_addr_ret != RTN_BROADCAST)
		goto out;


af_inet6.c:463 inet6_bind()
	if (addr_type == IPV6_ADDR_MAPPED) {
		int chk_addr_ret;

		/* Binding to v4-mapped address on a v6-only socket                         
		 * makes no sense                                                           
		 */
		if (np->ipv6only) {
			err = -EINVAL;
			goto out; 
		}

		/* Reproduce AF_INET checks to make the bindings consitant */               
		v4addr = addr->sin6_addr.s6_addr32[3];                                      
		chk_addr_ret = inet_addr_type(net, v4addr);                                 
		if (!sysctl_ip_nonlocal_bind &&                                             
		    !(inet->freebind || inet->transparent) &&                               
		    v4addr != htonl(INADDR_ANY) &&
		    chk_addr_ret != RTN_LOCAL &&                                            
		    chk_addr_ret != RTN_MULTICAST &&                                        
		    chk_addr_ret != RTN_BROADCAST)
			goto out;
	} else {


Signed-off-by Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:06:28 -07:00
Gerrit Renker
26ced1e4aa inet6: Set default traffic class
This patch addresses:
 * assigning -1 to np->tclass as it is currently done is not very meaningful,
   since it turns into 0xff;
 * RFC 3542, 6.5 allows -1 for clearing the sticky IPV6_TCLASS option
   and specifies -1 to mean "use kernel default":
   - RFC 2460, 7. requires that the default traffic class must be zero for
     all 8 bits,
   - this is consistent with RFC 2474, 4.1 which recommends a default PHB of 0,
     in combination with a value of the ECN field of "non-ECT" (RFC 3168, 5.).

This patch changes the meaning of -1 from assigning 255 to mean the RFC 2460
default, which at the same time allows to satisfy clearing the sticky TCLASS
option as per RFC 3542, 6.5.

(When passing -1 as ancillary data, the fallback remains np->tclass, which
 has either been set via socket options, or contains the default value.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-13 16:43:32 -07:00
Gerrit Renker
e651f03afe inet6: Conversion from u8 to int
This replaces assignments of the type "int on LHS" = "u8 on RHS" with
simpler code. The LHS can express all of the unsigned right hand side
values, hence the assigned value can not be negative.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-13 16:43:31 -07:00
Jens Rosenboom
a6fa328665 ipv6: Log the explicit address that triggered DAD failure
If an interface has multiple addresses, the current message for DAD
failure isn't really helpful, so this patch adds the address itself to
the printk.

Signed-off-by: Jens Rosenboom <jens@mcbone.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-13 16:26:10 -07:00
Jan Engelhardt
36cbd3dcc1 net: mark read-only arrays as const
String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05 10:42:58 -07:00
David S. Miller
db71789c01 xfrm6: Fix xfrm6_policy.c build when SYSCTL disabled.
Same as how Randy Dunlap fixed the ipv4 side of things.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-04 20:32:16 -07:00
Gerrit Renker
81e4321388 inet6: functions shadow global variable
This renames away a variable clash:
 * ipv6_table[] is declared as a static global table;
 * ipv6_sysctl_net_init() uses ipv6_table to refer/destroy dynamic memory;
 * ipv6_sysctl_net_exit() also uses ipv6_table for the same purpose;
 * both the two last functions call kfree() on ipv6_table.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:54:30 -07:00
Neil Horman
a33bc5c151 xfrm: select sane defaults for xfrm[4|6] gc_thresh
Choose saner defaults for xfrm[4|6] gc_thresh values on init

Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
(set to 1024).  Given that the ipv4 and ipv6 routing caches are sized
dynamically at boot time, the static selections can be non-sensical.
This patch dynamically selects an appropriate gc threshold based on
the corresponding main routing table size, using the assumption that
we should in the worst case be able to handle as many connections as
the routing table can.

For ipv4, the maximum route cache size is 16 * the number of hash
buckets in the route cache.  Given that xfrm4 starts garbage
collection at the gc_thresh and prevents new allocations at 2 *
gc_thresh, we set gc_thresh to half the maximum route cache size.

For ipv6, its a bit trickier.  there is no maximum route cache size,
but the ipv6 dst_ops gc_thresh is statically set to 1024.  It seems
sane to select a simmilar gc_thresh for the xfrm6 code that is half
the number of hash buckets in the v6 route cache times 16 (like the v4
code does).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-30 18:52:15 -07:00
Neil Horman
a44a4a006b xfrm: export xfrm garbage collector thresholds via sysctl
Export garbage collector thresholds for xfrm[4|6]_dst_ops

Had a problem reported to me recently in which a high volume of ipsec
connections on a system began reporting ENOBUFS for new connections
eventually.

It seemed that after about 2000 connections we started being unable to
create more.  A quick look revealed that the xfrm code used a dst_ops
structure that limited the gc_thresh value to 1024, and always
dropped route cache entries after 2x the gc_thresh.

It seems the most direct solution is to export the gc_thresh values in
the xfrm[4|6] dst_ops as sysctls, like the main routing table does, so
that higher volumes of connections can be supported.  This patch has
been tested and allows the reporter to increase their ipsec connection
volume successfully.

Reported-by: Joe Nall <joe@nall.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

ipv4/xfrm4_policy.c |   18 ++++++++++++++++++
ipv6/xfrm6_policy.c |   18 ++++++++++++++++++
2 files changed, 36 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-27 11:35:32 -07:00
David S. Miller
74d154189d Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwmc3200wifi/netdev.c
	net/wireless/scan.c
2009-07-23 19:03:51 -07:00
Gerrit Renker
3c2b8d180a mcastv6: Local variable shadows function argument
The local variable 'idev' shadows the function argument 'idev' to
ip6_mc_add_src(). Fixed by removing the local declaration, as pmc->idev
should be identical with 'idev' passed as argument.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-21 11:13:25 -07:00
John Dykstra
e547bc1ecc tcp: Use correct peer adr when copying MD5 keys
When the TCP connection handshake completes on the passive
side, a variety of state must be set up in the "child" sock,
including the key if MD5 authentication is being used.  Fix TCP
for both address families to label the key with the peer's
destination address, rather than the address from the listening
sock, which is usually the wildcard.

Reported-by:   Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20 07:49:08 -07:00
John Dykstra
e3afe7b75e tcp: Fix MD5 signature checking on IPv4 mapped sockets
Fix MD5 signature checking so that an IPv4 active open
to an IPv6 socket can succeed.  In particular, use the
correct address family's signature generation function
for the SYN/ACK.

Reported-by:   Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20 07:49:07 -07:00
David S. Miller
da8120355e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/orinoco/main.c
2009-07-16 20:21:24 -07:00
Sridhar Samudrala
ba73542585 udpv6: Handle large incoming UDP/IPv6 packets and support software UFO
- validate and forward GSO UDP/IPv6 packets from untrusted sources.
- do software UFO if the outgoing device doesn't support UFO.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:29:29 -07:00
Sridhar Samudrala
7ea2f2c5a6 udpv6: Remove unused skb argument of ipv6_select_ident()
- move ipv6_select_ident() inline function to ipv6.h and remove the unused
  skb argument

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:29:28 -07:00
Sridhar Samudrala
c31d532690 udpv6: Fix gso_size setting in ip6_ufo_append_data
- fix gso_size setting for ipv6 fragment to be a multiple of 8 bytes.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:29:26 -07:00
Sridhar Samudrala
493c6be3fe udpv6: Fix HW checksum support for outgoing UFO packets
- add HW checksum support for outgoing large UDP/IPv6 packets destined for
  a UFO enabled device.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:29:24 -07:00
Sascha Hlusiak
f2ba025b20 sit: fix regression: do not release skb->dst before xmit
The sit module makes use of skb->dst in it's xmit function, so since
93f154b594 ("net: release dst entry in dev_hard_start_xmit()") sit
tunnels are broken, because the flag IFF_XMIT_DST_RELEASE is not
unset.

This patch unsets that flag for sit devices to fix this
regression.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-11 20:30:52 -07:00
Eric Dumazet
e51a67a9c8 net: ip_push_pending_frames() fix
After commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
we do not take any more references on sk->sk_refcnt on outgoing packets.

I forgot to delete two __sock_put() from ip_push_pending_frames()
and ip6_push_pending_frames().

Reported-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-11 20:26:21 -07:00
Mark Smith
5c91face51 ipv6: correct return on ipv6_rcv() packet drop
The routine ipv6_rcv() uses magic number 0 for a return when it drops a
packet. This corresponds to NET_RX_SUCCESS, which is obviously
incorrect. Correct this by using NET_RX_DROP instead.

ps. It isn't exactly clear who the IPv6 maintainers are, apologies if
I've missed any.

Signed-off-by: Mark Smith <markzzzsmith@yahoo.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-06 18:07:55 -07:00
Patrick McHardy
6ed106549d net: use NETDEV_TX_OK instead of 0 in ndo_start_xmit() functions
This patch is the result of an automatic spatch transformation to convert
all ndo_start_xmit() return values of 0 to NETDEV_TX_OK.

Some occurences are missed by the automatic conversion, those will be
handled in a seperate patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-05 19:16:04 -07:00
Brian Haley
a1ed05263b IPv6: preferred lifetime of address not getting updated
There's a bug in addrconf_prefix_rcv() where it won't update the
preferred lifetime of an IPv6 address if the current valid lifetime
of the address is less than 2 hours (the minimum value in the RA).

For example, If I send a router advertisement with a prefix that
has valid lifetime = preferred lifetime = 2 hours we'll build
this address:

3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
       valid_lft 7175sec preferred_lft 7175sec

If I then send the same prefix with valid lifetime = preferred
lifetime = 0 it will be ignored since the minimum valid lifetime
is 2 hours:

3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
       valid_lft 7161sec preferred_lft 7161sec

But according to RFC 4862 we should always reset the preferred lifetime
even if the valid lifetime is invalid, which would cause the address
to immediately get deprecated.  So with this patch we'd see this:

5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic
       valid_lft 7163sec preferred_lft 0sec

The comment winds-up being 5x the size of the code to fix the problem.

Update the preferred lifetime of IPv6 addresses derived from a prefix
info option in a router advertisement even if the valid lifetime in
the option is invalid, as specified in RFC 4862 Section 5.5.3e.  Fixes
an issue where an address will not immediately become deprecated.
Reported by Jens Rosenboom.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-03 19:10:13 -07:00
Wei Yongjun
59cae0092e xfrm6: fix the proto and ports decode of sctp protocol
The SCTP pushed the skb above the sctp chunk header, so the
check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in
_decode_session6() will never return 0 and the ports decode
of sctp will always fail. (nh + offset + 1 - skb->data < 0)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-03 19:10:10 -07:00
Herbert Xu
71f9dacd2e inet: Call skb_orphan before tproxy activates
As transparent proxying looks up the socket early and assigns
it to the skb for later processing, we must drop any existing
socket ownership prior to that in order to distinguish between
the case where tproxy is active and where it is not.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-26 19:22:37 -07:00
Jesper Dangaard Brouer
1f2ccd00f2 ipv6: Use rcu_barrier() on module unload.
The ipv6 module uses rcu_call() thus it should use rcu_barrier() on
module unload.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-26 13:51:31 -07:00
Jens Rosenboom
a1faa69810 ipv6: avoid wraparound for expired preferred lifetime
Avoid showing wrong high values when the preferred lifetime of an address
is expired.

Signed-off-by: Jens Rosenboom <me@jayr.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-25 20:03:50 -07:00
Brian Haley
d5fdd6babc ipv6: Use correct data types for ICMPv6 type and code
Change all the code that deals directly with ICMPv6 type and code
values to use u8 instead of a signed int as that's the actual data
type.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-23 04:31:07 -07:00
Eric Dumazet
31e6d363ab net: correct off-by-one write allocations reports
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

We need to take into account this offset when reporting
sk_wmem_alloc to user, in PROC_FS files or various
ioctls (SIOCOUTQ/TIOCOUTQ)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 00:29:12 -07:00
David S. Miller
9cbc1cb8cd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/scsi/fcoe/fcoe.c
	net/core/drop_monitor.c
	net/core/net-traces.c
2009-06-15 03:02:23 -07:00
Tom Goff
403dbb97f6 PIM-SM: namespace changes
IPv4:
  - make PIM register vifs netns local
  - set the netns when a PIM register vif is created
  - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2)
    by adding the protocol handler when multicast routing is initialized

IPv6:
  - make PIM register vifs netns local
  - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2)
    by adding the protocol handler when multicast routing is initialized

Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-14 03:16:13 -07:00
Masatake YAMATO
590a9887a2 trivial: Fix a typo in comment of addrconf_dad_start()
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:51 +02:00
Pavel Machek
4737f0978d trivial: Kconfig: .ko is normally not included in module names
.ko is normally not included in Kconfig help, make it consistent.

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:50 +02:00
Patrick McHardy
36432dae73 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-06-11 16:00:49 +02:00
Eric Dumazet
2b85a34e91 net: No more expensive sock_hold()/sock_put() on each tx
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.

skb_set_owner_w() doesnt call sock_hold() anymore

sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.

Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 02:55:43 -07:00
David S. Miller
343a99724e netfilter: Use frag list abstraction interfaces.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 00:23:58 -07:00