Commit Graph

17021 Commits

Author SHA1 Message Date
Eric Dumazet
1f1b9c9990 fib: fib_result_assign() should not change fib refcounts
After commit ebc0ffae5 (RCU conversion of fib_lookup()),
fib_result_assign()  should not change fib refcounts anymore.

Thanks to Michael who did the bisection and bug report.

Reported-by: Michael Ellerman <michael@ellerman.id.au>
Tested-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-04 12:05:32 -07:00
Jan Engelhardt
cccbe5ef85 netfilter: ip6_tables: fix information leak to userspace
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:55:39 -07:00
David S. Miller
758cb41106 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2010-11-03 18:52:32 -07:00
Herbert Xu
c00b2c9e79 cls_cgroup: Fix crash on module unload
Somewhere along the lines net_cls_subsys_id became a macro when
cls_cgroup is built as a module.  Not only did it make cls_cgroup
completely useless, it also causes it to crash on module unload.

This patch fixes this by removing that macro.

Thanks to Eric Dumazet for diagnosing this problem.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:50 -07:00
andrew hendry
a6331d6f9a memory corruption in X.25 facilities parsing
Signed-of-by: Andrew Hendry <andrew.hendry@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:50 -07:00
Xiaotian Feng
41bb78b4b9 net dst: fix percpu_counter list corruption and poison overwritten
There're some percpu_counter list corruption and poison overwritten warnings
in recent kernel, which is resulted by fc66f95c.

commit fc66f95c switches to use percpu_counter, in ip6_route_net_init, kernel
init the percpu_counter for dst entries, but, the percpu_counter is never destroyed
in ip6_route_net_exit. So if the related data is freed by kernel, the freed percpu_counter
is still on the list, then if we insert/remove other percpu_counter, list corruption
resulted. Also, if the insert/remove option modifies the ->prev,->next pointer of
the freed value, the poison overwritten is resulted then.

With the following patch, the percpu_counter list corruption and poison overwritten
warnings disappeared.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:07 -07:00
Pavel Emelyanov
8200a59f24 rds: Remove kfreed tcp conn from list
All the rds_tcp_connection objects are stored list, but when
being freed it should be removed from there.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:07 -07:00
Pavel Emelyanov
58c490babd rds: Lost locking in loop connection freeing
The conn is removed from list in there and this requires
proper lock protection.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:06 -07:00
sjur.brandeland@stericsson.com
47d1ff1765 caif: Remove noisy printout when disconnecting caif socket
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:04 -07:00
André Carvalho de Matos
f2527ec436 caif: Bugfix for socket priority, bindtodev and dbg channel.
Changes:
o Bugfix: SO_PRIORITY for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute priority instead.

o Bugfix: SO_BINDTODEVICE for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute ifindex instead.

o Wrong assert statement for RFM layer segmentation.

o CAIF Debug channels was not working over SPI, caif_payload_info
  containing padding info must be initialized.

o Check on pointer before dereferencing when unregister dev in caif_dev.c

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:03 -07:00
Vasiliy Kulikov
b5f15ac4f8 ipv4: netfilter: ip_tables: fix information leak to userland
Structure ipt_getinfo is copied to userland with the field "name"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-03 08:45:06 +01:00
Vasiliy Kulikov
1a8b7a6722 ipv4: netfilter: arp_tables: fix information leak to userland
Structure arpt_getinfo is copied to userland with the field "name"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-03 08:44:12 +01:00
Tom Herbert
df32cc193a net: check queue_index from sock is valid for device
In dev_pick_tx recompute the queue index if the value stored in the
socket is greater than or equal to the number of real queues for the
device.  The saved index in the sock structure is not guaranteed to
be appropriate for the egress device (this could happen on a route
change or in presence of tunnelling).  The result of the queue index
being bad would be to return a bogus queue (crash could prersumably
follow).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-01 12:55:52 -07:00
Dr. David Alan Gilbert
6f9b901823 l2tp: kzalloc with swapped params in l2tp_dfs_seq_open
'sparse' spotted that the parameters to kzalloc in l2tp_dfs_seq_open
were swapped.

Tested on current git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
at 1792f17b72  build, boots and I can see that directory,
but there again I could see /sys/kernel/debug/l2tp with it swapped; I don't have
any l2tp in use.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-01 06:56:02 -07:00
Thomas Graf
5ec1cea057 text ematch: check for NULL pointer before destroying textsearch config
While validating the configuration em_ops is already set, thus the
individual destroy functions are called, but the ematch data has
not been allocated and associated with the ematch yet.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-31 09:37:38 -07:00
Linus Torvalds
253eacc070 net: Truncate recvfrom and sendto length to INT_MAX.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:44:07 -07:00
Andy Grover
d139ff0907 RDS: Let rds_message_alloc_sgs() return NULL
Even with the previous fix, we still are reading the iovecs once
to determine SGs needed, and then again later on. Preallocating
space for sg lists as part of rds_message seemed like a good idea
but it might be better to not do this. While working to redo that
code, this patch attempts to protect against userspace rewriting
the rds_iovec array between the first and second accesses.

The consequences of this would be either a too-small or too-large
sg list array. Too large is not an issue. This patch changes all
callers of message_alloc_sgs to handle running out of preallocated
sgs, and fail gracefully.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:18 -07:00
Andy Grover
fc8162e3c0 RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace
Change rds_rdma_pages to take a passed-in rds_iovec array instead
of doing copy_from_user itself.

Change rds_cmsg_rdma_args to copy rds_iovec array once only. This
eliminates the possibility of userspace changing it after our
sanity checks.

Implement stack-based storage for small numbers of iovecs, based
on net/socket.c, to save an alloc in the extremely common case.

Although this patch reduces iovec copies in cmsg_rdma_args to 1,
we still do another one in rds_rdma_extra_size. Getting rid of
that one will be trickier, so it'll be a separate patch.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:17 -07:00
Andy Grover
f4a3fc03c1 RDS: Clean up error handling in rds_cmsg_rdma_args
We don't need to set ret = 0 at the end -- it's initialized to 0.

Also, don't increment s_send_rdma stat if we're exiting with an
error.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:17 -07:00
Andy Grover
a09f69c49b RDS: Return -EINVAL if rds_rdma_pages returns an error
rds_cmsg_rdma_args would still return success even if rds_rdma_pages
returned an error (or overflowed).

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:16 -07:00
Linus Torvalds
1b1f693d7a net: fix rds_iovec page count overflow
As reported by Thomas Pollet, the rdma page counting can overflow.  We
get the rdma sizes in 64-bit unsigned entities, but then limit it to
UINT_MAX bytes and shift them down to pages (so with a possible "+1" for
an unaligned address).

So each individual page count fits comfortably in an 'unsigned int' (not
even close to overflowing into signed), but as they are added up, they
might end up resulting in a signed return value. Which would be wrong.

Catch the case of tot_pages turning negative, and return the appropriate
error code.

Reported-by: Thomas Pollet <thomas.pollet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:16 -07:00
Eric Dumazet
3285ee3bb2 ip_gre: fix fallback tunnel setup
Before making the fallback tunnel visible to lookups, we should make
sure it is completely setup, once ipgre_tunnel_init() had been called
and tstats per_cpu pointer allocated.

move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from
ipgre_fb_tunnel_init() to ipgre_init_net()

Based on a patch from Pavel Emelyanov

Reported-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:21:28 -07:00
Eric Dumazet
870be39258 ipv6/udp: report SndbufErrors and RcvbufErrors
commit a18135eb93 (Add UDP_MIB_{SND,RCV}BUFERRORS handling.)
forgot to make the necessary changes in net/ipv6/proc.c to report
additional counters in /proc/net/snmp6

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:17:23 -07:00
David S. Miller
a4765fa7bf Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-10-29 12:23:15 -07:00
Jesper Juhl
520efd1ace mac80211: fix failure to check kmalloc return value in key_key_read
I noticed two small issues in mac80211/debugfs_key.c::key_key_read while
reading through the code. Patch below.

The key_key_read() function returns ssize_t and the value that's actually
returned is the return value of simple_read_from_buffer() which also
returns ssize_t, so let's hold the return value in a ssize_t local
variable rather than a int one.

Also, memory is allocated dynamically with kmalloc() which can fail, but
the return value of kmalloc() is not checked, so we may end up operating
on a null pointer further on. So check for a NULL return and bail out with
-ENOMEM in that case.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-29 14:33:26 -04:00
Eric Dumazet
d817d29d0b netfilter: fix nf_conntrack_l4proto_register()
While doing __rcu annotations work on net/netfilter I found following
bug. On some arches, it is possible we publish a table while its content
is not yet committed to memory, and lockless reader can dereference wild
pointer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-29 19:59:40 +02:00
Patrick McHardy
64e4674922 netfilter: nf_nat: fix compiler warning with CONFIG_NF_CT_NETLINK=n
net/ipv4/netfilter/nf_nat_core.c:52: warning: 'nf_nat_proto_find_get' defined but not used
net/ipv4/netfilter/nf_nat_core.c:66: warning: 'nf_nat_proto_put' defined but not used

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-29 16:28:07 +02:00
David S. Miller
089282fb02 netfilter: xt_socket: Make tproto signed in socket_mt6_v1().
Otherwise error indications from ipv6_find_hdr() won't be noticed.

This required making the protocol argument to extract_icmp6_fields()
signed too.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 12:59:53 -07:00
Nelson Elhage
448d7b5daf pktgen: Limit how much data we copy onto the stack.
A program that accidentally writes too much data to the pktgen file can overflow
the kernel stack and oops the machine. This is only triggerable by root, so
there's no security issue, but it's still an unfortunate bug.

printk() won't print more than 1024 bytes in a single call, anyways, so let's
just never copy more than that much data. We're on a fairly shallow stack, so
that should be safe even with CONFIG_4KSTACKS.

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 11:47:53 -07:00
David S. Miller
8acfe468b0 net: Limit socket I/O iovec total length to INT_MAX.
This helps protect us from overflow issues down in the
individual protocol sendmsg/recvmsg handlers.  Once
we hit INT_MAX we truncate out the rest of the iovec
by setting the iov_len members to zero.

This works because:

1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
   writes are allowed and the application will just continue
   with another write to send the rest of the data.

2) For datagram oriented sockets, where there must be a
   one-to-one correspondance between write() calls and
   packets on the wire, INT_MAX is going to be far larger
   than the packet size limit the protocol is going to
   check for and signal with -EMSGSIZE.

Based upon a patch by Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 11:47:52 -07:00
Pavel Emelyanov
4aa2c466a7 fib: Fix fib zone and its hash leak on namespace stop
When we stop a namespace we flush the table and free one, but the
added fn_zone-s (and their hashes if grown) are leaked. Need to free.
Tries releases all its stuff in the flushing code.

Shame on us - this bug exists since the very first make-fib-per-net
patches in 2.6.27 :(

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:03 -07:00
Gerrit Renker
1c0e0a0569 dccp ccid-2: Stop polling
This updates CCID-2 to use the CCID dequeuing mechanism, converting from
previous continuous-polling to a now event-driven mechanism.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:01 -07:00
Gerrit Renker
b1fcf55eea dccp: Refine the wait-for-ccid mechanism
This extends the existing wait-for-ccid routine so that it may be used with
different types of CCID, addressing the following problems:

 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID-2 to become unblocked
    could lead to an indefinite  delay (i.e., application "hangs").
 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
 3) The minimum wait time for CCID-3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
 4) There is no way for the user/application to control this behaviour. It would
    be good to use the timeout argument of dccp_close() as an upper bound. Then
    the maximum time that an application is willing to wait for its CCIDs to can
    be set via the SO_LINGER option.

These problems are addressed by giving the CCID a grace period of up to the
`timeout' value.

The wait-for-ccid function is, as before, used when the application
 (a) has read all the data in its receive buffer and
 (b) if SO_LINGER was set with a non-zero linger time, or
 (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
     state (client application closes after receiving CloseReq).

In addition, there is a catch-all case of __skb_queue_purge() after waiting for
the CCID. This is necessary since the write queue may still have data when
 (a) the host has been passively-closed,
 (b) abnormal termination (unread data, zero linger time),
 (c) wait-for-ccid could not finish within the given time limit.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:01 -07:00
Gerrit Renker
dc841e30ea dccp: Extend CCID packet dequeueing interface
This extends the packet dequeuing interface of dccp_write_xmit() to allow
 1. CCIDs to take care of timing when the next packet may be sent;
 2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).

The main purpose is to take CCID-2 out of its polling mode (when it is network-
limited, it tries every millisecond to send, without interruption).

The mode of operation for (2) is as follows:
 * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
 * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
 * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
 * dccp_write_xmit() returns without further action;
 * after some time the wait-condition for CCID becomes true,
 * that CCID schedules the tasklet,
 * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
 * since the wait-condition is now true, ccid_hc_tx_packet() returns "send now",
 * packet is sent, and possibly more (since dccp_write_xmit() loops).

Code reuse: the taskled function calls dccp_write_xmit(), the timer function
            reduces to a wrapper around the same code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:00 -07:00
Gerrit Renker
fe84f4140f dccp: Return-value convention of hc_tx_send_packet()
This patch reorganises the return value convention of the CCID TX sending
function, to permit more flexible schemes, as required by subsequent patches.

Currently the convention is
 * values < 0     mean error,
 * a value == 0   means "send now", and
 * a value x > 0  means "send in x milliseconds".

The patch provides symbolic constants and a function to interpret return values.

In addition, it caps the maximum positive return value to 0xFFFF milliseconds,
corresponding to 65.535 seconds.  This is possible since in CCID-3/4 the
maximum possible inter-packet gap is fixed at t_mbi = 64 sec.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:00 -07:00
Eric Dumazet
6b1686a71e netfilter: nf_conntrack: allow nf_ct_alloc_hashtable() to get highmem pages
commit ea781f197d (use SLAB_DESTROY_BY_RCU and get rid of call_rcu())
did a mistake in __vmalloc() call in nf_ct_alloc_hashtable().

I forgot to add __GFP_HIGHMEM, so pages were taken from LOWMEM only.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-28 12:34:21 +02:00
Pavel Emelyanov
74b0b85b88 tunnels: Fix tunnels change rcu protection
After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug
was introduced into the SIOCCHGTUNNEL code.

The tunnel is first unlinked, then addresses change, then it is linked
back probably into another bucket. But while changing the parms, the
hash table is unlocked to readers and they can lookup the improper tunnel.

Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b
(gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and
94767632 (ip6tnl: get rid of ip6_tnl_lock).

The quick fix is to wait for quiescent state to pass after unlinking,
but if it is inappropriate I can invent something better, just let me
know.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 14:20:08 -07:00
Jouni Malinen
dc9f48ce7c mac80211: Fix scan_ies_len to include DS Params
Commit 651b52254f added DS Parameter Set
information into Probe Request frames that are transmitted on 2.4 GHz
band, but it failed to increment local->scan_ies_len to cover this new
information. This variable needs to be updated to match the maximum IE
data length so that the extra buffer need gets reduced from the driver
limit.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-27 15:46:51 -04:00
Eric Dumazet
b914c4ea92 inetpeer: __rcu annotations
Adds __rcu annotations to inetpeer
	(struct inet_peer)->avl_left
	(struct inet_peer)->avl_right

This is a tedious cleanup, but removes one smp_wmb() from link_to_pool()
since we now use more self documenting rcu_assign_pointer().

Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in
all cases we dont need a memory barrier.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:33 -07:00
Eric Dumazet
7a2b03c517 fib_rules: __rcu annotates ctarget
Adds __rcu annotation to (struct fib_rule)->ctarget

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:32 -07:00
Eric Dumazet
b33eab0844 tunnels: add __rcu annotations
Add __rcu annotations to :
        (struct ip_tunnel)->prl
        (struct ip_tunnel_prl_entry)->next
        (struct xfrm_tunnel)->next
	struct xfrm_tunnel *tunnel4_handlers
	struct xfrm_tunnel *tunnel64_handlers

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:32 -07:00
Eric Dumazet
e0ad61ec86 net: add __rcu annotations to protocol
Add __rcu annotations to :
        struct net_protocol *inet_protos
        struct net_protocol *inet6_protos

And use appropriate casts to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:31 -07:00
Eric Dumazet
1c31720a74 ipv4: add __rcu annotations to routes.c
Add __rcu annotations to :
        (struct dst_entry)->rt_next
        (struct rt_hash_bucket)->chain

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:31 -07:00
Ursula Braun
853dc2e03d ipv6: fix refcnt problem related to POSTDAD state
After running this bonding setup script
    modprobe bonding miimon=100 mode=0 max_bonds=1
    ifconfig bond0 10.1.1.1/16
    ifenslave bond0 eth1
    ifenslave bond0 eth3
on s390 with qeth-driven slaves, modprobe -r fails with this message
    unregister_netdevice: waiting for bond0 to become free. Usage count = 1
due to twice detection of duplicate address.
Problem is caused by a missing decrease of ifp->refcnt in addrconf_dad_failure.
An extra call of in6_ifa_put(ifp) solves it.
Problem has been introduced with commit f2344a131b.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:30 -07:00
Ben Hutchings
66c68bcc48 net: NETIF_F_HW_CSUM does not imply FCoE CRC offload
NETIF_F_HW_CSUM indicates the ability to update an TCP/IP-style 16-bit
checksum with the checksum of an arbitrary part of the packet data,
whereas the FCoE CRC is something entirely different.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [2.6.32+]
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:29 -07:00
Ben Hutchings
af1905dbec net: Fix some corner cases in dev_can_checksum()
dev_can_checksum() incorrectly returns true in these cases:

1. The skb has both out-of-band and in-band VLAN tags and the device
   supports checksum offload for the encapsulated protocol but only with
   one layer of encapsulation.
2. The skb has a VLAN tag and the device supports generic checksumming
   but not in conjunction with VLAN encapsulation.

Rearrange the VLAN tag checks to avoid these.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:29 -07:00
Glenn Wurster
7a876b0efc IPv6: Temp addresses are immediately deleted.
There is a bug in the interaction between ipv6_create_tempaddr and
addrconf_verify.  Because ipv6_create_tempaddr uses the cstamp and tstamp
from the public address in creating a private address, if we have not
received a router advertisement in a while, tstamp + temp_valid_lft might be
< now.  If this happens, the new address is created inside
ipv6_create_tempaddr, then the loop within addrconf_verify starts again and
the address is immediately deleted.  We are left with no temporary addresses
on the interface, and no more will be created until the public IP address is
updated.  To avoid this, set the expiry time to be the minimum of the time
left on the public address or the config option PLUS the current age of the
public interface.

Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 12:35:13 -07:00
Glenn Wurster
aed65501e8 IPv6: Create temporary address if none exists.
If privacy extentions are enabled, but no current temporary address exists,
then create one when we get a router advertisement.

Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 12:35:12 -07:00
Eric Dumazet
ded85aa86b fib_hash: fix rcu sparse and logical errors
While fixing CONFIG_SPARSE_RCU_POINTER errors, I had to fix accesses to
fz->fz_hash for real.

-	&fz->fz_hash[fn_hash(f->fn_key, fz)]
+	rcu_dereference(fz->fz_hash) + fn_hash(f->fn_key, fz)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 11:42:39 -07:00
Eric Dumazet
ebb9fed2de fib: fix fib_nl_newrule()
Some panic reports in fib_rules_lookup() show a rule could have a NULL
pointer as a next pointer in the rules_list.

This can actually happen because of a bug in fib_nl_newrule() : It
checks if current rule is the destination of unresolved gotos. (Other
rules have gotos to this about to be inserted rule)

Problem is it does the resolution of the gotos before the rule is
inserted in the rules_list (and has a valid next pointer)

Fix this by moving the rules_list insertion before the changes on gotos.

A lockless reader can not any more follow a ctarget pointer, unless
destination is ready (has a valid next pointer)

Reported-by: Oleg A. Arkhangelsky <sysoleg@yandex.ru>
Reported-by: Joe Buehler <aspam@cox.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 11:42:38 -07:00