Commit Graph

11561 Commits

Author SHA1 Message Date
Kalle Valo
572e001221 mac80211: use ps-poll when dynamic power save mode is disabled
When a directed tim bit is set, mac80211 currently disables power save
ands sends a null frame to the AP. But if dynamic power save is
disabled, mac80211 will not enable power save ever gain. Fix this by
adding ps-poll functionality to mac80211. When a directed tim bit is
set, mac80211 sends a ps-poll frame to the AP and checks for the more
data bit in the returned data frames.

Using ps-poll is slower than waking up with null frame, but it's saves more
power in cases where the traffic is low. Userspace can control if either
ps-poll or null wakeup method is used by enabling and disabling dynamic
power save.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13 13:45:17 -05:00
Kalle Valo
1fb3606bc5 mac80211: remove multicast check from check_tim()
Currently mac80211 checks for the multicast tim bit from beacons,
disables power save and sends a null frame if the bit is set. This was
added to support ath9k. But this is a bit controversial because the AP will
send multicast frames immediately after the beacon and the time constraints
are really high. Relying mac80211 to be fast enough here might not be
reliable in all situations. And there's no need to send a null frame, AP
will send the frames immediately after the dtim beacon no matter what.

Also if dynamic power save is disabled (iwconfig wlan0 power timeout 0)
currently mac80211 disables power save whenever the multicast bit is set
but it's never enabled again after receiving the first multicast/broadcast
frame.

The current implementation is not usable on p54/stlc45xx and the
easiest way to fix this is to remove the multicast tim bit check
altogether. Handling multicast tim bit in host is rare, most of the
designs do this in firmware/hardware, so it's better not to have it in
mac80211. It's a lot better to do this in firmware/hardware, or if
that's not possible it could be done in the driver.

Also renamed the function to ieee80211_check_tim() to follow the style
of the file.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13 13:45:15 -05:00
Vivek Natarajan
97d97b8098 mac80211: Fix the wrong WARN_ON message appearing on enabling power save.
This issue happens only when we are associated with a 11n AP and power save
is enabled. In the function 'ieee80211_master_start_xmit', ps_disable_work
is queued where wake_queues is called. But before this work is executed,
we check if the queues are stopped in _ieee80211_tx and return TX_AGAIN to
ieee8011_tx which leads to the warning message.
This patch fixes this erroneous case.

Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13 13:44:38 -05:00
Vasanthakumar Thiagarajan
7a94708060 mac80211: Free current bss information in few places where we don't need it any more
Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-11 11:44:23 -05:00
David S. Miller
b4ac530fc3 net: Move skbuff symbol exports after each symbol's definition.
net/core/skbuff.c is a hodge-podge of symbol export placement.
Some of the exports are right after the definition of the
symbol being exported, others are clumped together into a big
group at the end of the file.

Make things consistent.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-10 02:09:24 -08:00
Jarek Poplawski
149490f131 pkt_sched: sch_multiq: Change errno on non-multiqueue devices use.
Current "RTNETLINK answers: Invalid argument" warning, while trying to
add multiq qdisc to non-multiqueue device, isn't very helpful and some
of these devs can be changed btw., so let's use a better errno.

With feedback from Stephen Hemminger <shemminger@vyatta.com>

Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-10 00:11:21 -08:00
David S. Miller
4b53b361e0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-02-09 23:30:44 -08:00
David S. Miller
0ecc103aec Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/gianfar.c
2009-02-09 23:22:21 -08:00
Herbert Xu
4906f9985e bridge: Fix LRO crash with tun
> Kernel BUG at drivers/net/tun.c:444
> invalid opcode: 0000 [1] SMP
> last sysfs file: /class/net/lo/ifindex
> CPU 0
> Modules linked in: tun ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack
> nfnetlink ipt_REJECT xt_tcpudp iptable_filter d
> Pid: 6912, comm: qemu-kvm Tainted: G      2.6.18-128.el5 #1
> RIP: 0010:[<ffffffff886f57b0>]  [<ffffffff886f57b0>]
> :tun:tun_chr_readv+0x2b1/0x3a6
> RSP: 0018:ffff8102202c5e48  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff8102202c5e98 RCX: 0000000004010000
> RDX: ffff810227063680 RSI: ffff8102202c5e9e RDI: ffff8102202c5e92
> RBP: 0000000000010ff6 R08: 0000000000000000 R09: 0000000000000001
> R10: ffff8102202c5e94 R11: 0000000000000202 R12: ffff8102275357c0
> R13: ffff81022755e500 R14: 0000000000000000 R15: ffff8102202c5ef8
> FS:  00002ae4398db980(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00002ae4ab514000 CR3: 0000000221344000 CR4: 00000000000026e0
> Process qemu-kvm (pid: 6912, threadinfo ffff8102202c4000, task
> ffff81022e58d820)
> Stack:  00000000498735cb ffff810229d1a3c0 0000000000000000 ffff81022e58d820
>  ffffffff8008a461 ffff81022755e528 ffff81022755e528 ffffffff8009f925
>  000005ea05ea0000 ffff8102209d0000 00001051143e1600 ffffffff8003c00e
> Call Trace:
>  [<ffffffff8008a461>] default_wake_function+0x0/0xe
>  [<ffffffff8009f925>] enqueue_hrtimer+0x55/0x70
>  [<ffffffff8003c00e>] hrtimer_start+0xbc/0xce
>  [<ffffffff886f58bf>] :tun:tun_chr_read+0x1a/0x1f
>  [<ffffffff8000b3f3>] vfs_read+0xcb/0x171
>  [<ffffffff800117d4>] sys_read+0x45/0x6e
>  [<ffffffff8005d116>] system_call+0x7e/0x83
>
>
> Code: 0f 0b 68 40 62 6f 88 c2 bc 01 f6 42 0a 08 74 0c 80 4c 24 41
> RIP  [<ffffffff886f57b0>] :tun:tun_chr_readv+0x2b1/0x3a6
>  RSP <ffff8102202c5e48>
>  <0>Kernel panic - not syncing: Fatal exception

This crashed when an LRO packet generated by bnx2x reached a
tun device through the bridge.  We're supposed to drop it at
the bridge.  However, because the check was placed in br_forward
instead of __br_forward, it's only effective if we are sending
the packet through a single port.

This patch fixes it by moving the check into __br_forward.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 15:07:18 -08:00
Noriaki TAKAMIYA
20461c1740 IPv6: fix to set device name when new IPv6 over IPv6 tunnel device is created.
When the user creates IPv6 over IPv6 tunnel, the device name created
by the kernel isn't set to t->parm.name, which is referred as the
result of ioctl().

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 15:01:19 -08:00
Qu Haoran
d4e2675a61 netfilter: xt_sctp: sctp chunk mapping doesn't work
When user tries to map all chunks given in argument, kernel
works on a copy of the chunkmap, but at the end it doesn't
check the copy, but the orginal one.

Signed-off-by: Qu Haoran <haoran.qu@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 14:34:56 -08:00
Pablo Neira Ayuso
1f9da25616 netfilter: ctnetlink: fix echo if not subscribed to any multicast group
This patch fixes echoing if the socket that has sent the request to
create/update/delete an entry is not subscribed to any multicast
group. With the current code, ctnetlink would not send the echo
message via unicast as nfnetlink_send() would be skip.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 14:34:26 -08:00
Pablo Neira Ayuso
c969aa7d2c netfilter: ctnetlink: allow changing NAT sequence adjustment in creation
This patch fixes an inconsistency in the current ctnetlink code
since NAT sequence adjustment bit can only be updated but not set
in the conntrack entry creation.

This patch is used by conntrackd to successfully recover newly
created entries that represent connections with helpers and NAT
payload mangling.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 14:33:57 -08:00
Eric Leblond
3f9007135c netfilter: nf_conntrack_ipv6: don't track ICMPv6 negotiation message
This patch removes connection tracking handling for ICMPv6 messages
related to Stateless Address Autoconfiguration, MLD, and MLDv2. They
can not be tracked because they are massively using multicast (on
pre-defined address). But they are not invalid and should not be
detected as such.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 14:33:20 -08:00
Eric Leblond
a51f42f3c9 netfilter: fix tuple inversion for Node information request
The patch fixes a typo in the inverse mapping of Node Information
request. Following draft-ietf-ipngwg-icmp-name-lookups-09, "Querier"
sends a type 139 (ICMPV6_NI_QUERY) packet to "Responder" which answer
with a type 140 (ICMPV6_NI_REPLY) packet.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 14:33:03 -08:00
Vasanthakumar Thiagarajan
d43e87868f mac80211: Remove bss information of the current AP when it goes out of range
There is no point having the bss information of currently associated AP
when the AP is detected to be out of range.

Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:48 -05:00
Luis R. Rodriguez
f130347c2d cfg80211: add get reg command
This lets userspace request to get the currently set
regulatory domain.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:45 -05:00
Luis R. Rodriguez
47f4d8872f mac80211: do not TX injected frames when not allowed
Monitor mode is able to TX by using injected frames. We should
not allow injected frames to be sent unless allowed by regulatory
rules. Since AP mode uses a monitor interfaces to transmit
management frames we have to take care to not break AP mode as
well while resolving this. We can deal with this by allowing compliant
APs solutions to inform mac80211 if their monitor interface is
intended to be used for an AP by setting a cfg80211 flag for the
monitor interface. hostapd, for example, currently does its own
checks to ensure AP mode is not used on channels which require radar
detection. Once such solutions are available it can can add this
flag for monitor interfaces.

Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:45 -05:00
Johannes Berg
7230645e32 mac80211: convert master interface to netdev_ops
Also call our own ieee80211_master_setup routine instead of
overwriting almost all the values from ether_setup; this
loses a few assignments that are pointless on the master
interface anyway.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:45 -05:00
Johannes Berg
587e729ecf mac80211: convert to net_device_ops
Convert to new net_device_ops in 2.6.28 and later.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:44 -05:00
Johannes Berg
7fee5372d8 mac80211: remove HW_SIGNAL_DB
Giving the signal in dB isn't much more useful to userspace
than giving the signal in unspecified units. This removes
some radiotap information for zd1211 (the only driver using
this flag), but it helps a lot for getting cfg80211-based
scanning which won't support dB, and zd1211 being dB is a
little fishy anyway.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:44 -05:00
Harvey Harrison
c1b4aa3fb6 wireless: replace uses of __constant_{endian}
The base versions handle constant folding now.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:43 -05:00
Alina Friedrichsen
c4e3a58448 mac80211: IBSS join rework
I hold back this patch for around a week to avoid
confusion. This is the second step of
"mac80211: Fixed BSSID handling revisited".

With it, in the situation of a strange merge to the
same BSSID (e.g. caused by a TSF overflow) only
reset_tsf() is called.

And sta_info_flush_delayed() is only called if you
change the network manually, not on an automatic
BSSID merge.

Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:40 -05:00
Alina Friedrichsen
c0415b547d mac80211: Creating new IBSS with fixed BSSID
This fixes a bug when creating a new IBSS network with a
fixed BSSID. The fixed BSSID situation is now with one of
my last patches handled in ieee80211_sta_find_ibss()
function.

It's more robust to test against
(ifsta->flags & IEEE80211_STA_PREV_BSSID_SET), because
ifsta->state is not seted right in every situation and so
the creating of the new IBSS network sometimes hangs after
the first try to scan for a network to merge.

Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:39 -05:00
Sujith
e374055afb mac80211: Reset assoc_scan_tries after an unsuccessful scan run
Trying to associate with a non-existent SSID stops the
state machine after the first run. Subsequent association
requests fail to start the scan engine. Fix this by resetting
assoc_scan_tries to zero after completing a scan run.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:38 -05:00
Herbert Xu
aa6320d336 gro: Optimise TCP packet reception
gro: Optimise TCP packet reception

As this function can be called more than half a million times for
10GbE, it's important to optimise it as much as we can.

This patch uses bit ops to logical ops, as well as open coding
memcmp to exploit alignment properties.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-08 20:22:19 -08:00
Herbert Xu
a5ad24be72 gro: Optimise IPv4 packet reception
As this function can be called more than half a million times for
10GbE, it's important to optimise it as much as we can.

This patch does some obvious changes to use 2-byte and 4-byte
operations instead of byte-oriented ones where possible.  Bit
ops are also used to replace logical ops to reduce branching.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-08 20:22:19 -08:00
Herbert Xu
aa4b9f533e gro: Optimise Ethernet header comparison
This patch optimises the Ethernet header comparison to use 2-byte
and 4-byte xors instead of memcmp.  In order to facilitate this,
the actual comparison is now carried out by the callers of the
shared dev_gro_receive function.

This has a significant impact when receiving 1500B packets through
10GbE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-08 20:22:18 -08:00
Herbert Xu
4ae5544f9a gro: Remember number of held packets instead of counting every time
This patch prepares for the move of the same_flow checks out of
dev_gro_receive.  As such we need to remember the number of held
packets since doing a loop just to count them every time is silly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-08 20:22:17 -08:00
David S. Miller
409f0a9014 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-agn.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-02-07 02:52:44 -08:00
Ilpo Järvinen
1f0fa15432 net/sunrpc/xprtsock.c: some common code found
$ diff-funcs xs_udp_write_space net/sunrpc/xprtsock.c
net/sunrpc/xprtsock.c xs_tcp_write_space
 --- net/sunrpc/xprtsock.c:xs_udp_write_space()
 +++ net/sunrpc/xprtsock.c:xs_tcp_write_space()
@@ -1,4 +1,4 @@
- * xs_udp_write_space - callback invoked when socket buffer space
+ * xs_tcp_write_space - callback invoked when socket buffer space
  *                             becomes available
  * @sk: socket whose state has changed
  *
@@ -7,12 +7,12 @@
  * progress, otherwise we'll waste resources thrashing kernel_sendmsg
  * with a bunch of small requests.
  */
-static void xs_udp_write_space(struct sock *sk)
+static void xs_tcp_write_space(struct sock *sk)
 {
 	read_lock(&sk->sk_callback_lock);

-	/* from net/core/sock.c:sock_def_write_space */
-	if (sock_writeable(sk)) {
+	/* from net/core/stream.c:sk_stream_write_space */
+	if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)) {
 		struct socket *sock;
 		struct rpc_xprt *xprt;


$ codiff net/sunrpc/xprtsock.o net/sunrpc/xprtsock.o.new
net/sunrpc/xprtsock.c:
  xs_tcp_write_space | -163
  xs_udp_write_space | -163
 2 functions changed, 326 bytes removed

net/sunrpc/xprtsock.c:
  xs_write_space | +179
 1 function changed, 179 bytes added

net/sunrpc/xprtsock.o.new:
 3 functions changed, 179 bytes added, 326 bytes removed, diff: -147

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:48:33 -08:00
Ilpo Järvinen
b5f348e5a4 ipv6/addrconf: common code located
$ codiff net/ipv6/addrconf.o net/ipv6/addrconf.o.new
net/ipv6/addrconf.c:
 addrconf_notify | -267
1 function changed, 267 bytes removed

net/ipv6/addrconf.c:
 add_addr |  +86
1 function changed, 86 bytes added

net/ipv6/addrconf.o.new:
2 functions changed, 86 bytes added, 267 bytes removed, diff: -181

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:48:01 -08:00
Ilpo Järvinen
d73f08011b ipv6/ndisc: join error paths
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:47:37 -08:00
Ilpo Järvinen
910d30b704 ax25: more common return path joining
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:47:14 -08:00
Ilpo Järvinen
69ebbf58f3 ipmr: use goto to common label instead of opencoding
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:46:51 -08:00
Eric Van Hensbergen
beeebc92ee 9p: fix endian issues [attempt 3]
When the changes were done to the protocol last release, some endian
bugs crept in.  This patch fixes those endian problems and has been
verified to run on 32/64 bit and x86/ppc architectures.

This version of the patch incorporates the correct annotations
for endian variables.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 22:07:41 -08:00
David S. Miller
b4bd07c20b net_dma: call dmaengine_get only if NET_DMA enabled
Based upon a patch from Atsushi Nemoto <anemo@mba.ocn.ne.jp>

--------------------
The commit 649274d993 ("net_dma:
acquire/release dma channels on ifup/ifdown") added unconditional call
of dmaengine_get() to net_dma.  The API should be called only if
NET_DMA was enabled.
--------------------

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dan Williams <dan.j.williams@intel.com>
2009-02-06 22:06:43 -08:00
David Howells
15bde72738 RxRPC: Fix a potential NULL dereference
Fix a potential NULL dereference bug during error handling in
rxrpc_kernel_begin_call(), whereby rxrpc_put_transport() may be handed a NULL
pointer.

This was found with a code checker (http://repo.or.cz/w/smatch.git/).

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 21:50:52 -08:00
Jesper Dangaard Brouer
2783ef2312 udp: Fix potential wrong ip_hdr(skb) pointers
Like the UDP header fix, pskb_may_pull() can potentially
alter the SKB buffer.  Thus the saddr and daddr, pointers
may point to the old skb->data buffer.

I haven't seen corruptions, as its only seen if the old
skb->data buffer were reallocated by another user and
written into very quickly (or poison'd by SLAB debugging).

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 01:59:12 -08:00
Gautam Kachroo
efc683fc2a neigh: some entries can be skipped during dumping
neightbl_dump_info and neigh_dump_table  can skip entries if the
*fill*info functions return an error. This results in an incomplete
dump ((invoked by netlink requests for RTM_GETNEIGHTBL or
RTM_GETNEIGH)

nidx and idx should not be incremented if the current entry was not
placed in the output buffer

Signed-off-by: Gautam Kachroo <gk@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 00:52:04 -08:00
David S. Miller
684de409ac ipv6: Disallow rediculious flowlabel option sizes.
Just like PKTINFO, limit the options area to 64K.

Based upon report by Eric Sesterhenn and analysis by
Roland Dreier.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 00:49:55 -08:00
Pablo Neira Ayuso
ff491a7334 netlink: change return-value logic of netlink_broadcast()
Currently, netlink_broadcast() reports errors to the caller if no
messages at all were delivered:

1) If, at least, one message has been delivered correctly, returns 0.
2) Otherwise, if no messages at all were delivered due to skb_clone()
   failure, return -ENOBUFS.
3) Otherwise, if there are no listeners, return -ESRCH.

With this patch, the caller knows if the delivery of any of the
messages to the listeners have failed:

1) If it fails to deliver any message (for whatever reason), return
   -ENOBUFS.
2) Otherwise, if all messages were delivered OK, returns 0.
3) Otherwise, if no listeners, return -ESRCH.

In the current ctnetlink code and in Netfilter in general, we can add
reliable logging and connection tracking event delivery by dropping the
packets whose events were not successfully delivered over Netlink. Of
course, this option would be settable via /proc as this approach reduces
performance (in terms of filtered connections per seconds by a stateful
firewall) but providing reliable logging and event delivery (for
conntrackd) in return.

This patch also changes some clients of netlink_broadcast() that
may report ENOBUFS errors via printk. This error handling is not
of any help. Instead, the userspace daemons that are listening to
those netlink messages should resync themselves with the kernel-side
if they hit ENOBUFS.

BTW, netlink_broadcast() clients include those that call
cn_netlink_send(), nlmsg_multicast() and genlmsg_multicast() since they
internally call netlink_broadcast() and return its error value.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 23:56:36 -08:00
Herbert Xu
56035022d8 gro: Fix frag_list merging on imprecisely split packets
The previous fix ad0f990444 (gro:
Fix handling of imprecisely split packets) only fixed the case
of frags merging, frag_list merging in the same circumstances
were still broken.

In particular, the packet headers end up in the data stream.

This patch fixes this plus another issue where an imprecisely
split packet header may be read incorrectly (this is mostly
harmless since it'll simply cause the packet to not match and
be rejected for GRO).

Thanks to Emil Tantilov and Jeff Kirsher for helping to track
this down.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 21:26:52 -08:00
David S. Miller
a23f4bbd8d Revert "tcp: Always set urgent pointer if it's beyond snd_nxt"
This reverts commit 64ff3b938e.

Jeff Chua reports that it breaks rlogin for him.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 15:38:31 -08:00
Herbert Xu
0178b695fd ipv6: Copy cork options in ip6_append_data
As the options passed to ip6_append_data may be ephemeral, we need
to duplicate it for corking.  This patch applies the simplest fix
which is to memdup all the relevant bits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 15:15:50 -08:00
Jesper Dangaard Brouer
7b5e56f9d6 udp: Fix UDP short packet false positive
The UDP header pointer assignment must happen after calling
pskb_may_pull().  As pskb_may_pull() can potentially alter the SKB
buffer.

This was exposted by running multicast traffic through the NIU driver,
as it won't prepull the protocol headers into the linear area on
receive.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 15:05:45 -08:00
Herbert Xu
4cc7f68d65 net: Reexport sock_alloc_send_pskb
The function sock_alloc_send_pskb is completely useless if not
exported since most of the code in it won't be used as is.  In
fact, this code has already been duplicated in the tun driver.

Now that we need accounting in the tun driver, we can in fact
use this function as is.  So this patch marks it for export again.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:55:54 -08:00
Herbert Xu
9a279bcbe3 net: Partially allow skb destructors to be used on receive path
As it currently stands, skb destructors are forbidden on the
receive path because the protocol end-points will overwrite
any existing destructor with their own.

This is the reason why we have to call skb_orphan in the loopback
driver before we reinject the packet back into the stack, thus
creating a period during which loopback traffic isn't charged
to any socket.

With virtualisation, we have a similar problem in that traffic
is reinjected into the stack without being associated with any
socket entity, thus providing no natural congestion push-back
for those poor folks still stuck with UDP.

Now had we been consistent in telling them that UDP simply has
no congestion feedback, I could just fob them off.  Unfortunately,
we appear to have gone to some length in catering for this on
the standard UDP path, with skb/socket accounting so that has
created a very unhealthy dependency.

Alas habits are difficult to break out of, so we may just have
to allow skb destructors on the receive path.

It turns out that making skb destructors useable on the receive path
isn't as easy as it seems.  For instance, simply adding skb_orphan
to skb_set_owner_r isn't enough.  This is because we assume all
over the IP stack that skb->sk is an IP socket if present.

The new transparent proxy code goes one step further and assumes
that skb->sk is the receiving socket if present.

Now all of this can be dealt with by adding simple checks such
as only treating skb->sk as an IP socket if skb->sk->sk_family
matches.  However, it turns out that for bridging at least we
don't need to do all of this work.

This is of interest because most virtualisation setups use bridging
so we don't actually go through the IP stack on the host (with
the exception of our old nemesis the bridge netfilter, but that's
easily taken care of).

So this patch simply adds skb_orphan to the point just before we
enter the IP stack, but after we've gone through the bridge on the
receive path.  It also adds an skb_orphan to the one place in
netfilter that touches skb->sk/skb->destructor, that is, tproxy.

One word of caution, because of the internal code structure, anyone
wishing to deploy this must use skb_set_owner_w as opposed to
skb_set_owner_r since many functions that create a new skb from
an existing one will invoke skb_set_owner_w on the new skb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:55:27 -08:00
David S. Miller
005c79b3d4 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-02-04 16:51:58 -08:00
Randy Dunlap
55128bc23e sunrpc: fix rdma dependencies
Fix sunrpc/rdma build dependencies.
Survives 12 build combinations of INET, IPV6, SUNRPC,
INFINIBAND, and INFINIBAND_ADDR_TRANS.

ERROR: "rdma_destroy_id" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined!
ERROR: "rdma_connect" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined!
ERROR: "rdma_destroy_qp" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined!
ERROR: "rdma_create_id" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined!
ERROR: "rdma_create_qp" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined!
ERROR: "rdma_resolve_route" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined!
ERROR: "rdma_disconnect" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined!
ERROR: "rdma_resolve_addr" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined!
ERROR: "rdma_accept" [net/sunrpc/xprtrdma/svcrdma.ko] undefined!
ERROR: "rdma_destroy_id" [net/sunrpc/xprtrdma/svcrdma.ko] undefined!
ERROR: "rdma_listen" [net/sunrpc/xprtrdma/svcrdma.ko] undefined!
ERROR: "rdma_create_id" [net/sunrpc/xprtrdma/svcrdma.ko] undefined!
ERROR: "rdma_create_qp" [net/sunrpc/xprtrdma/svcrdma.ko] undefined!
ERROR: "rdma_bind_addr" [net/sunrpc/xprtrdma/svcrdma.ko] undefined!
ERROR: "rdma_disconnect" [net/sunrpc/xprtrdma/svcrdma.ko] undefined!

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-03 15:20:13 -08:00