Commit Graph

15731 Commits

Author SHA1 Message Date
Joerg Marx
fc350777c7 netfilter: nf_conntrack: fix a race in __nf_conntrack_confirm against nf_ct_get_next_corpse()
This race was triggered by a 'conntrack -F' command running in parallel
to the insertion of a hash for a new connection. Losing this race led to
a dead conntrack entry effectively blocking traffic for a particular
connection until timeout or flushing the conntrack hashes again.
Now the check for an already dying connection is done inside the lock.

Signed-off-by: Joerg Marx <joerg.marx@secunet.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-20 15:55:30 +02:00
Linus Torvalds
f72caf7e49 Merge branch 'for-2.6.35' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.35' of git://linux-nfs.org/~bfields/linux: (45 commits)
  Revert "nfsd4: distinguish expired from stale stateids"
  nfsd: safer initialization order in find_file()
  nfs4: minor callback code simplification, comment
  NFSD: don't report compiled-out versions as present
  nfsd4: implement reclaim_complete
  nfsd4: nfsd4_destroy_session must set callback client under the state lock
  nfsd4: keep a reference count on client while in use
  nfsd4: mark_client_expired
  nfsd4: introduce nfs4_client.cl_refcount
  nfsd4: refactor expire_client
  nfsd4: extend the client_lock to cover cl_lru
  nfsd4: use list_move in move_to_confirmed
  nfsd4: fold release_session into expire_client
  nfsd4: rename sessionid_lock to client_lock
  nfsd4: fix bare destroy_session null dereference
  nfsd4: use local variable in nfs4svc_encode_compoundres
  nfsd: further comment typos
  sunrpc: centralise most calls to svc_xprt_received
  nfsd4: fix unlikely race in session replay case
  nfsd4: fix filehandle comment
  ...
2010-05-19 17:24:54 -07:00
Linus Torvalds
6a6be470c3 Merge branch 'nfs-for-2.6.35' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.35' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (78 commits)
  SUNRPC: Don't spam gssd with upcall requests when the kerberos key expired
  SUNRPC: Reorder the struct rpc_task fields
  SUNRPC: Remove the 'tk_magic' debugging field
  SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst
  NFS: Don't call iput() in nfs_access_cache_shrinker
  NFS: Clean up nfs_access_zap_cache()
  NFS: Don't run nfs_access_cache_shrinker() when the mask is GFP_NOFS
  SUNRPC: Ensure rpcauth_prune_expired() respects the nr_to_scan parameter
  SUNRPC: Ensure memory shrinker doesn't waste time in rpcauth_prune_expired()
  SUNRPC: Dont run rpcauth_cache_shrinker() when gfp_mask is GFP_NOFS
  NFS: Read requests can use GFP_KERNEL.
  NFS: Clean up nfs_create_request()
  NFS: Don't use GFP_KERNEL in rpcsec_gss downcalls
  NFSv4: Don't use GFP_KERNEL allocations in state recovery
  SUNRPC: Fix xs_setup_bc_tcp()
  SUNRPC: Replace jiffies-based metrics with ktime-based metrics
  ktime: introduce ktime_to_ms()
  SUNRPC: RPC metrics and RTT estimator should use same RTT value
  NFS: Calldata for nfs4_renew_done()
  NFS: Squelch compiler warning in nfs_add_server_stats()
  ...
2010-05-19 17:24:05 -07:00
Linus Torvalds
98c89cdd3a Merge branch 'bkl/procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'bkl/procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  sunrpc: Include missing smp_lock.h
  procfs: Kill the bkl in ioctl
  procfs: Push down the bkl from ioctl
  procfs: Use generic_file_llseek in /proc/vmcore
  procfs: Use generic_file_llseek in /proc/kmsg
  procfs: Use generic_file_llseek in /proc/kcore
  procfs: Kill BKL in llseek on proc base
2010-05-19 17:23:28 -07:00
Michael S. Tsirkin
dc3f5e68f8 trans_virtio: use virtqueue_xxx wrappers
Switch trans_virtio to new virtqueue_xxx wrappers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 22:15:45 +09:30
Herbert Xu
622ccdf107 ipv6: Never schedule DAD timer on dead address
This patch ensures that all places that schedule the DAD timer
look at the address state in a safe manner before scheduling the
timer.  This ensures that we don't end up with pending timers
after deleting an address.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 15:56:06 -07:00
Herbert Xu
f2344a131b ipv6: Use POSTDAD state
This patch makes use of the new POSTDAD state.  This prevents
a race between DAD completion and failure.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 15:55:27 -07:00
Herbert Xu
4c5ff6a6fe ipv6: Use state_lock to protect ifa state
This patch makes use of the new state_lock to synchronise between
updates to the ifa state.  This fixes the issue where a remotely
triggered address deletion (through DAD failure) coincides with a
local administrative address deletion, causing certain actions to
be performed twice incorrectly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 15:54:18 -07:00
Herbert Xu
e9d3e08497 ipv6: Replace inet6_ifaddr->dead with state
This patch replaces the boolean dead flag on inet6_ifaddr with
a state enum.  This allows us to roll back changes when deleting
an address according to whether DAD has completed or not.

This patch only adds the state field and does not change the logic.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 15:36:06 -07:00
Randy Dunlap
b3bcb72edb bridge: fix build for CONFIG_SYSFS disabled
Fix build when CONFIG_SYSFS is not enabled:
net/bridge/br_if.c:136: error: 'struct net_bridge_port' has no member named 'sysfs_name'

Note: dev->name == sysfs_name except when change name is in
progress, and we are protected from that by RTNL mutex.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 12:26:27 -07:00
Joe Perches
3fa21e07e6 net: Remove unnecessary returns from void function()s
This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
  xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:14 -07:00
stephen hemminger
b60b6592ba net sched: cleanup and rate limit warning
If the user has a bad classification configuration, and gets a packet
that goes through too many steps. Chances are more packets will arrive,
and the message spew will overrun syslog because it is not rate limited.
And because it is not tagged with appropriate priority it can't not be screened.

Added the qdisc to the message to try and give some more context when
the message does arrive.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:13 -07:00
stephen hemminger
207024b947 pfkey: add severity to printk
Put severity level on pfkey printk messages

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:13 -07:00
stephen hemminger
62db5cfd70 xfrm: add severity to printk
Serious oh sh*t messages converted to WARN().
Add KERN_NOTICE severity to the unknown policy type messages.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:13 -07:00
stephen hemminger
6ff9c3644e net sched: printk message severity
The previous patch encourage me to go look at all the messages in
the network scheduler and fix them. Many messages were missing
any severity level. Some serious ones that should never happen
were turned into WARN(), and the random noise messages that were
handled changed to pr_debug().

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:12 -07:00
Julia Lawall
49afa55b5b net/caif: Use kzalloc
Use kzalloc rather than the combination of kmalloc and memset.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:55:09 -07:00
Wei Yongjun
2e3219b5c8 sctp: fix append error cause to ERROR chunk correctly
commit 5fa782c2f5
  sctp: Fix skb_over_panic resulting from multiple invalid \
    parameter errors (CVE-2010-1173) (v4)

cause 'error cause' never be add the the ERROR chunk due to
some typo when check valid length in sctp_init_cause_fixed().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:51:58 -07:00
Scott Feldman
57b610805c net: Add netlink support for virtual port management (was iovnl)
Add new netdev ops ndo_{set|get}_vf_port to allow setting of
port-profile on a netdev interface.  Extends netlink socket RTM_SETLINK/
RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF
(added to end of IFLA_cmd list).  These are both nested atrtibutes
using this layout:

              [IFLA_NUM_VF]
              [IFLA_VF_PORTS]
                      [IFLA_VF_PORT]
                              [IFLA_PORT_*], ...
                      [IFLA_VF_PORT]
                              [IFLA_PORT_*], ...
                      ...
              [IFLA_PORT_SELF]
                      [IFLA_PORT_*], ...

These attributes are design to be set and get symmetrically.  VF_PORTS
is a list of VF_PORTs, one for each VF, when dealing with an SR-IOV
device.  PORT_SELF is for the PF of the SR-IOV device, in case it wants
to also have a port-profile, or for the case where the VF==PF, like in
enic patch 2/2 of this patch set.

A port-profile is used to configure/enable the external switch virtual port
backing the netdev interface, not to configure the host-facing side of the
netdev.  A port-profile is an identifier known to the switch.  How port-
profiles are installed on the switch or how available port-profiles are
made know to the host is outside the scope of this patch.

There are two types of port-profiles specs in the netlink msg.  The first spec
is for 802.1Qbg (pre-)standard, VDP protocol.  The second spec is for devices
that run a similar protocol as VDP but in firmware, thus hiding the protocol
details.  In either case, the specs have much in common and makes sense to
define the netlink msg as the union of the two specs.  For example, both specs
have a notition of associating/deassociating a port-profile.  And both specs
require some information from the hypervisor manager, such as client port
instance ID.

The general flow is the port-profile is applied to a host netdev interface
using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
switch, and the switch virtual port backing the host netdev interface is
configured/enabled based on the settings defined by the port-profile.  What
those settings comprise, and how those settings are managed is again
outside the scope of this patch, since this patch only deals with the
first step in the flow.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:49:55 -07:00
Eric Dumazet
d19d56ddc8 net: Introduce skb_tunnel_rx() helper
skb rxhash should be cleared when a skb is handled by a tunnel before
being delivered again, so that correct packet steering can take place.

There are other cleanups and accounting that we can factorize in a new
helper, skb_tunnel_rx()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:36:55 -07:00
Eric Dumazet
de213e5eed tcp: tcp_synack_options() fix
Commit 33ad798c92 (tcp: options clean up) introduced a problem
if MD5+SACK+timestamps were used in initial SYN message.

Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP
sessions, but since 40 bytes of tcp options space are not enough to
store all the bits needed, we chose to disable timestamps in this case.

We send a SYN-ACK _without_ timestamp option, but socket has timestamps
enabled and all further outgoing messages contain a TS block, all with
the initial timestamp of the remote peer.

Fix is to really disable timestamps option for the whole session.

Reported-by: Bijay Singh <Bijay.Singh@guavus.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:35:36 -07:00
Stephen Hemminger
eedf042a63 ipv6: fix the bug of address check
The duplicate address check code got broken in the conversion
to hlist (2.6.35).  The earlier patch did not fix the case where
two addresses match same hash value. Use two exit paths,
rather than depending on state of loop variables (from macro).

Based on earlier fix by Shan Wei.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:27:12 -07:00
Steven Rostedt
f0218b3e99 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/core-6
Conflicts:
	include/trace/ftrace.h
	kernel/trace/trace_kprobe.c

Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-18 00:35:23 -04:00
David S. Miller
820ae8a80e Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-05-17 21:09:11 -07:00
Patrick McHardy
a2f7922713 net_sched: sch_hfsc: fix classification loops
When attaching filters to a class pointing to a class higher up in the
hierarchy, classification may enter an endless loop. Currently this is
prevented for filters that are already resolved, but not for filters
resolved at runtime.

Only allow filters to point downwards in the hierarchy, similar to what
CBQ does.

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:46 -07:00
stephen hemminger
f0cd15081a tbf: stop wanton destruction of children (v2)
Several netem users use TBF for rate control. But every time the parameters
of TBF are changed it destroys the child qdisc, requiring reconfigation.
Better to just keep child qdisc and just notify it of changed limit.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:35 -07:00
Joe Perches
ccbd6a5a4f net: Remove unnecessary semicolons after switch statements
Also added an explicit break; to avoid
a fallthrough in net/ipv4/tcp_input.c

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:35 -07:00
andrew hendry
935e2a26b8 X25: Remove bkl in sockopts
Removes the BKL in x25 setsock and getsockopts.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:28 -07:00
andrew hendry
37cda78741 X25: Move accept approve flag to bitfield
Moves the x25 accept approve flag from char into bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:27 -07:00
andrew hendry
b7792e34cb X25: Move interrupt flag to bitfield
Moves the x25 interrupt flag from char into bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:27 -07:00
andrew hendry
cb863ffd4a X25: Move qbit flag to bitfield
Moves the X25 q bit flag from char into a bitfield to allow BKL cleanup.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:26 -07:00
Eric Dumazet
ab6e3feba1 net: No dst refcounting in ip_queue_xmit()
TCP outgoing packets can avoid two atomic ops, and dirtying
of previously higly contended cache line using new refdst
infrastructure.

Note 1: loopback device excluded because of !IFF_XMIT_DST_RELEASE
Note 2: UDP packets dsts are built before ip_queue_xmit().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:52 -07:00
Eric Dumazet
4a94445c9a net: Use ip_route_input_noref() in input path
Use ip_route_input_noref() in ip fast path, to avoid two atomic ops per
incoming packet.

Note: loopback is excluded from this optimization in ip_rcv_finish()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:51 -07:00
Eric Dumazet
407eadd996 net: implements ip_route_input_noref()
ip_route_input() is the version returning a refcounted dst, while
ip_route_input_noref() returns a non refcounted one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:51 -07:00
Eric Dumazet
7fee226ad2 net: add a noref bit on skb dst
Use low order bit of skb->_skb_dst to tell dst is not refcounted.

Change _skb_dst to _skb_refdst to make sure all uses are catched.

skb_dst() returns the dst, regardless of noref bit set or not, but
with a lockdep check to make sure a noref dst is not given if current
user is not rcu protected.

New skb_dst_set_noref() helper to set an notrefcounted dst on a skb.
(with lockdep check)

skb_dst_drop() drops a reference only if skb dst was refcounted.

skb_dst_force() helper is used to force a refcount on dst, when skb
is queued and not anymore RCU protected.

Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
!IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
sock_queue_rcv_skb(), in __nf_queue().

Use skb_dst_force() in dev_requeue_skb().

Note: dst_use_noref() still dirties dst, we might transform it
later to do one dirtying per jiffies.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:50 -07:00
Eric Dumazet
ebda37c27d rps: avoid one atomic in enqueue_to_backlog
If CONFIG_SMP=y, then we own a queue spinlock, we can avoid the atomic
test_and_set_bit() from napi_schedule_prep().

We now have same number of atomic ops per netif_rx() calls than with
pre-RPS kernel.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:50 -07:00
Florian Westphal
0771275b25 ipv6 addrlabel: permit deletion of labels assigned to removed dev
as addrlabels with an interface index are left alone when the
interface gets removed this results in addrlabels that can no
longer be removed.

Restrict validation of index to adding new addrlabels.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:08:08 -07:00
John W. Linville
6fe70aae0d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-05-17 13:57:43 -04:00
John W. Linville
0c348d7c14 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-05-17 13:31:05 -04:00
David S. Miller
6811d58fc1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	include/linux/if_link.h
2010-05-16 22:26:58 -07:00
Frederic Weisbecker
99df95a22f sunrpc: Include missing smp_lock.h
Now that cache_ioctl_procfs() calls the bkl explicitly, we need to
include the relevant header as well.

This fixes the following build error:

	net/sunrpc/cache.c: In function 'cache_ioctl_procfs':
	net/sunrpc/cache.c:1355: error: implicit declaration of function 'lock_kernel'
	net/sunrpc/cache.c:1359: error: implicit declaration of function 'unlock_kernel'

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-17 03:06:31 +02:00
Frederic Weisbecker
d79b6f4de5 procfs: Push down the bkl from ioctl
Push down the bkl from procfs's ioctl main handler to its users.
Only three procfs users implement an ioctl (non unlocked) handler.
Turn them into unlocked_ioctl and push down the Devil inside.

v2: PDE(inode)->data doesn't need to be under bkl
v3: And don't forget to git-add the result
v4: Use wrappers to pushdown instead of an invasive and error prone
    handlers surgery.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
2010-05-17 03:06:12 +02:00
Chris Wright
c02db8c629 rtnetlink: make SR-IOV VF interface symmetric
Now we have a set of nested attributes:

  IFLA_VFINFO_LIST (NESTED)
    IFLA_VF_INFO (NESTED)
      IFLA_VF_MAC
      IFLA_VF_VLAN
      IFLA_VF_TX_RATE

This allows a single set to operate on multiple attributes if desired.
Among other things, it means a dump can be replayed to set state.

The current interface has yet to be released, so this seems like
something to consider for 2.6.34.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 01:05:45 -07:00
Wei Yongjun
55fa0cfd7c sctp: delete active ICMP proto unreachable timer when free transport
transport may be free before ICMP proto unreachable timer expire, so
we should delete active ICMP proto unreachable timer when transport
is going away.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:46:22 -07:00
Eric Dumazet
2d6c9ffcca net: congestion notifications are not dropped packets
vlan/macvlan start_xmit() can inform caller of congestion with
NET_XMIT_CN return value. This doesnt mean packet was dropped.
Increment normal stat counters instead of tx_dropped.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:42:15 -07:00
Eric Dumazet
a465419b1f net: Introduce sk_route_nocaps
TCP-MD5 sessions have intermittent failures, when route cache is
invalidated. ip_queue_xmit() has to find a new route, calls
sk_setup_caps(sk, &rt->u.dst), destroying the 

sk->sk_route_caps &= ~NETIF_F_GSO_MASK

that MD5 desperately try to make all over its way (from
tcp_transmit_skb() for example)

So we send few bad packets, and everything is fine when
tcp_transmit_skb() is called again for this socket.

Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:36:33 -07:00
Eric Dumazet
35790c0421 tcp: fix MD5 (RFC2385) support
TCP MD5 support uses percpu data for temporary storage. It currently
disables preemption so that same storage cannot be reclaimed by another
thread on same cpu.

We also have to make sure a softirq handler wont try to use also same
context. Various bug reports demonstrated corruptions.

Fix is to disable preemption and BH.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:34:04 -07:00
Eric Dumazet
3b098e2d7c net: Consistent skb timestamping
With RPS inclusion, skb timestamping is not consistent in RX path.

If netif_receive_skb() is used, its deferred after RPS dispatch.

If netif_rx() is used, its done before RPS dispatch.

This can give strange tcpdump timestamps results.

I think timestamping should be done as soon as possible in the receive
path, to get meaningful values (ie timestamps taken at the time packet
was delivered by NIC driver to our stack), even if NAPI already can
defer timestamping a bit (RPS can help to reduce the gap)

Tom Herbert prefer to sample timestamps after RPS dispatch. In case
sampling is expensive (HPET/acpi_pm on x86), this makes sense.

Let admins switch from one mode to another, using a new
sysctl, /proc/sys/net/core/netdev_tstamp_prequeue

Its default value (1), means timestamps are taken as soon as possible,
before backlog queueing, giving accurate timestamps.

Setting a 0 value permits to sample timestamps when processing backlog,
after RPS dispatch, to lower the load of the pre-RPS cpu.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:57:10 -07:00
Timo Teras
a1aa348304 xfrm: fix policy unreferencing on larval drop
I mistakenly had the error path to use num_pols to decide how
many policies we need to drop (cruft from earlier patch set
version which did not handle socket policies right).

This is wrong since normally we do not keep explicit references
(instead we hold reference to the cache entry which holds references
to policies). drop_pols is set to num_pols if we are holding the
references, so use that. Otherwise we eventually BUG_ON inside
xfrm_policy_destroy due to premature policy deletion.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:49:26 -07:00
Jiri Pirko
a14462f1bd net: adjust handle_macvlan to pass port struct to hook
Now there's null check here and also again in the hook. Looking at bridge bits
which are simmilar, port structure is rcu_dereferenced right away in
handle_bridge and passed to hook. Looks nicer.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:48:02 -07:00
Amerigo Wang
e3826f1e94 net: reserve ports for applications using fixed port numbers
(Dropped the infiniband part, because Tetsuo modified the related code,
I will send a separate patch for it once this is accepted.)

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:28:40 -07:00
David S. Miller
1cdc5abf40 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax 2010-05-15 23:14:16 -07:00
Simon Arlott
e0f43752a9 bridge: update sysfs link names if port device names have changed
Links for each port are created in sysfs using the device
name, but this could be changed after being added to the
bridge.

As well as being unable to remove interfaces after this
occurs (because userspace tools don't recognise the new
name, and the kernel won't recognise the old name), adding
another interface with the old name to the bridge will
cause an error trying to create the sysfs link.

This fixes the problem by listening for NETDEV_CHANGENAME
notifications and renaming the link.

https://bugzilla.kernel.org/show_bug.cgi?id=12743

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:10:15 -07:00
stephen hemminger
28a16c9796 bridge: change console message interface
Use one set of macro's for all bridge messages.

Note: can't use netdev_XXX macro's because bridge is purely
virtual and has no device parent.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:10:02 -07:00
stephen hemminger
cfb478da70 bridge: netpoll cleanup
Move code around so that the ifdef for NETPOLL_CONTROLLER don't have to
show up in main code path. The control functions should be in helpers
that are only compiled if needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:10:01 -07:00
Randy Dunlap
83827f6a89 netfilter: xt_TEE depends on NF_CONNTRACK
Fix xt_TEE build for the case of NF_CONNTRACK=m and
NETFILTER_XT_TARGET_TEE=y:

xt_TEE.c:(.text+0x6df5c): undefined reference to `nf_conntrack_untracked'
4x

Built with all 4 m/y combinations.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-14 13:52:30 -07:00
Patrick McHardy
a1d7c1b4b8 netfilter: nf_ct_sip: handle non-linear skbs
Handle non-linear skbs by linearizing them instead of silently failing.
Long term the helper should be fixed to either work with non-linear skbs
directly by using the string search API or work on a copy of the data.

Based on patch by Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-14 21:18:17 +02:00
Trond Myklebust
126e216a87 SUNRPC: Don't spam gssd with upcall requests when the kerberos key expired
Now that the rpc.gssd daemon can explicitly tell us that the key expired,
we should cache that information to avoid spamming gssd.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:37 -04:00
Trond Myklebust
d72b6cec8d SUNRPC: Remove the 'tk_magic' debugging field
It has not triggered in almost a decade. Time to get rid of it...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:36 -04:00
Trond Myklebust
d60dbb20a7 SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst
It seems strange to maintain stats for bytes_sent in one structure, and
bytes received in another. Try to assemble all the RPC request-related
stats in struct rpc_rqst

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:36 -04:00
Trond Myklebust
2067340653 SUNRPC: Ensure rpcauth_prune_expired() respects the nr_to_scan parameter
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:35 -04:00
Trond Myklebust
93a05e65c0 SUNRPC: Ensure memory shrinker doesn't waste time in rpcauth_prune_expired()
The 'cred_unused' list, that is traversed by rpcauth_cache_shrinker is
ordered by time. If we hit a credential that is under the 60 second garbage
collection moratorium, we should exit because we know at that point that
all successive credentials are subject to the same moratorium...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:34 -04:00
Trond Myklebust
d300a41ef1 SUNRPC: Dont run rpcauth_cache_shrinker() when gfp_mask is GFP_NOFS
Under some circumstances, put_rpccred() can end up allocating memory, so
check the gfp_mask to prevent deadlocks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:34 -04:00
Trond Myklebust
1f4c86c0be NFS: Don't use GFP_KERNEL in rpcsec_gss downcalls
Again, we can deadlock if the memory reclaim triggers a writeback that
requires a rpcsec_gss credential lookup.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:33 -04:00
Trond Myklebust
712a433866 SUNRPC: Fix xs_setup_bc_tcp()
It is a BUG for anybody to call this function without setting
args->bc_xprt. Trying to return an error value is just wrong, since the
user cannot fix this: it is a programming error, not a user error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:33 -04:00
Chuck Lever
ff8399709e SUNRPC: Replace jiffies-based metrics with ktime-based metrics
Currently RPC performance metrics that tabulate elapsed time use
jiffies time values.  This is problematic on systems that use slow
jiffies (for instance 100HZ systems built for paravirtualized
environments).  It is also a problem for computing precise latency
statistics for advanced network transports, such as InfiniBand,
that can have round-trip latencies significanly faster than a single
clock tick.

For the RPC client, adopt the high resolution time stamp mechanism
already used by the network layer and blktrace: ktime.

We use ktime format time stamps for all internal computations, and
convert to milliseconds for presentation.  As a result, we need only
addition operations in the performance critical paths; multiply/divide
is required only for presentation.

We could report RTT metrics in microseconds.  In fact the mountstats
format is versioned to accomodate exactly this kind of interface
improvement.

For now, however, we'll stay with millisecond precision for
presentation to maintain backwards compatibility with the handful of
currently deployed user space tools.  At a later point, we'll move to
an API such as BDI_STATS where a finer timestamp precision can be
reported.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:33 -04:00
Chuck Lever
bbc72cea58 SUNRPC: RPC metrics and RTT estimator should use same RTT value
Compute an RPC request's RTT once, and use that value both for reporting
RPC metrics, and for adjusting the RTT context used by the RPC client's RTT
estimator algorithm.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:32 -04:00
Trond Myklebust
a8ce4a8f37 SUNRPC: Fail over more quickly on connect errors
We should not allow soft tasks to wait for longer than the major timeout
period when waiting for a reconnect to occur.

Remove the field xprt->connect_timeout since it has been obsoleted by
xprt->reestablish_timeout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:30 -04:00
Trond Myklebust
0b9e794313 SUNRPC: Move the test for XPRT_CONNECTING into xprt_connect()
This fixes a bug with setting xprt->stat.connect_start.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:29 -04:00
Trond Myklebust
19445b99b6 SUNRPC: Cleanup - make rpc_new_task() call rpc_release_calldata on failure
Also have it return an ERR_PTR(-ENOMEM) instead of a null pointer.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:29 -04:00
Trond Myklebust
ee5ebe851e SUNRPC: Clean up xprt_release()
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:29 -04:00
Trond Myklebust
fc54a0c65f gss_krb5: Advertise rc4-hmac enctype support in the rpcsec_gss/krb5 upcall
Update the upcall info indicating which Kerberos enctypes
the kernel supports

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:21 -04:00
Kevin Coffman
fffdaef2eb gss_krb5: Add support for rc4-hmac encryption
Add necessary changes to add kernel support for the rc4-hmac Kerberos
encryption type used by Microsoft and described in rfc4757.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:20 -04:00
Kevin Coffman
5af46547ec gss_krb5: Use confounder length in wrap code
All encryption types use a confounder at the beginning of the
wrap token.  In all encryption types except arcfour-hmac, the
confounder is the same as the blocksize.  arcfour-hmac has a
blocksize of one, but uses an eight byte confounder.

Add an entry to the crypto framework definitions for the
confounder length and change the wrap/unwrap code to use
the confounder length rather than assuming it is always
the blocksize.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:20 -04:00
Kevin Coffman
1dbd9029f3 gssd_krb5: More arcfour-hmac support
For the arcfour-hmac support, the make_seq_num and get_seq_num
functions need access to the kerberos context structure.
This will be used in a later patch.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:20 -04:00
Kevin Coffman
fc263a917a gss_krb5: Save the raw session key in the context
This is needed for deriving arcfour-hmac keys "on the fly"
using the sequence number or checksu

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:19 -04:00
Kevin Coffman
8b23707612 gssd_krb5: arcfour-hmac support
For arcfour-hmac support, the make_checksum function needs a usage
field to correctly calculate the checksum differently for MIC and
WRAP tokens.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:19 -04:00
Trond Myklebust
bf6d359c50 gss_krb5: Advertise AES enctype support in the rpcsec_gss/krb5 upcall
Update upcall info indicating which Kerberos enctypes
the kernel supports

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:19 -04:00
Kevin Coffman
934a95aa1c gss_krb5: add remaining pieces to enable AES encryption support
Add the remaining pieces to enable support for Kerberos AES
encryption types.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:19 -04:00
Kevin Coffman
de9c17eb4a gss_krb5: add support for new token formats in rfc4121
This is a step toward support for AES encryption types which are
required to use the new token formats defined in rfc4121.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
[SteveD: Fixed a typo in gss_verify_mic_v2()]
Signed-off-by: Steve Dickson <steved@redhat.com>
[Trond: Got rid of the TEST_ROTATE/TEST_EXTRA_COUNT crap]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:18 -04:00
Kevin Coffman
c43abaedaf xdr: Add an export for the helper function write_bytes_to_xdr_buf()
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:18 -04:00
Trond Myklebust
4018bf3eec gss_krb5: Advertise triple-des enctype support in the rpcsec_gss/krb5 upcall
Update the upcall info indicating which Kerberos enctypes the kernel
supports.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:18 -04:00
Kevin Coffman
958142e97e gss_krb5: add support for triple-des encryption
Add the final pieces to support the triple-des encryption type.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:17 -04:00
Trond Myklebust
683ac6656c gss_krb5: Add upcall info indicating supported kerberos enctypes
The text based upcall now indicates which Kerberos encryption types are
supported by the kernel rpcsecgss code.  This is used by gssd to
determine which encryption types it should attempt to negotiate
when creating a context with a server.

The server principal's database and keytab encryption types are
what limits what it should negotiate.  Therefore, its keytab
should be created with only the enctypes listed by this file.

Currently we support des-cbc-crc, des-cbc-md4 and des-cbc-md5

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:17 -04:00
Kevin Coffman
47d8480776 gss_krb5: handle new context format from gssd
For encryption types other than DES, gssd sends down context information
in a new format.  This new format includes the information needed to
support the new Kerberos GSS-API tokens defined in rfc4121.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:17 -04:00
Kevin Coffman
4891f2d008 gss_krb5: import functionality to derive keys into the kernel
Import the code to derive Kerberos keys from a base key into the
kernel.  This will allow us to change the format of the context
information sent down from gssd to include only a single key.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:16 -04:00
Kevin Coffman
e1f6c07b11 gss_krb5: add ability to have a keyed checksum (hmac)
Encryption types besides DES may use a keyed checksum (hmac).
Modify the make_checksum() function to allow for a key
and take care of enctype-specific processing such as truncating
the resulting hash.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:16 -04:00
Kevin Coffman
81d4a4333a gss_krb5: introduce encryption type framework
Add enctype framework and change functions to use the generic
values from it rather than the values hard-coded for des.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:16 -04:00
Kevin Coffman
a8cc1cb7d7 gss_krb5: prepare for new context format
Prepare for new context format by splitting out the old "v1"
context processing function

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:16 -04:00
Kevin Coffman
1ac3719a22 gss_krb5: split up functions in preparation of adding new enctypes
Add encryption type to the krb5 context structure and use it to switch
to the correct functions depending on the encryption type.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:15 -04:00
J. Bruce Fields
54ec3d462f gss_krb5: Don't expect blocksize to always be 8 when calculating padding
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:15 -04:00
Kevin Coffman
7561042fb7 gss_krb5: Added and improved code comments
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:15 -04:00
Kevin Coffman
725f2865d4 gss_krb5: Introduce encryption type framework
Make the client and server code consistent regarding the extra buffer
space made available for the auth code when wrapping data.

Add some comments/documentation about the available buffer space
in the xdr_buf head and tail when gss_wrap is called.

Add a compile-time check to make sure we are not exceeding the available
buffer space.

Add a central function to shift head data.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:15 -04:00
Steven Rostedt
38516ab59f tracing: Let tracepoints have data passed to tracepoint callbacks
This patch adds data to be passed to tracepoint callbacks.

The created functions from DECLARE_TRACE() now need a mandatory data
parameter. For example:

DECLARE_TRACE(mytracepoint, int value, value)

Will create the register function:

int register_trace_mytracepoint((void(*)(void *data, int value))probe,
                                void *data);

As the first argument, all callbacks (probes) must take a (void *data)
parameter. So a callback for the above tracepoint will look like:

void myprobe(void *data, int value)
{
}

The callback may choose to ignore the data parameter.

This change allows callbacks to register a private data pointer along
with the function probe.

	void mycallback(void *data, int value);

	register_trace_mytracepoint(mycallback, mydata);

Then the mycallback() will receive the "mydata" as the first parameter
before the args.

A more detailed example:

  DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

  /* In the C file */

  DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

  [...]

       trace_mytracepoint(status);

  /* In a file registering this tracepoint */

  int my_callback(void *data, int status)
  {
	struct my_struct my_data = data;
	[...]
  }

  [...]
	my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
	init_my_data(my_data);
	register_trace_mytracepoint(my_callback, my_data);

The same callback can also be registered to the same tracepoint as long
as the data registered is different. Note, the data must also be used
to unregister the callback:

	unregister_trace_mytracepoint(my_callback, my_data);

Because of the data parameter, tracepoints declared this way can not have
no args. That is:

  DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

will cause an error.

If no arguments are needed, a new macro can be used instead:

  DECLARE_TRACE_NOARGS(mytracepoint);

Since there are no arguments, the proto and args fields are left out.

This is part of a series to make the tracepoint footprint smaller:

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4914025	1088868	 861512	6864405	 68be15	vmlinux.class
4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint

Again, this patch also increases the size of the kernel, but
lays the ground work for decreasing it.

 v5: Fixed net/core/drop_monitor.c to handle these updates.

 v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
     #ifdef CONFIG_TRACE_POINTS, since the two are the same in both
     cases. The __DECLARE_TRACE() is what changes.
     Thanks to Frederic Weisbecker for pointing this out.

 v3: Made all register_* functions require data to be passed and
     all callbacks to take a void * parameter as its first argument.
     This makes the calling functions comply with C standards.

     Also added more comments to the modifications of DECLARE_TRACE().

 v2: Made the DECLARE_TRACE() have the ability to pass arguments
     and added a new DECLARE_TRACE_NOARGS() for tracepoints that
     do not need any arguments.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-14 09:50:34 -04:00
David S. Miller
e7874c996b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-05-13 14:14:10 -07:00
Joe Perches
736d58e3a2 netfilter: remove unnecessary returns from void function()s
This patch removes from net/ netfilter files
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
  xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
[Patrick: changed to keep return statements in otherwise empty function bodies]
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-13 15:16:27 +02:00
Stephen Hemminger
654d0fbdc8 netfilter: cleanup printk messages
Make sure all printk messages have a severity level.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-13 15:02:08 +02:00
Stephen Hemminger
af5676039a netfilter: change NF_ASSERT to WARN_ON
Change netfilter asserts to standard WARN_ON. This has the
benefit of backtrace info and also causes netfilter errors
to show up on kerneloops.org.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-13 15:00:20 +02:00
Bart De Schuymer
e94c67436e netfilter: bridge-netfilter: fix crash in br_nf_forward_finish()
[ 4593.956206] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 4593.956219] IP: [<ffffffffa03357a4>] br_nf_forward_finish+0x154/0x170 [bridge]
[ 4593.956232] PGD 195ece067 PUD 1ba005067 PMD 0
[ 4593.956241] Oops: 0000 [#1] SMP
[ 4593.956248] last sysfs file:
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:08/ATK0110:00/hwmon/hwmon0/temp2_label
[ 4593.956253] CPU 3
...
[ 4593.956380] Pid: 29512, comm: kvm Not tainted 2.6.34-rc7-net #195 P6T DELUXE/System Product Name
[ 4593.956384] RIP: 0010:[<ffffffffa03357a4>]  [<ffffffffa03357a4>] br_nf_forward_finish+0x154/0x170 [bridge]
[ 4593.956395] RSP: 0018:ffff880001e63b78  EFLAGS: 00010246
[ 4593.956399] RAX: 0000000000000608 RBX: ffff880057181700 RCX: ffff8801b813d000
[ 4593.956402] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880057181700
[ 4593.956406] RBP: ffff880001e63ba8 R08: ffff8801b9d97000 R09: ffffffffa0335650
[ 4593.956410] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b813d000
[ 4593.956413] R13: ffffffff81ab3940 R14: ffff880057181700 R15: 0000000000000002
[ 4593.956418] FS:  00007fc40d380710(0000) GS:ffff880001e60000(0000) knlGS:0000000000000000
[ 4593.956422] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 4593.956426] CR2: 0000000000000018 CR3: 00000001ba1d7000 CR4: 00000000000026e0
[ 4593.956429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4593.956433] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4593.956437] Process kvm (pid: 29512, threadinfo ffff8801ba566000, task ffff8801b8003870)
[ 4593.956441] Stack:
[ 4593.956443]  0000000100000020 ffff880001e63ba0 ffff880001e63ba0 ffff880057181700
[ 4593.956451] <0> ffffffffa0335650 ffffffff81ab3940 ffff880001e63bd8 ffffffffa03350e6
[ 4593.956462] <0> ffff880001e63c40 000000000000024d ffff880057181700 0000000080000000
[ 4593.956474] Call Trace:
[ 4593.956478]  <IRQ>
[ 4593.956488]  [<ffffffffa0335650>] ? br_nf_forward_finish+0x0/0x170 [bridge]
[ 4593.956496]  [<ffffffffa03350e6>] NF_HOOK_THRESH+0x56/0x60 [bridge]
[ 4593.956504]  [<ffffffffa0335282>] br_nf_forward_arp+0x112/0x120 [bridge]
[ 4593.956511]  [<ffffffff813f7184>] nf_iterate+0x64/0xa0
[ 4593.956519]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
[ 4593.956524]  [<ffffffff813f722c>] nf_hook_slow+0x6c/0x100
[ 4593.956531]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
[ 4593.956538]  [<ffffffffa032f800>] ? __br_forward+0x0/0xc0 [bridge]
[ 4593.956545]  [<ffffffffa032f86d>] __br_forward+0x6d/0xc0 [bridge]
[ 4593.956550]  [<ffffffff813c5d8e>] ? skb_clone+0x3e/0x70
[ 4593.956557]  [<ffffffffa032f462>] deliver_clone+0x32/0x60 [bridge]
[ 4593.956564]  [<ffffffffa032f6b6>] br_flood+0xa6/0xe0 [bridge]
[ 4593.956571]  [<ffffffffa032f800>] ? __br_forward+0x0/0xc0 [bridge]

Don't call nf_bridge_update_protocol() for ARP traffic as skb->nf_bridge isn't
used in the ARP case.

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-13 14:55:34 +02:00
David S. Miller
bf47f4b0ba Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6 2010-05-12 23:30:45 -07:00
Allan Stephens
23461e835b tipc: Reduce footprint by un-inlining tipc_msg_* routines
Convert tipc_msg_* inline routines that are more than one line into
standard functions, thereby eliminating some repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:29 -07:00
Allan Stephens
3032cca4d5 tipc: Reduce footprint by un-inlining buf_acquire routine
Convert buf_acquire inline routine that is more than one line into
a standard function, thereby eliminating some repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:28 -07:00
Allan Stephens
b274f4ab8e tipc: Reduce footprint by un-inlining bearer congestion routine
Convert bearer congestion inline routine that is more than one line into
a standard function, thereby eliminating some repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:28 -07:00
Allan Stephens
43608edc2d tipc: Reduce footprint by un-inlining port list routines
Converts port list inline routines that are more than one line into
standard functions, thereby eliminating a significant amount of
repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:27 -07:00
Allan Stephens
3e22e62b62 tipc: Reduce footprint by un-inlining nmap routines
Converts nmap inline routines that are more than one line into standard
functions, thereby eliminating a significant amount of repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:27 -07:00
Allan Stephens
80e0c33064 tipc: Reduce footprint by un-inlining address routines
Convert address-related inline routines that are more than one
line into standard functions, thereby eliminating a significant
amount of repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:26 -07:00
Allan Stephens
c68ca7b720 tipc: add tipc_ prefix to fcns targeted for un-inlining
These functions have enough code in them such that they
seem like sensible targets for un-inlining.  Prior to doing
that, this adds the tipc_ prefix to the functions, so that
in the event of a panic dump or similar, the subsystem from
which the functions come from is immediately clear.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:25 -07:00
Allan Stephens
01fee256a6 tipc: Relocate trivial link status functions to header file
Rather than live in link.c where they can only be used in that file alone,
these helper routines are better served by being in link.h

Relocated are the following:

	link_working_working
	link_working_unknown
	link_reset_unknown
	link_reset_reset
	link_blocked
	link_congested

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:24 -07:00
Allan Stephens
15e979da7c tipc: remove abstraction for link_max_pkt
This is just a straight return of a field; there is no
value in the abstraction of hiding it behind a function.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:24 -07:00
Allan Stephens
107e7be628 tipc: Add support for "-s" configuration option
Provide initial support for displaying overall TIPC status/statistics
information at runtime.  Currently, only version info for the TIPC
kernel module is displayed.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:23 -07:00
Allan Stephens
3aec9cc936 tipc: Rename "multicast-link" to "broadcast-link"
Make a cosmetic change to the name displayed for the broadcast link,
to better reflect its true nature. Since TIPC utilizes this link to
distribute name table information, in addition to multicast messages
sent by user applications, the prior name "multicast-link" is
no longer appropriate.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:22 -07:00
Allan Stephens
9ccc2eb4e1 tipc: Eliminate unnecessary initialization in native API send routines
Eliminate a couple of instances where TIPC's native API send routines
were doing pointless initialization of local variables.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:22 -07:00
Allan Stephens
289464e4fc tipc: Prune unused data structures from configuration service
Eliminate some unused data structures in the TIPC
configuration service that relate to the handling of link
subscriptions, which were not supported when TIPC 1.5 was
introduced.  If and when support for link subscriptions is
offered in TIPC, these elements may need to be re-introduced.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:21 -07:00
Allan Stephens
b82834e66a tipc: Eliminate unused argument in print statement
Eliminate an argument in a print statement that has no corresponding
format specification.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:21 -07:00
Allan Stephens
df4ef33716 tipc: Eliminate obsolete port's "congested_link" field
Eliminate a field of the TIPC port structure that is populated,
but never referenced.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:20 -07:00
Jan Engelhardt
9b7ce2b762 netfilter: xtables: add missing depends for xt_TEE
Aviod these link-time errors when IPV6=m, XT_TEE=y:

net/built-in.o: In function `tee_tg_route6':
xt_TEE.c:(.text+0x45ca5): undefined reference to `ip6_route_output'
net/built-in.o: In function `tee_tg6':
xt_TEE.c:(.text+0x45d79): undefined reference to `ip6_local_out'

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 14:54:15 -07:00
Abhijeet Kolekar
058897a4e9 mac80211: fix paged defragmentation
Paged RX skb patch broke the defragmentation. We need to read hdr again
after linearization.

It fixes following bug
http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2194

Signed-off-by: Zhu, Yi <yi.zhu@intel.com>
Signed-off-by: Abhijeet Kolekar <abhijeet.kolekar@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-12 16:39:07 -04:00
Wey-Yi Guy
9feaddc77b mac80211: check channel switch mode for future frames transmit
Check the mode in channel switch ie for either 0 or 1 on transmission.
A channel switch mode set to 1 means that the STA in a BSS to which the
frame containing the element is addressed shall transmit no further
frames 	within the BSS until the scheduled channel switch.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-12 16:39:05 -04:00
Johannes Berg
5ce6e438d5 mac80211: add offload channel switch support
This adds support for offloading the channel switch
operation to devices that support such, typically
by having specific firmware API for it. The reasons
for this could be that the firmware provides better
timing or that regulatory enforcement done by the
device requires special handling of CSAs.

In order to allow drivers to specify the timing to
the device, the new channel_switch callback will
pass through the received frame's mactime, where
available.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-12 16:39:05 -04:00
Johannes Berg
b8d92c9c14 mac80211: don't process work item with wrong frame
When we process a frame, we currently just match it
to the work struct by the MAC addresses, and not by
the work type. This means that we can end up doing
the work for an association request item when (for
whatever reason) we receive another frame type, for
example a probe response. Processing the wrong type
of frame will lead to completely invalid data being
processed, and will lead to various problems like
thinking the association was successful even if the
AP never sent an assocation response.

Fix this by making each processing function check
that it is invoked for the right work struct type
only and continue processing otherwise (and drop
frames that we didn't expect).

This bug was uncovered during the debugging for
https://bugzilla.kernel.org/show_bug.cgi?id=15862
but doesn't seem to be the cause for any of the
various problems reported there.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-12 16:28:52 -04:00
David S. Miller
278554bd65 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ath/ar9170/usb.c
	drivers/scsi/iscsi_tcp.c
	net/ipv4/ipmr.c
2010-05-12 00:05:35 -07:00
Dan Carpenter
d3e56c0ad8 wimax: checking ERR_PTR vs null
stch_skb is allocated with wimax_gnl_re_state_change_alloc().  That
function returns ERR_PTRs on failure and doesn't return NULL.

Signed-off-by: Dan Carpenter <error27@gmail.com>
2010-05-11 14:09:10 -07:00
John W. Linville
cc755896a4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	drivers/net/wireless/ath/ar9170/main.c
2010-05-11 14:24:55 -04:00
Linus Torvalds
9fc282baa8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net: Fix FDDI and TR config checks in ipv4 arp and LLC.
  IPv4: unresolved multicast route cleanup
  mac80211: remove association work when processing deauth request
  ar9170: wait for asynchronous firmware loading
  ipv4: udp: fix short packet and bad checksum logging
  phy: Fix initialization in micrel driver.
  sctp: Fix a race between ICMP protocol unreachable and connect()
  veth: Dont kfree_skb() after dev_forward_skb()
  IPv6: fix IPV6_RECVERR handling of locally-generated errors
  net/gianfar: drop recycled skbs on MTU change
  iwlwifi: work around passive scan issue
2010-05-11 10:11:40 -07:00
Patrick McHardy
cba7a98a47 Merge branch 'master' of git://dev.medozas.de/linux 2010-05-11 18:59:21 +02:00
Jan Engelhardt
4538506be3 netfilter: xtables: combine built-in extension structs
Prepare the arrays for use with the multiregister function. The
future layer-3 xt matches can then be easily added to it without
needing more (un)register code.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-11 18:36:18 +02:00
Jan Engelhardt
b4ba26119b netfilter: xtables: change hotdrop pointer to direct modification
Since xt_action_param is writable, let's use it. The pointer to
'bool hotdrop' always worried (8 bytes (64-bit) to write 1 byte!).
Surprisingly results in a reduction in size:

   text    data     bss filename
5457066  692730  357892 vmlinux.o-prev
5456554  692730  357892 vmlinux.o

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-11 18:35:27 +02:00
Jan Engelhardt
62fc805108 netfilter: xtables: deconstify struct xt_action_param for matches
In future, layer-3 matches will be an xt module of their own, and
need to set the fragoff and thoff fields. Adding more pointers would
needlessy increase memory requirements (esp. so for 64-bit, where
pointers are wider).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-11 18:33:37 +02:00
Jan Engelhardt
4b560b447d netfilter: xtables: substitute temporary defines by final name
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-11 18:31:17 +02:00
Jan Engelhardt
de74c16996 netfilter: xtables: combine struct xt_match_param and xt_target_param
The structures carried - besides match/target - almost the same data.
It is possible to combine them, as extensions are evaluated serially,
and so, the callers end up a little smaller.

  text  data  bss  filename
-15318   740  104  net/ipv4/netfilter/ip_tables.o
+15286   740  104  net/ipv4/netfilter/ip_tables.o
-15333   540  152  net/ipv6/netfilter/ip6_tables.o
+15269   540  152  net/ipv6/netfilter/ip6_tables.o

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-11 18:23:43 +02:00
Patrick McHardy
5b285cac35 ipv6: ip6mr: add support for dumping routing tables over netlink
The ip6mr /proc interface (ip6_mr_cache) can't be extended to dump routes
from any tables but the main table in a backwards compatible fashion since
the output format ends in a variable amount of output interfaces.

Introduce a new netlink interface to dump multicast routes from all tables,
similar to the netlink interface for regular routes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:56 +02:00
Patrick McHardy
d1db275dd3 ipv6: ip6mr: support multiple tables
This patch adds support for multiple independant multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT6_TABLE. The table number is
stored in the raw socket data and affects all following ip6mr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
is created with a default routing rule pointing to it. Newly created pim6reg
devices have the table number appended ("pim6regX"), with the exception of
devices created in the default table, which are named just "pim6reg" for
compatibility reasons.

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.

Example usage:

- bind pimd/xorp/... to a specific table:

uint32_t table = 123;
setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

# ip -6 mrule add iif eth0 lookup 123
# ip -6 mrule add oif eth0 lookup 123

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:55 +02:00
Patrick McHardy
6bd5214339 ipv6: ip6mr: move mroute data into seperate structure
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:53 +02:00
Patrick McHardy
f30a778421 ipv6: ip6mr: convert struct mfc_cache to struct list_head
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:51 +02:00
Patrick McHardy
b5aa30b191 ipv6: ip6mr: remove net pointer from struct mfc6_cache
Now that cache entries in unres_queue don't need to be distinguished by their
network namespace pointer anymore, we can remove it from struct mfc6_cache
add pass the namespace as function argument to the functions that need it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:50 +02:00
Patrick McHardy
c476efbcde ipv6: ip6mr: move unres_queue and timer to per-namespace data
The unres_queue is currently shared between all namespaces. Following patches
will additionally allow to create multiple multicast routing tables in each
namespace. Having a single shared queue for all these users seems to excessive,
move the queue and the cleanup timer to the per-namespace data to unshare it.

As a side-effect, this fixes a bug in the seq file iteration functions: the
first entry returned is always from the current namespace, entries returned
after that may belong to any namespace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:48 +02:00
David S. Miller
d250fe91ae Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-05-10 23:03:26 -07:00
David S. Miller
de02d72bb3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-05-10 22:53:41 -07:00
Reinette Chatre
a15707d80e Merge branch 'wireless-2.6' into wireless-next-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-dev.h
2010-05-10 15:08:11 -07:00
Mark Gross
ed77134bfc PM QOS update
This patch changes the string based list management to a handle base
implementation to help with the hot path use of pm-qos, it also renames
much of the API to use "request" as opposed to "requirement" that was
used in the initial implementation.  I did this because request more
accurately represents what it actually does.

Also, I added a string based ABI for users wanting to use a string
interface.  So if the user writes 0xDDDDDDDD formatted hex it will be
accepted by the interface.  (someone asked me for it and I don't think
it hurts anything.)

This patch updates some documentation input I got from Randy.

Signed-off-by: markgross <mgross@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-05-10 23:08:19 +02:00
Patrick McHardy
b56f2d55c6 netfilter: use rcu_dereference_protected()
Restore the rcu_dereference() calls in conntrack/expectation notifier
and logger registration/unregistration, but use the _protected variant,
which will be required by the upcoming __rcu annotations.

Based on patch by Eric Dumazet <eric.dumazet@gmail.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-10 18:47:57 +02:00
Patrick McHardy
1e4b105712 Merge branch 'master' of /repos/git/net-next-2.6
Conflicts:
	net/bridge/br_device.c
	net/bridge/br_forward.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-10 18:39:28 +02:00
Patrick McHardy
3b254c54ec netfilter: nf_conntrack_proto: fix warning with CONFIG_PROVE_RCU
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
include/net/netfilter/nf_conntrack_l3proto.h:92 invoked rcu_dereference_check()
without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by iptables/3197:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8149bd8c>]
ip_setsockopt+0x7c/0xa0
 #1:  (&xt[i].mutex){+.+.+.}, at: [<ffffffff8148a5fe>]
xt_find_table_lock+0x3e/0x110

stack backtrace:
Pid: 3197, comm: iptables Not tainted 2.6.34-rc4 #2
Call Trace:
 [<ffffffff8105e2e8>] lockdep_rcu_dereference+0xb8/0xc0
 [<ffffffff8147fb3b>] nf_ct_l3proto_module_put+0x6b/0x70
 [<ffffffff8148d891>] state_mt_destroy+0x11/0x20
 [<ffffffff814d3baf>] cleanup_match+0x2f/0x50
 [<ffffffff814d3c63>] cleanup_entry+0x33/0x90
 [<ffffffff814d5653>] ? __do_replace+0x1a3/0x210
 [<ffffffff814d564c>] __do_replace+0x19c/0x210
 [<ffffffff814d651a>] do_ipt_set_ctl+0x16a/0x1b0
 [<ffffffff8147a610>] nf_sockopt+0x60/0xa0
...

The __nf_ct_l3proto_find() call doesn't actually need rcu read side
protection since the caller holds a reference to the protocol. Use
rcu_read_lock() anyways to avoid the warning.

Kernel bugzilla #15781: https://bugzilla.kernel.org/show_bug.cgi?id=15781

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-10 17:45:56 +02:00
David S. Miller
f0ecde1466 net: Fix FDDI and TR config checks in ipv4 arp and LLC.
Need to check both CONFIG_FOO and CONFIG_FOO_MODULE

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-10 04:59:07 -07:00
Andreas Meissner
bbd725435d IPv4: unresolved multicast route cleanup
Fixes the expiration timer for unresolved multicast route entries.
In case new multicast routing requests come in faster than the 
expiration timeout occurs (e.g. zap through multicast TV streams), the 
timer is prevented from being called at time for already existing entries.

As the single timer is resetted to default whenever a new entry is made, 
the timeout for existing unresolved entires are missed and/or not 
updated. As a consequence new requests are denied when the limit of 
unresolved entries has been reached because old entries live longer than 
they are supposed to.

The solution is to reset the timer only for the first unresolved entry 
in the multicast routing cache. All other timers are already set and 
updated correctly within the timer function itself by now.

Signed-off by: Andreas Meissner <andreas.meissner@sphairon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-10 04:47:49 -07:00
Marcel Holtmann
2b0b05ddc0 Bluetooth: Fix issues where sk_sleep() helper is needed now
There were some left-overs that used sk->sk_sleep instead of the new
sk_sleep() helper.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 11:33:10 +02:00
Tomas Winkler
7b767cad29 Bluetooth: Use strict_strtoul instead of simple_strtoul
Use strict_strtoul as suggested by checkpatch.pl for more strict input
checking.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:34:04 +02:00
Marcel Holtmann
f48fd9c8cd Bluetooth: Create per controller workqueue
Instead of having a global workqueue for all controllers, it makes
more sense to have a workqueue per controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:34:03 +02:00
Gustavo F. Padovan
844c097242 Bluetooth: Fix spec error in the RemoteBusy Logic
On the receipt of an RR(P=1) under RemoteBusy set to TRUE(on the RECV
state table) we have to call sendIorRRorRNR(F=1) and just after set
RemoteBusy to False. This leads to a freeze in the sending process since
it's not allowed send data with RemoteBusy set to true and no one
call SendPending-I-Frames after set RemoteBusy to false(The last action
for that event).

Actually sendIorRRorRNR() calls SendPending-I-Frames but at that moment
RemoteBusy is still True and we cannot send any frame, after, no one
calls SendPending-I-Frames again and the sending process stops.

The solution here is to set RemoteBusy to false inside
SendPending-I-Frames just before call SendPending-I-Frames. That will
make SendPending-I-Frames able to send frames. This solution is similar
to what RR(P=0)(F=0) on the RECV table and RR(P=1) on the SREJ_SENT
table do.

Actually doesn't make any sense call SendPending-I-Frames if we can send
any frame, i. e., RemoteBusy is True.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:53 +02:00
Gustavo F. Padovan
4178ba462a Bluetooth: Prevents buffer overflow on l2cap_ertm_reassembly_sdu()
The checks should be done before the the memcpy to avoid buffer
overflow.

Reported-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:53 +02:00
Gustavo F. Padovan
dfc909befb Bluetooth: Fix race condition on l2cap_ertm_send()
l2cap_ertm_send() can be called both from user context and bottom half
context. The socket locks for that contexts are different, the user
context uses a mutex(which can sleep) and the second one uses a
spinlock_bh. That creates a race condition when we have interruptions on
both contexts at the same time.

The better way to solve this is to add a new spinlock to lock
l2cap_ertm_send() and the vars it access. The other solution was to defer
l2cap_ertm_send() with a workqueue, but we the sending process already
has one defer on the hci layer. It's not a good idea add another one.

The patch refactor the code to create l2cap_retransmit_frames(), then we
encapulate the lock of l2cap_ertm_send() for some call. It also changes
l2cap_retransmit_frame() to l2cap_retransmit_one_frame() to avoid
confusion

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:53 +02:00
Gustavo F. Padovan
6161c0382b Bluetooth: Add wait_queue to wait ack of all sent packets
To guarantee that all packets we sent were received we need to wait for
theirs ack before shutdown the socket.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:53 +02:00
Gustavo F. Padovan
1890d36bb5 Bluetooth: Implement Local Busy Condition handling
Supports Local Busy condition handling through a waitqueue that wake ups
each 200ms and try to push the packets to the upper layer. If it can
push all the queue then it leaves the Local Busy state.

The patch modifies the behaviour of l2cap_ertm_reassembly_sdu() to
support retry of the push operation.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:53 +02:00
João Paulo Rechi Vita
9b53350d3c Bluetooth: Completes the I-frame tx_seq check logic on RECV
Add checks for invalid tx_seq and fixes the duplicated tx_seq check.

Signed-off-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Acked-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:53 +02:00
Gustavo F. Padovan
18778a63dd Bluetooth: Implement missing parts of the Invalid Frame Detection
There is a plenty of situation where ERTM shall close the channel, this
commit treats the cases regarding Invalid Frame Detection.
It create one reassembly SDU function for ERTM and other for Streaming
Mode to make the Invalid Frame Detection handling less complex.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:52 +02:00
Gustavo F. Padovan
f11d676da4 Bluetooth: Refactor l2cap_retransmit_frame()
Make the code flow cleaner and changes the function to void.
It also fixes a potential NULL dereference with skb.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:52 +02:00
Gustavo F. Padovan
9a9c6a3441 Bluetooth: Make hci_send_acl() void
hci_send_acl can't fail, so we can make it void. This patch changes
that and all the funcions that use hci_send_acl().
That change exposed a bug on sending connectionless data. We were not
reporting the lenght send back to the user space.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:52 +02:00
Gustavo F. Padovan
ff12fd6433 Bluetooth: Fix lockdep annotation on ERTM
A spin_lock_init() call was missing. :)

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:52 +02:00
Gustavo F. Padovan
9b16dc6551 Bluetooth: Check if we really are in WAIT_F when F bit comes
F-bit set should be processed only if we are in the WAIT_F state.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:52 +02:00
Gustavo F. Padovan
a2e12a2a31 Bluetooth: Remove unneeded control vars
Trivial clean up.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:52 +02:00
Gustavo F. Padovan
0301ef04b5 Bluetooth: Remove set of SrejSaveReqSeq under receipt of REJ frame
That action is not specified by the ERTM spec, so removing it.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:52 +02:00
Gustavo F. Padovan
59203a21a5 Bluetooth: Fix errors reported by checkpatch.pl
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:51 +02:00
Gustavo F. Padovan
44651b85cc Bluetooth: Don't set control bits to zero first
We can set the SAR bits in the control field directly.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:51 +02:00
João Paulo Rechi Vita
01760bdde9 Bluetooth: Close L2CAP channel on invalid ReqSeq
Signed-off-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Acked-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:51 +02:00
Gustavo F. Padovan
afefdbc4cf Bluetooth: Fix SDU reassembly under SREJ
The code was reusing the control var without its reinitialization.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:51 +02:00
João Paulo Rechi Vita
0041ecfa30 Bluetooth: Check if mode is supported on getsockopt
Add this check to getsockopt makes possible to fail early instead of
waiting until listen / connect.

Signed-off-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Acked-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:51 +02:00
Gustavo F. Padovan
bd3c9e255e Bluetooth: Add SOCK_STREAM support to L2CAP
if enable_ertm is true and we have SOCK_STREAM the default mode will be
ERTM, otherwise Basic Mode.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:51 +02:00
Gustavo F. Padovan
84fb0a6334 Bluetooth: Add Kconfig option for L2CAP Extended Features
The L2CAP Extended Features are still unstable and under development,
so we are adding them under the EXPERIMENTAL flag to get more feedback
on them. L2CAP Extended Features includes the Enhanced Retransmission
and Streaming Modes, Frame Check Sequence (FCS), and Segmentation and
Reassemby (SAR).

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:51 +02:00
Gustavo F. Padovan
3b1a9f3fa6 Bluetooth: Optimize SREJ_QUEUE append
When the I-frame received is the expected, i.e., its tx_seq is equal to
expected_tx_seq and we are under a SREJ, we can just add it to the tail
of the list. Doing that we change the complexity from O(n) to O(1).

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:50 +02:00
Gustavo F. Padovan
812e737e29 Bluetooth: Fix drop of acked packets on ERTM
l2cap_drop_acked_frames() was droping not sent packets, causing them to
be not transmitted.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:50 +02:00
Gustavo F. Padovan
0ee0d20855 Bluetooth: Fix crash when monitor timeout expires
The code was crashing due to a invalid access to hci_conn after the
channel disconnect.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:50 +02:00
Gustavo F. Padovan
f6e6b16823 Bluetooth: Fix bug when retransmitting I-frames
If there is no frames to retransmit l2cap was crashing the kernel, now
we check if the queue is empty first.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:50 +02:00
Gustavo F. Padovan
68d7f0ce91 Bluetooth: Enable option to configure Max Transmission value via sockopt
With the sockopt extension we can set a per-channel MaxTx value.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:50 +02:00
Gustavo F. Padovan
369ba30264 Bluetooth: Add module parameter for txWindow size on L2CAP
Very useful for testing purposes.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
[jprvita@profusion.mobi: improved parameter description]
Signed-off-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:50 +02:00
Gustavo F. Padovan
803020c6fa Bluetooth: Change acknowledgement to use the value of txWindow
Now that we can set the txWindow we need to change the acknowledgement
procedure to ack after each (pi->txWindow/6 + 1). The plus 1 is to avoid
the zero value.
It also renames pi->num_to_ack to a better name: pi->num_acked.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:50 +02:00
Gustavo F. Padovan
14b5aa71ec Bluetooth: Add sockopt configuration for txWindow on L2CAP
Now we can set/get Transmission Window size via sockopt.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:49 +02:00
Gustavo F. Padovan
855666cccc Bluetooth: Send Ack after clear the SREJ list
As specified by Bluetooth 3.0 spec we shall send an acknowledgment using
the Send-Ack() after clear the SREJ list.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:49 +02:00
Gustavo F. Padovan
052897ca50 Bluetooth: Check the SDU size against the MTU value
If the SDU size is greater than the MTU something is wrong, so report
an error.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
[jprvita@profusion.mobi: set err to appropriate errno value]
Signed-off-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:49 +02:00
Gustavo F. Padovan
10467e9e9b Bluetooth: Add le16 macro to Retransmission and Monitor Timeouts values
Fix a possible problem with Big Endian machines.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:48 +02:00
Gustavo F. Padovan
1c7621596d Bluetooth: Fix configuration of the MPS value
We were accepting values bigger than we can accept. This was leading
ERTM to drop packets because of wrong FCS checks.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:48 +02:00
Gustavo F. Padovan
7b1c0049be Bluetooth: Read RFC conf option on a successful Conf RSP
On Enhanced Retransmission Mode and Streaming Mode a entity can send, on
a successful Conf RSP, new values for the RFC fields. For example, the
entity can send txWindow and MPS values less than the value received on
a Conf REQ.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:48 +02:00
Gustavo F. Padovan
2fb862e215 Bluetooth: Ignore Tx Window value with Streaming mode
Tx Window value shall not be used with Streaming Mode and the receiver
of the config Request shall ignore its value.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:48 +02:00
Gustavo F. Padovan
c1b4f43be0 Bluetooth: Add timer to Acknowledge I-frames
We ack I-frames on each txWindow/5 I-frames received, but if the sender
stop to send I-frames and it's not a txWindow multiple we can leave some
frames unacked.
So I added a timer to ack I-frames on this case. The timer expires in
200ms.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:48 +02:00
Gustavo F. Padovan
05fbd89dd4 Bluetooth: Finish implementation for Rec RR (P=1) on ERTM
Now the code handles the case under SREJ_SENT state.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:48 +02:00
Gustavo F. Padovan
8abb52ee00 Bluetooth: Remove duplicate use of __get_reqseq() macro on L2CAP
tx_seq var already has the value of __get_reqseq().

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:48 +02:00
Gustavo F. Padovan
6e3a59819f Bluetooth: Group the ack of I-frames into l2cap_data_channel_rrframe()
It also fix a bug: we weren't acknowledging I-frames when P=1.
Note that when F=1 we are acknowledging packets before setting
RemoteBusy to False. The spec says we should do that in the opposite
order, but acknowledment of packets doesn't care about RemoteBusy flag
so we can do that in the order we want.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:47 +02:00
Gustavo F. Padovan
99b0d4b7b0 Bluetooth: Handle all cases of receipt of RNR-frames into L2CAP
We weren't handling the receipt under SREJ_SENT state table.
It also introduce l2cap_send_srejtail(). It will be used in the nexts
commits too.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:47 +02:00
Gustavo F. Padovan
e072745f4a Bluetooth: Split l2cap_data_channel_sframe()
Create a function for each type fo S-frame and avoid a lot of nested
code.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:47 +02:00
Gustavo F. Padovan
73edaa9933 Bluetooth: Add Recv RR (P=0)(F=0) for SREJ_SENT state on ERTM
This finishes the implementation of Recv RR (P=0)(F=0) for the Enhanced
Retransmission Mode on L2CAP.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:47 +02:00
Gustavo F. Padovan
f0946ccfc7 Bluetooth: Move set of P-bit to l2cap_send_sframe()
Abstract the send of of P-bit and avoids code duplication like we
did with the setting of F-bit.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:47 +02:00
Gustavo F. Padovan
9e917af13d Bluetooth: Implement SendAck() Action on ERTM.
Shall be used to ack received frames, It must decide type of
acknowledgment between a RR frame, a RNR frame or transmission of
pending I-frames.
It also modifies l2cap_ertm_send() to report the number of frames sent.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:47 +02:00
Gustavo F. Padovan
36f2fd585f Bluetooth: Check if SDU size is greater than MTU on L2CAP
After reassembly the SDU we need to check his size. It can't overflow
the MTU size.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:47 +02:00
Gustavo F. Padovan
277ffbe362 Bluetooth: Check the minimum {I,S}-frame size into L2CAP
All packets with size fewer than the minimum specified is dropped.
Note that the size of the l2cap basic header, FCS and SAR fields are
already subtracted of len at the moment of the size check.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:46 +02:00
Gustavo F. Padovan
1d8f5d1691 Bluetooth: Support case with F bit set under WAIT_F state.
On receipt of a F=1 under WAIT_F state ERTM shall stop monitor timer and
start retransmission timer (if there are unacked frames).

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:46 +02:00
Gustavo F. Padovan
d5392c8f1e Bluetooth: Implement 'Send IorRRorRNR' event
After receive a RR with P bit set ERTM shall use this funcion to choose
what type of frame to reply with F bit = 1.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:46 +02:00
Gustavo F. Padovan
e8235c6bdd Bluetooth: Use a l2cap_pinfo struct instead l2cap_pi() macro
Trivial clean up.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:46 +02:00
Gustavo F. Padovan
d1daa091e8 Bluetooth: Fix ACL MTU issue
ERTM and Streaming Modes was having problems when the ACL MTU is lower
than MPS. The 'minus 10' is to take in account the header and fcs
lenghts.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:46 +02:00
Gustavo F. Padovan
7dffe42102 Bluetooth: Fix expected_tx_seq calculation on L2CAP
All operation related to the txWindow should be modulo 64.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:46 +02:00
Gustavo F. Padovan
faaebd192e Bluetooth: Fix memory leak of S-frames into L2CAP
l2cap_data_channel do not free the S-frame, so we free it here.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:45 +02:00
Gustavo F. Padovan
c69163e9ed Bluetooth: Move specific Basic Mode code to the right place
Inside "case L2CAP_MODE_BASIC:" we don't need to check for sk_type and
L2CAP mode. So only the length check is fine.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:45 +02:00
Gustavo F. Padovan
b9dbdbc1f4 Bluetooth: Trivial clean ups to SCO
Remove extra braces and labels, break over column 80 lines, etc

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:45 +02:00
Gustavo F. Padovan
0d861d8b8e Bluetooth: Make hci_send_sco() void
It also removes an unneeded check for the MTU. The check is done before
on sco_send_frame()

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:45 +02:00
Neil Horman
3ee943728f ipv4: remove ip_rt_secret timer (v4)
A while back there was a discussion regarding the rt_secret_interval timer.
Given that we've had the ability to do emergency route cache rebuilds for awhile
now, based on a statistical analysis of the various hash chain lengths in the
cache, the use of the flush timer is somewhat redundant.  This patch removes the
rt_secret_interval sysctl, allowing us to rely solely on the statistical
analysis mechanism to determine the need for route cache flushes.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-08 01:57:52 -07:00
John W. Linville
a472e71b3c mac80211: set IEEE80211_TX_CTL_FIRST_FRAGMENT for beacons
Also simplify the flags assignment into a single statement at the
end of ieee80211_beacon_get_tim.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-07 14:55:55 -04:00
Johannes Berg
0aaffa9b96 mac80211: improve HT channel handling
Currently, when one interface switches HT mode,
all others will follow along. This is clearly
undesirable, since the new one might switch to
no-HT while another one is operating in HT.

Address this issue by keeping track of the HT
mode per interface, and allowing only changes
that are compatible, i.e. switching into HT40+
is not possible when another interface is in
HT40-, in that case the second one needs to
fall back to HT20.

Also, to allow drivers to know what's going on,
store the per-interface HT mode (channel type)
in the virtual interface's bss_conf.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-07 14:55:51 -04:00
Johannes Berg
f444de05d2 cfg80211/mac80211: better channel handling
Currently (all tested with hwsim) you can do stupid
things like setting up an AP on a certain channel,
then adding another virtual interface and making
that associate on another channel -- this will make
the beaconing to move channel but obviously without
the necessary IEs data update.

In order to improve this situation, first make the
configuration APIs (cfg80211 and nl80211) aware of
multi-channel operation -- we'll eventually need
that in the future anyway. There's one userland API
change and one API addition. The API change is that
now SET_WIPHY must be called with virtual interface
index rather than only wiphy index in order to take
effect for that interface -- luckily all current
users (hostapd) do that. For monitor interfaces, the
old setting is preserved, but monitors are always
slaved to other devices anyway so no guarantees.

The second userland API change is the introduction
of a per virtual interface SET_CHANNEL command, that
hostapd should use going forward to make it easier
to understand what's going on (it can automatically
detect a kernel with this command).

Other than mac80211, no existing cfg80211 drivers
are affected by this change because they only allow
a single virtual interface.

mac80211, however, now needs to be aware that the
channel settings are per interface now, and needs
to disallow (for now) real multi-channel operation,
which is another important part of this patch.

One of the immediate benefits is that you can now
start hostapd to operate on a hardware that already
has a connection on another virtual interface, as
long as you specify the same channel.

Note that two things are left unhandled (this is an
improvement -- not a complete fix):

 * different HT/no-HT modes

   currently you could start an HT AP and then
   connect to a non-HT network on the same channel
   which would configure the hardware for no HT;
   that can be fixed fairly easily

 * CSA

   An AP we're connected to on a virtual interface
   might indicate switching channels, and in that
   case we would follow it, regardless of how many
   other interfaces are operating; this requires
   more effort to fix but is pretty rare after all

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-07 14:55:50 -04:00
Johannes Berg
ac8dd506e4 mac80211: fix BSS info reconfiguration
When reconfiguring an interface due to a previous
hardware restart, mac80211 will currently include
the new IBSS flag on non-IBSS interfaces which may
confuse drivers.

Instead of doing the ~0 trick, simply spell out
which things are going to be reconfigured.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-07 14:55:49 -04:00
Reinette Chatre
79733a865c mac80211: remove association work when processing deauth request
In https://bugzilla.kernel.org/show_bug.cgi?id=15794 a user encountered the
following:

[18967.469098] wlan0: authenticated
[18967.472527] wlan0: associate with 00:1c:10:b8:e3:ea (try 1)
[18967.472585] wlan0: deauthenticating from 00:1c:10:b8:e3:ea by local choice (reason=3)
[18967.672057] wlan0: associate with 00:1c:10:b8:e3:ea (try 2)
[18967.872357] wlan0: associate with 00:1c:10:b8:e3:ea (try 3)
[18968.072960] wlan0: association with 00:1c:10:b8:e3:ea timed out
[18968.076890] ------------[ cut here ]------------
[18968.076898] WARNING: at net/wireless/mlme.c:341 cfg80211_send_assoc_timeout+0xa8/0x140()
[18968.076900] Hardware name: GX628
[18968.076924] Pid: 1408, comm: phy0 Not tainted 2.6.34-rc4-00082-g250541f-dirty #3
[18968.076926] Call Trace:
[18968.076931]  [<ffffffff8103459e>] ?  warn_slowpath_common+0x6e/0xb0
[18968.076934]  [<ffffffff8157c2d8>] ?  cfg80211_send_assoc_timeout+0xa8/0x140
[18968.076937]  [<ffffffff8103ff8b>] ? mod_timer+0x10b/0x180
[18968.076940]  [<ffffffff8158f0fc>] ?  ieee80211_assoc_done+0xbc/0xc0
[18968.076943]  [<ffffffff81590d53>] ?  ieee80211_work_work+0x553/0x11c0
[18968.076945]  [<ffffffff8102d931>] ? finish_task_switch+0x41/0xb0
[18968.076948]  [<ffffffff81590800>] ?  ieee80211_work_work+0x0/0x11c0
[18968.076951]  [<ffffffff810476fb>] ? worker_thread+0x13b/0x210
[18968.076954]  [<ffffffff8104b6b0>] ?  autoremove_wake_function+0x0/0x30
[18968.076956]  [<ffffffff810475c0>] ? worker_thread+0x0/0x210
[18968.076959]  [<ffffffff8104b21e>] ? kthread+0x8e/0xa0
[18968.076962]  [<ffffffff810031f4>] ?  kernel_thread_helper+0x4/0x10
[18968.076964]  [<ffffffff8104b190>] ? kthread+0x0/0xa0
[18968.076966]  [<ffffffff810031f0>] ?  kernel_thread_helper+0x0/0x10
[18968.076968] ---[ end trace 8aa6265f4b1adfe0 ]---

As explained by Johannes Berg <johannes@sipsolutions.net>:

We authenticate successfully, and then userspace requests association.
Then we start that process, but the AP doesn't respond. While we're
still waiting for an AP response, userspace asks for a deauth. We do
the deauth, but don't abort the association work. Then once the
association work times out we tell cfg80211, but it no longer wants
to know since for all it is concerned we accepted the deauth that
also kills the association attempt.

Fix this by, upon receipt of deauth request, removing the association work
and continuing to send the deauth.

Unfortunately the user reporting the issue is not able to reproduce this
problem anymore and cannot verify this fix. This seems like a well understood
issue though and I thus present the patch.

Bug-identified-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-07 14:26:38 -04:00
Eric Dumazet
eecfd7c4e3 rps: Various optimizations
Introduce ____napi_schedule() helper for callers in irq disabled
contexts. rps_trigger_softirq() becomes a leaf function.

Use container_of() in process_backlog() instead of accessing per_cpu
address.

Use a custom inlined version of __napi_complete() in process_backlog()
to avoid one locked instruction :

 only current cpu owns and manipulates this napi,
 and NAPI_STATE_SCHED is the only possible flag set on backlog.
 we can use a plain write instead of clear_bit(),
 and we dont need an smp_mb() memory barrier, since RPS is on,
 backlog is protected by a spinlock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-06 22:07:48 -07:00
Bjørn Mork
d6bc0149d8 ipv6: udp: make short packet logging consistent with ipv4
Adding addresses and ports to the short packet log message,
like ipv4/udp.c does it, makes these messages a lot more useful:

[  822.182450] UDPv6: short packet: From [2001:db8:ffb4:3::1]:47839 23715/178 to [2001:db8:ffb4:3:5054:ff:feff:200]:1234

This requires us to drop logging in case pskb_may_pull() fails,
which also is consistent with ipv4/udp.c

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-06 21:50:17 -07:00
Bjørn Mork
ccc2d97cb7 ipv4: udp: fix short packet and bad checksum logging
commit 2783ef23 moved the initialisation of saddr and daddr after
pskb_may_pull() to avoid a potential data corruption.  Unfortunately
also placing it after the short packet and bad checksum error paths,
where these variables are used for logging.  The result is bogus
output like

[92238.389505] UDP: short packet: From 2.0.0.0:65535 23715/178 to 0.0.0.0:65535

Moving the saddr and daddr initialisation above the error paths, while still
keeping it after the pskb_may_pull() to keep the fix from commit 2783ef23.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: stable@kernel.org
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-06 21:49:59 -07:00
Vlad Yasevich
50b5d6ad63 sctp: Fix a race between ICMP protocol unreachable and connect()
ICMP protocol unreachable handling completely disregarded
the fact that the user may have locked the socket.  It proceeded
to destroy the association, even though the user may have
held the lock and had a ref on the association.  This resulted
in the following:

Attempt to release alive inet socket f6afcc00

=========================
[ BUG: held lock freed! ]
-------------------------
somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
there!
 (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
1 lock held by somenu/2672:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c

stack backtrace:
Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
Call Trace:
 [<c1232266>] ? printk+0xf/0x11
 [<c1038553>] debug_check_no_locks_freed+0xce/0xff
 [<c10620b4>] kmem_cache_free+0x21/0x66
 [<c1185f25>] __sk_free+0x9d/0xab
 [<c1185f9c>] sk_free+0x1c/0x1e
 [<c1216e38>] sctp_association_put+0x32/0x89
 [<c1220865>] __sctp_connect+0x36d/0x3f4
 [<c122098a>] ? sctp_connect+0x13/0x4c
 [<c102d073>] ? autoremove_wake_function+0x0/0x33
 [<c12209a8>] sctp_connect+0x31/0x4c
 [<c11d1e80>] inet_dgram_connect+0x4b/0x55
 [<c11834fa>] sys_connect+0x54/0x71
 [<c103a3a2>] ? lock_release_non_nested+0x88/0x239
 [<c1054026>] ? might_fault+0x42/0x7c
 [<c1054026>] ? might_fault+0x42/0x7c
 [<c11847ab>] sys_socketcall+0x6d/0x178
 [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c1002959>] syscall_call+0x7/0xb

This was because the sctp_wait_for_connect() would aqcure the socket
lock and then proceed to release the last reference count on the
association, thus cause the fully destruction path to finish freeing
the socket.

The simplest solution is to start a very short timer in case the socket
is owned by user.  When the timer expires, we can do some verification
and be able to do the release properly.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-06 00:56:07 -07:00
Eric Dumazet
6ec82562ff veth: Dont kfree_skb() after dev_forward_skb()
In case of congestion, netif_rx() frees the skb, so we must assume
dev_forward_skb() also consume skb.

Bug introduced by commit 445409602c
(veth: move loopback logic to common location)

We must change dev_forward_skb() to always consume skb, and veth to not
double free it.

Bug report : http://marc.info/?l=linux-netdev&m=127310770900442&w=3

Reported-by: Martín Ferrari <martin.ferrari@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-06 00:53:53 -07:00
WANG Cong
c06ee961d3 bridge: make bridge support netpoll
Based on the previous patch, make bridge support netpoll by:

1) implement the 2 methods to support netpoll for bridge;

2) modify netpoll during forwarding packets via bridge;

3) disable netpoll support of bridge when a netpoll-unabled device
   is added to bridge;

4) enable netpoll support when all underlying devices support netpoll.

Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-06 00:48:24 -07:00
WANG Cong
0e34e93177 netpoll: add generic support for bridge and bonding devices
This whole patchset is for adding netpoll support to bridge and bonding
devices. I already tested it for bridge, bonding, bridge over bonding,
and bonding over bridge. It looks fine now.

To make bridge and bonding support netpoll, we need to adjust
some netpoll generic code. This patch does the following things:

1) introduce two new priv_flags for struct net_device:
   IFF_IN_NETPOLL which identifies we are processing a netpoll;
   IFF_DISABLE_NETPOLL is used to disable netpoll support for a device
   at run-time;

2) introduce one new method for netdev_ops:
   ->ndo_netpoll_cleanup() is used to clean up netpoll when a device is
     removed.

3) introduce netpoll_poll_dev() which takes a struct net_device * parameter;
   export netpoll_send_skb() and netpoll_poll_dev() which will be used later;

4) hide a pointer to struct netpoll in struct netpoll_info, ditto.

5) introduce ->real_dev for struct netpoll.

6) introduce a new status NETDEV_BONDING_DESLAE, which is used to disable
   netconsole before releasing a slave, to avoid deadlocks.

Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-06 00:47:21 -07:00
Brian Haley
d40a4de0be IPv6: fix IPV6_RECVERR handling of locally-generated errors
I noticed when I added support for IPV6_DONTFRAG that if you set
IPV6_RECVERR and tried to send a UDP packet larger than 64K to an
IPv6 destination, you'd correctly get an EMSGSIZE, but reading from
MSG_ERRQUEUE returned the incorrect address in the cmsg:

struct msghdr:
	 msg_name         0x7fff8f3c96d0
	 msg_namelen      28
struct sockaddr_in6:
	 sin6_family      10
	 sin6_port        7639
	 sin6_flowinfo    0
	 sin6_addr        ::ffff:38.32.0.0
	 sin6_scope_id    0  ((null))

It should have returned this in my case:

struct msghdr:
	 msg_name         0x7fffd866b510
	 msg_namelen      28
struct sockaddr_in6:
	 sin6_family      10
	 sin6_port        7639
	 sin6_flowinfo    0
	 sin6_addr        2620:0:a09:e000:21f:29ff:fe57:f88b
	 sin6_scope_id    0  ((null))

The problem is that ipv6_recv_error() assumes that if the error
wasn't generated by ICMPv6, it's an IPv4 address sitting there,
and proceeds to create a v4-mapped address from it.

Change ipv6_icmp_error() and ipv6_local_error() to set skb->protocol
to htons(ETH_P_IPV6) so that ipv6_recv_error() knows the address
sitting right after the extended error is IPv6, else it will
incorrectly map the first octet into an IPv4-mapped IPv6 address
in the cmsg structure returned in a recvmsg() call to obtain
the error.

Signed-off-by: Brian Haley <brian.haley@hp.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-05 21:32:40 -07:00
David S. Miller
2861a185e3 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-05-05 15:09:05 -07:00
John W. Linville
83163244f8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	drivers/net/wireless/libertas_tf/cmd.c
	drivers/net/wireless/libertas_tf/main.c
2010-05-05 16:14:16 -04:00
Johannes Berg
adfba3c7c0 mac80211: use fixed channel in ibss join when appropriate
"mac80211: improve IBSS scanning" was missing a hunk.
This adds that hunk as originally intended.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-05 15:10:57 -04:00
Linus Torvalds
7437e7d367 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  FEC: Fix kernel panic in fec_set_mac_address.
  ipv6: Fix default multicast hops setting.
  net: ep93xx_eth stops receiving packets
  drivers/net/phy: micrel phy driver
  dm9601: fix phy/eeprom write routine
  ppp_generic: handle non-linear skbs when passing them to pppd
  ppp_generic: pull 2 bytes so that PPP_PROTO(skb) is valid
  net: fix compile error due to double return type in SOCK_DEBUG
  net/usb: initiate sync sequence in sierra_net.c driver
  net/usb: remove default in Kconfig for sierra_net driver
  r8169: Fix rtl8169_rx_interrupt()
  e1000e: Fix oops caused by ASPM patch.
  net/sb1250: register mdio bus in probe
  sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
  p54pci: fix bugs in p54p_check_tx_ring
2010-05-05 07:55:07 -07:00
Eric Dumazet
ec7d2f2cf3 net: __alloc_skb() speedup
With following patch I can reach maximum rate of my pktgen+udpsink
simulator :
- 'old' machine : dual quad core E5450  @3.00GHz
- 64 UDP rx flows (only differ by destination port)
- RPS enabled, NIC interrupts serviced on cpu0
- rps dispatched on 7 other cores. (~130.000 IPI per second)
- SLAB allocator (faster than SLUB in this workload)
- tg3 NIC
- 1.080.000 pps without a single drop at NIC level.

Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch
first sk_buff cache line, the second to prefetch the shinfo part.

Also using one memset() to initialize all skb_shared_info fields instead
of one by one to reduce number of instructions, using long word moves.

All skb_shared_info fields before 'dataref' are cleared in 
__alloc_skb().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-05 01:07:37 -07:00
J. Bruce Fields
5306293c9c Merge commit 'v2.6.34-rc6'
Conflicts:
	fs/nfsd/nfs4callback.c
2010-05-04 11:29:05 -04:00
David S. Miller
f935aa9e99 ipv6: Fix default multicast hops setting.
As per RFC 3493 the default multicast hops setting
for a socket should be "1" just like ipv4.

Ironically we have a IPV6_DEFAULT_MCASTHOPS macro
it just wasn't being used.

Reported-by: Elliot Hughes <enh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-03 23:42:27 -07:00
Eric Dumazet
93bb64eac1 net: skb_free_datagram_locked() fix
Commit 4b0b72f7dd ( net: speedup udp receive path )
introduced a bug in skb_free_datagram_locked().

We should not skb_orphan() skb if we dont have the guarantee we are the
last skb user, this might happen with MSG_PEEK concurrent users.

To keep socket locked for the smallest period of time, we split
consume_skb() logic, inlined in skb_free_datagram_locked()

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-03 23:18:14 -07:00
David S. Miller
f546061840 Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev
Add missing linux/vmalloc.h include to net/sctp/probe.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-03 16:24:31 -07:00
Eric Dumazet
4f70ecca9c net: rcu fixes
Add hlist_for_each_entry_rcu_bh() and
hlist_for_each_entry_continue_rcu_bh() macros, and use them in
ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdeps
warnings.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-03 15:53:54 -07:00
Ilpo Järvinen
a2f3be17c0 unix/garbage: kill copy of the skb queue walker
Worse yet, it seems that its arguments were in reverse order. Also
remove one related helper which seems hardly worth keeping.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-03 15:39:58 -07:00
John W. Linville
1d7d969dd0 Merge branch 'wireless-next-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-2.6 2010-05-03 14:53:49 -04:00
Johannes Berg
be4a4b6a5d mac80211: improve IBSS scanning
When IBSS is fixed to a frequency, it can still
scan to try to find the right BSSID. This makes
sense if the BSSID isn't also fixed, but it need
not scan all channels -- just one is sufficient.
Make it do that by moving the scan setup code to
ieee80211_request_internal_scan() and include
a channel variable setting.

Note that this can be further improved to start
the IBSS right away if both frequency and BSSID
are fixed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-03 14:53:10 -04:00
Johannes Berg
a75b4363ea mac80211: allow controlling aggregation manually
This allows enabling TX and disabling both TX and
RX aggregation sessions manually in debugfs. It is
very useful for debugging session initiation and
teardown problems since with this you don't have
to force a lot of traffic to get aggregation and
thus have less data to analyse.

Also, to debug mac80211 code itself, make hwsim
"support" aggregation sessions. It will still just
transfer the frame, but go through the setup and
teardown handshakes.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-03 14:53:10 -04:00
Johannes Berg
f7c65594f7 mac80211: fix ieee80211_find_sta[_by_hw]
Both of these functions can currently return
a station pointer that, to the driver, is
invalid (in IBSS mode only) because adding
the station failed. Check for that, and also
make ieee80211_find_sta() properly use the
per interface station search.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-03 14:51:47 -04:00
Neil Brown
b48fa6b991 sunrpc: centralise most calls to svc_xprt_received
svc_xprt_received must be called when ->xpo_recvfrom has finished
receiving a message, so that the XPT_BUSY flag will be cleared and
if necessary, requeued for further work.

This call is currently made in each ->xpo_recvfrom function, often
from multiple different points.  In each case it is the earliest point
on a particular path where it is known that the protection provided by
XPT_BUSY is no longer needed.

However there are (still) some error paths which do not call
svc_xprt_received, and requiring each ->xpo_recvfrom to make the call
does not encourage robustness.

So: move the svc_xprt_received call to be made just after the
call to ->xpo_recvfrom(), and move it of the various ->xpo_recvfrom
methods.

This means that it may not be called at the earliest possible instant,
but this is unlikely to be a measurable performance issue.

Note that there are still other calls to svc_xprt_received as it is
also needed when an xprt is newly created.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-03 08:33:00 -04:00
Changli Gao
dee42870a4 net: fix softnet_stat
Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.

This patch renames softnet_data.total to softnet_data.processed: the number of
packets processed in uppper levels(IP stacks).

softnet_stat data is moved into softnet_data.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |   17 +++++++----------
 net/core/dev.c            |   26 ++++++++++++--------------
 net/sched/sch_generic.c   |    2 +-
 3 files changed, 20 insertions(+), 25 deletions(-)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-02 22:26:57 -07:00
David S. Miller
7ef527377b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-05-02 22:02:06 -07:00
Jan Engelhardt
ef53d702c3 netfilter: xtables: dissolve do_match function
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-02 14:13:03 +02:00
Jan Engelhardt
c29c949288 netfilter: xtables: fix incorrect return code
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-02 14:04:54 +02:00
Jan Engelhardt
b5cad0dfd3 netfilter: ip_tables: fix compilation when debug is enabled
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-02 13:55:34 +02:00
David S. Miller
47d29646a2 net: Inline skb_pull() in eth_type_trans().
In commit 6be8ac2f ("[NET]: uninline skb_pull, de-bloats a lot")
we uninlined skb_pull.

But in some critical paths it makes sense to inline this thing
and it helps performance significantly.

Create an skb_pull_inline() so that we can do this in a way that
serves also as annotation.

Based upon a patch by Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-02 02:21:44 -07:00
Eric Dumazet
4381548237 net: sock_def_readable() and friends RCU conversion
sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.

RCU conversion is pretty much needed :

1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).

[Future patch will add a list anchor for wakeup coalescing]

2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().

3) Respect RCU grace period when freeing a "struct socket_wq"

4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
socket_wq"

5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep

6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.

7) Change all sk_has_sleeper() callers to :
  - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
  - Use wq_has_sleeper() to eventually wakeup tasks.
  - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

8) sock_wake_async() is modified to use rcu protection as well.

9) Exceptions :
  macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They dont need rcu freeing.

Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-01 15:00:15 -07:00
Patrick McHardy
e772c349a1 netfilter: nf_ct_h323: switch "incomplete TPKT" message to pr_debug()
The message might be falsely triggered by non-H.323 traffic on port
1720.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-01 18:29:43 +02:00
Vlad Yasevich
0e3aef8d09 sctp: Tag messages that can be Nagle delayed at creation.
When we create the sctp_datamsg and fragment the user data,
we know exactly if we are sending full segments or not and
how they might be bundled.  During this time, we can mark
messages a Nagle capable or not.  This makes the check at
transmit time much simpler.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:10 -04:00
Vlad Yasevich
bfa0d9843a sctp: Optimize computation of highest new tsn in SACK.
Right now, if the highest tsn in the SACK doesn't change, we'll
end up scanning the transmitted lists on the transports twice:
once for locating the highest _new_ tsn, and once for actually
tagging chunks as acked.  This is a waste, since we can record
the highest _new_ tsn at the same time as tagging chunks.  Long
ago this was not possible because we would try to mark chunks
as missing at the same time as tagging them acked and this approach
didn't work.  Now that the two steps are separate, we can re-use
the old approach.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:10 -04:00
Vlad Yasevich
ea862c8d1f sctp: correctly mark missing chunks in fast recovery
According to RFC 4960 Section 7.2.4:
 					If an endpoint is in Fast
   Recovery and a SACK arrives that advances the Cumulative TSN Ack
   Point, the miss indications are incremented for all TSNs reported
   missing in the SACK.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:10 -04:00
Vlad Yasevich
6588337189 sctp: rwnd_press should be cumulative
rwnd_press tracks the pressure on the recieve window.  Every
timer the receive buffer overlows, we truncate the receive
window and then grow it back.  However, if we don't track
the cumulative presser, it's possible to reach a situation
when receive buffer is empty, but rwnd stays truncated.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:10 -04:00
Vlad Yasevich
cf9b4812e1 sctp: fast recovery algorithm is per association.
SCTP fast recovery algorithm really applies per association
and impacts all transports.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:10 -04:00
Vlad Yasevich
b2cf9b6bd9 sctp: update transport initializations
Right now, sctp transports are not fully initialized and when
adding any new fields, they have to be explicitely initialized.
This is prone to mistakes.  So we switch to calling kzalloc()
which makes things much simpler.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:10 -04:00
Vlad Yasevich
d9efc2231b sctp: Do not force T3 timer on fast retransmissions.
We don't need to force the T3 timer any more and it's
actually wrong to do as it causes too long of a delay.
The timer will be started if one is not running, but if
one is running, we leave it alone.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Vlad Yasevich
ae19c54866 sctp: remove 'resent' bit from the chunk
The 'resent' bit is used to make sure that we don't update
rto estimate based on retransmitted chunks.  However, we already
have the 'rto_pending' bit that we test when need to update rto,
so 'resent' bit is just extra.  Additionally, we currently have
a bug in that we always set a 'resent' bit and thus rto estimate
is only updated by Heartbeats.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Vlad Yasevich
d598b166ce sctp: Make sure we always return valid retransmit path
commit 4951feda0c60d1ef681f1a270afdd617924ab041
    sctp: Do no select unconfirmed transports for retransmissions

added code to make sure that we do not select unconfirmed paths
for data transmission.  This caused a problem when there are only
2 paths, 1 unconfirmed and 1 unreachable.  In that case, the next
retransmit path returned is NULL and that causes a kernel crash.

The solution is to only change retransmit paths if we found one to use.

Reported-by: Frank Schuster <frank.schuster01@web.de>
Signed-off-b: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Dan Carpenter
b99a4d53a7 sctp: cleanup: remove duplicate assignment
This assignment isn't needed because we did it earlier already.

Also another reason to delete the assignment is because it triggers a
Smatch warning about checking for NULL pointers after a dereference.

Reported-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Wei Yongjun
787a51a087 sctp: implement sctp association probing module
This patch implement sctp association probing module, the module
will be called sctp_probe.

This module allows for capturing the changes to SCTP association
state in response to incoming packets. It is used for debugging
SCTP congestion control algorithms.

Usage:
  $ modprobe sctp_probe [full=n] [port=n] [bufsize=n]
  $ cat /proc/net/sctpprobe

  The output format is:
    TIME     ASSOC     LPORT RPORT MTU    RWND  UNACK <REMOTE-ADDR   STATE  CWND   SSTHRESH  INFLIGHT  PARTIAL_BYTES_ACKED MTU> ...

  The output will be like this:
    9.226086 c4064c48  9000  8000  1500    53352     1 *192.168.0.19  1     4380    54784     1252        0     1500
    9.287195 c4064c48  9000  8000  1500    45144     5 *192.168.0.19  1     5880    54784     6500        0     1500
    9.289130 c4064c48  9000  8000  1500    42724     5 *192.168.0.19  1     7380    54784     6500        0     1500
    9.620332 c4064c48  9000  8000  1500    48284     4 *192.168.0.19  1     8880    54784     5200        0     1500
    ......

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00